Time Series Analysis in Python: Using statsmodels

Table of Contents

  1. Introduction
  2. Prerequisites
  3. Installation
  4. Loading and Exploring Time Series Data
  5. Stationarity
  6. Time Series Decomposition
  7. Autocorrelation and Partial Autocorrelation
  8. Building and Evaluating Time Series Models
  9. Conclusion

Introduction

Time series analysis is the study of data points indexed in time order. It helps reveal patterns, trends, and dependencies in the data, which makes it valuable in fields such as finance, economics, and weather forecasting.

In this tutorial, we will explore time series analysis in Python using the statsmodels library. We will cover topics like loading and exploring time series data, checking for stationarity, time series decomposition, autocorrelation, partial autocorrelation, and building and evaluating time series models.

By the end of this tutorial, you will have a solid foundation in performing time series analysis and be able to apply these techniques to your own data.

Prerequisites

To follow along with this tutorial, you should have a basic understanding of Python and a working installation of Python (version 3.6 or later). Familiarity with the pandas and NumPy libraries is also helpful.

Installation

First, let’s make sure we have the necessary libraries installed. Open your terminal or command prompt and run the following command (matplotlib is listed explicitly because the plots in this tutorial need it and statsmodels does not require it):

```bash
pip install statsmodels matplotlib
```

Once the installation is complete, we can import the required modules in our Python script or Jupyter Notebook.

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA
```

Loading and Exploring Time Series Data

To perform time series analysis, we need some data to work with. We will use the classic “AirPassengers” dataset, which statsmodels can download from the Rdatasets repository (an internet connection is required). This dataset contains the monthly number of international airline passengers, in thousands, from 1949 to 1960.

To load the dataset into a pandas DataFrame and explore its structure, use the following code:

```python
from statsmodels.datasets import get_rdataset

# Downloads the dataset from the Rdatasets repository
data = get_rdataset('AirPassengers').data
print(data.head())
```

This displays the first few rows of the dataset. It has two columns: time, which holds the date as a decimal year, and value, which holds the passenger count.
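Several statsmodels tools are easier to use when the series carries a DatetimeIndex with an explicit frequency. The sketch below shows one way to build such an index, assuming the rows are contiguous monthly observations starting in January 1949, as the dataset description states. The helper name to_monthly_series and the small stand-in DataFrame are hypothetical and are used only so the example runs without a download.

```python
import pandas as pd


def to_monthly_series(df, start="1949-01-01", value_col="value"):
    """Return df[value_col] as a Series indexed by month-start dates.

    Assumes the rows are contiguous monthly observations beginning at `start`.
    """
    index = pd.date_range(start=start, periods=len(df), freq="MS")
    return pd.Series(df[value_col].to_numpy(), index=index, name=value_col)


# Hypothetical stand-in with the same column layout as the tutorial's `data`
# (the numbers are made up for illustration)
sample = pd.DataFrame({
    "time": [1949.0, 1949.0 + 1 / 12, 1949.0 + 2 / 12],
    "value": [10, 12, 11],
})
series = to_monthly_series(sample)
print(series)
```

With the real dataset, calling to_monthly_series(data) should give a series that runs from January 1949 through December 1960.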

Stationarity

Stationarity is a fundamental assumption of many time series models, including ARIMA. A (weakly) stationary series has a constant mean and variance over time, and an autocovariance that depends only on the lag between observations. To check for stationarity, we can perform a visual inspection or conduct statistical tests like the Augmented Dickey-Fuller (ADF) test.

To visually inspect the stationarity of a time series, we can plot the data and look for trends, seasonality, and irregularities. Let’s plot the “AirPassengers” dataset and check if any patterns exist:

```python
plt.plot(data['time'], data['value'])
plt.xlabel('Year')
plt.ylabel('Number of Passengers')
plt.title('International Airline Passengers')
plt.show()
```

If the plot shows an increasing or decreasing trend, it indicates non-stationarity. To confirm this, we can conduct the ADF test using the adfuller() function from the statsmodels library:

```python
from statsmodels.tsa.stattools import adfuller

result = adfuller(data['value'])
print('ADF Statistic:', result[0])
print('p-value:', result[1])
```

The null hypothesis of the ADF test is that the series has a unit root, that is, it is non-stationary. If the p-value is less than a chosen significance level (e.g., 0.05), we can reject the null hypothesis and consider the data stationary. Otherwise, we cannot reject the null hypothesis and should treat the series as non-stationary, typically by differencing or transforming it before modeling.

Time Series Decomposition

Time series decomposition helps us understand the underlying components of a time series: trend, seasonality, and residual (or noise).

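To decompose a time series, we can use the seasonal_decompose function from statsmodels. This function applies either an additive or multiplicative decomposition method. Because data['value'] has a plain integer index rather than a date index with a frequency, we pass period=12 to tell the function that the seasonal cycle spans 12 monthly observations. Here’s an example of the additive method:

```python
# period=12: monthly data with a yearly seasonal cycle
result = seasonal_decompose(data['value'], model='additive', period=12)
result.plot()
plt.show()
```

The plot will display the original time series, trend, seasonality, and residuals. This decomposition allows us to analyze each component separately and identify patterns or anomalies.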

Autocorrelation and Partial Autocorrelation

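The autocorrelation function (ACF) and partial autocorrelation function (PACF) help us understand the relationship between an observation and lagged versions of itself. The ACF measures the correlation between the series and its own values at each lag, while the PACF measures the correlation at a given lag after removing the effect of the shorter lags.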

To plot the ACF and PACF for our time series, we can use the plot_acf and plot_pacf functions from statsmodels. Here’s an example:

```python
plot_acf(data['value'], lags=20)
plt.xlabel('Lags')
plt.ylabel('Autocorrelation')
plt.title('Autocorrelation Function')
plt.show()

plot_pacf(data['value'], lags=20)
plt.xlabel('Lags')
plt.ylabel('Partial Autocorrelation')
plt.title('Partial Autocorrelation Function')
plt.show()
```

These plots help us identify the order of the autoregressive (AR) and moving average (MA) terms for building time series models. As a rule of thumb, a sharp cutoff in the PACF after lag p suggests an AR(p) term, while a sharp cutoff in the ACF after lag q suggests an MA(q) term. The plots are easiest to interpret on a stationary series.

Building and Evaluating Time Series Models

Now, let’s move on to building time series models based on the identified components and patterns.

One popular model for time series forecasting is the ARIMA model, which stands for AutoRegressive Integrated Moving Average. The ARIMA model combines autoregressive (AR), differencing (I), and moving average (MA) components, whose orders are conventionally written as (p, d, q).

To build an ARIMA model, we can use the ARIMA class from statsmodels. Here’s an example:

```python
model = ARIMA(data['value'], order=(1, 1, 1))
model_fit = model.fit()

# Get the in-sample predictions. The data has an integer index, so start
# and end are positions; position 0 is skipped because it has no earlier
# value to difference against.
predictions = model_fit.predict(start=1, end=len(data) - 1)

# Plot the observed and predicted values
plt.plot(data['time'], data['value'], label='Observed')
plt.plot(data['time'].iloc[1:], predictions, label='Predicted')
plt.xlabel('Year')
plt.ylabel('Number of Passengers')
plt.title('ARIMA Model')
plt.legend()
plt.show()
```

We can evaluate the model's performance using various metrics such as mean squared error (MSE), root mean squared error (RMSE), or mean absolute error (MAE), ideally computed on data that was held out from fitting.

Conclusion

In this tutorial, we learned how to perform time series analysis in Python using the statsmodels library. We covered the process of loading and exploring time series data, checking for stationarity, time series decomposition, autocorrelation, partial autocorrelation, and building ARIMA models.

Time series analysis is a powerful technique that can provide valuable insights into temporal data. By understanding the patterns and dependencies within time series data, we can make informed decisions and forecasts.

I hope this tutorial has helped you understand the basics of time series analysis in Python. Happy analyzing!