Table of Contents
- Introduction
- Prerequisites
- Installation
- Importing Required Libraries
- Loading and Preparing Time Series Data
- Exploratory Data Analysis
- Stationarity
- Decomposition
- Autocorrelation and Partial Autocorrelation
- Modeling
- Making Predictions
- Model Evaluation
- Conclusion
Introduction
In this tutorial, we will explore time series analysis using Python and the Statsmodels library. Time series analysis is a statistical technique for analyzing and forecasting data points collected over time. It is widely used in various domains, including finance, economics, weather forecasting, and more.
By the end of the tutorial, you will learn how to:
- Load and prepare time series data
- Perform exploratory data analysis
- Test for stationarity
- Decompose time series into trend, seasonal, and residual components
- Analyze autocorrelation and partial autocorrelation
- Build time series models using AR, MA, and ARIMA models
- Make predictions using the fitted models
- Evaluate the performance of the models
Let’s get started!
Prerequisites
To follow along with this tutorial, you should have a basic understanding of the Python programming language and fundamental concepts of statistics. Familiarity with the pandas library will also be helpful for data manipulation tasks.
Installation
Before we begin, let’s ensure that we have the necessary libraries installed. Open your terminal and run the following command to install the required libraries:
```bash
pip install pandas statsmodels matplotlib
```
The above command will install the pandas, statsmodels, and matplotlib libraries, which are essential for time series analysis. If you are using Jupyter Notebook, you can run the command directly in a code cell by prefixing it with an exclamation mark (!pip install ...).
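To confirm the installation succeeded, you can print the installed versions; a quick sanity check (the exact version numbers will vary by environment):

```python
import pandas
import statsmodels
import matplotlib

# Print the installed versions; exact numbers will vary by environment
print('pandas:', pandas.__version__)
print('statsmodels:', statsmodels.__version__)
print('matplotlib:', matplotlib.__version__)
```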
Importing Required Libraries
Once we have installed the necessary libraries, we can import them into our Python script or Jupyter Notebook. Open your favorite text editor or Jupyter Notebook, and let’s start by importing the required libraries:
```python
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
```
We have imported pandas as pd, statsmodels.api as sm, and matplotlib.pyplot as plt. These libraries will be used for data manipulation, statistical modeling, and visualization, respectively.
Loading and Preparing Time Series Data
To perform time series analysis, we need a dataset that contains time-stamped observations. In this tutorial, we will use a sample dataset available in the Statsmodels library called “macrodata.” The macrodata dataset contains various macroeconomic variables collected over time.
We can load the macrodata dataset using the following code:
```python
macro_data = sm.datasets.macrodata.load_pandas().data
```
The macrodata dataset will be loaded into a Pandas DataFrame called “macro_data.”
Next, we need to prepare our time series data before proceeding with the analysis. Typically, a time series dataset includes a date or timestamp column and a corresponding value column. In our case, the timestamp information is split across the “year” and “quarter” columns of the macro_data DataFrame.
To ensure that pandas recognizes our data as a time series, we will convert the “year” and “quarter” columns into a proper datetime index. Here’s how you can do it:
```python
macro_data['date'] = pd.date_range(start='1959Q1', periods=len(macro_data), freq='Q')
macro_data.set_index('date', inplace=True)
```
In the code above, we create a new “date” column using the pd.date_range() function, specifying the start date as ‘1959Q1’, the number of periods as the number of rows in the macro_data DataFrame, and the frequency as ‘Q’ (quarterly). Finally, we set the “date” column as the index of the macro_data DataFrame using the set_index() method.
Exploratory Data Analysis
Once we have prepared our time series data, it’s a good practice to perform exploratory data analysis (EDA) to gain insights into the dataset. EDA involves visualizing and summarizing the data to understand its properties and patterns.
Let’s start by plotting the time series data using Matplotlib:
```python
plt.figure(figsize=(10, 6))
plt.plot(macro_data.index, macro_data['infl'], label='Inflation')
plt.xlabel('Year')
plt.ylabel('Inflation')
plt.title('Inflation Over Time')
plt.legend()
plt.show()
```
In the code above, we create a figure with a size of 10 (width) by 6 (height) using plt.figure(). Then, we plot the inflation data by passing the index and the ‘infl’ column of the macro_data DataFrame to plt.plot(). We add labels and a title to the plot using plt.xlabel(), plt.ylabel(), and plt.title(). Finally, we display the plot using plt.show().
Running the code above will generate a line plot showing the inflation over time.
Stationarity
Stationarity is an essential concept in time series analysis. A stationary time series is one whose statistical properties, such as mean and variance, remain constant over time. Stationarity allows us to model the time series data more accurately.
We can check for stationarity in our data using the Augmented Dickey-Fuller (ADF) test, whose null hypothesis is that the series contains a unit root (i.e., is non-stationary). Here’s how you can perform the ADF test using Statsmodels:
```python
adf_result = sm.tsa.stattools.adfuller(macro_data['infl'])
print('ADF statistic:', adf_result[0])
print('p-value:', adf_result[1])
print('Critical values:', adf_result[4])
```
In the code above, we pass the inflation data to the sm.tsa.stattools.adfuller() function, which performs the ADF test. The function returns a tuple containing the ADF statistic, the p-value, the critical values, and other information.
By printing the ADF statistic, p-value, and critical values, we can assess whether our data is stationary or not. If the p-value is less than a significance level (e.g., 0.05), we can reject the null hypothesis of non-stationarity and conclude that our data is stationary.
Decomposition
Time series data can often exhibit a combination of various patterns, including trends, seasonality, and residual noise. Decomposition helps us separate these individual components to better understand the underlying patterns.
We can decompose our time series data using the sm.tsa.seasonal_decompose() function. Here’s an example of how to decompose the inflation data into trend, seasonal, and residual components:
```python
decomposition = sm.tsa.seasonal_decompose(macro_data['infl'], model='additive')
```
In the code above, we pass the inflation data to the sm.tsa.seasonal_decompose() function, specifying the model as ‘additive’. The function returns a DecomposeResult object containing the trend, seasonal, and residual components.
Once decomposed, we can visualize the individual components using the following code:

```python
plt.figure(figsize=(10, 8))
plt.subplot(4, 1, 1)
plt.plot(macro_data['infl'], label='Original')
plt.ylabel('Inflation')
plt.legend()
plt.subplot(4, 1, 2)
plt.plot(decomposition.trend, label='Trend')
plt.ylabel('Trend')
plt.legend()
plt.subplot(4, 1, 3)
plt.plot(decomposition.seasonal, label='Seasonal')
plt.ylabel('Seasonal')
plt.legend()
plt.subplot(4, 1, 4)
plt.plot(decomposition.resid, label='Residual')
plt.xlabel('Year')
plt.ylabel('Residual')
plt.legend()
plt.tight_layout()
plt.show()
```