How to Determine P, D, and Q in ARIMA with Python

In this tutorial, we’ll learn how to determine the P, D, and Q values in an ARIMA model using Python. ARIMA stands for Autoregressive Integrated Moving Average, which is a forecasting algorithm that helps to predict future values in a time series dataset by analyzing the dataset’s past values and trends.

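The ARIMA model takes three main parameters, denoted as P, D, and Q (often written in lowercase as p, d, and q). P is the number of autoregressive lags, D is the number of times the series is differenced to make it stationary, and Q is the order of the moving-average term. Choosing them well is crucial for the performance of the model.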

Requirements

To follow this tutorial, you need to install the following Python libraries:

  1. pandas
  2. matplotlib
  3. numpy
  4. pmdarima

You can install these libraries using pip:

    pip install pandas matplotlib numpy pmdarima

Step 1: Importing necessary libraries and loading the dataset

First, let’s import the required libraries and load the dataset, which will be a simple time series dataset. For this tutorial, we will use the Air Passengers dataset, which is a well-known time series dataset representing the total number of airline passengers per month from 1949 to 1960.

The dataset can be downloaded from Kaggle and it looks like this:

Month	#Passengers
Jan-49	112
Feb-49	118
Mar-49	132
Apr-49	129
May-49	121
Jun-49	135
Jul-49	148
Aug-49	148
Sep-49	136
Oct-49	119
Nov-49	104
Dec-49	118
Jan-50	115
Feb-50	126
Mar-50	141
Apr-50	135
...
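
The loading step can be sketched as follows. This is a minimal sketch rather than the original listing: the helper name load_air_passengers is hypothetical, and instead of parsing two-digit years such as Jan-49 (which date parsers may read as 2049), it assumes the rows are consecutive months starting in January 1949. Only the rows shown in the table are embedded, so the block runs without the downloaded file.

```python
import io

import pandas as pd

# The rows shown in the table above, tab-separated.
SAMPLE = (
    "Month\t#Passengers\n"
    "Jan-49\t112\nFeb-49\t118\nMar-49\t132\nApr-49\t129\n"
    "May-49\t121\nJun-49\t135\nJul-49\t148\nAug-49\t148\n"
    "Sep-49\t136\nOct-49\t119\nNov-49\t104\nDec-49\t118\n"
    "Jan-50\t115\nFeb-50\t126\nMar-50\t141\nApr-50\t135\n"
)


def load_air_passengers(source, sep=","):
    """Return passenger counts as a float Series with a monthly index.

    `source` is a file path or a file-like object. This helper is
    illustrative only; it is not part of pandas or pmdarima.
    """
    df = pd.read_csv(source, sep=sep)
    df.columns = ["Month", "Passengers"]
    # Assumption: rows are consecutive months starting in January 1949.
    df.index = pd.date_range(start="1949-01-01", periods=len(df), freq="MS")
    return df["Passengers"].astype(float)


series = load_air_passengers(io.StringIO(SAMPLE), sep="\t")
print(series.head())
```

For the full dataset, pass the path of the downloaded file instead, for example load_air_passengers("AirPassengers.csv"). The file name here is an assumption about how the Kaggle download was saved.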

Step 2: Plotting the dataset

Before determining the P, D, and Q values, it is a good idea to visualize the dataset to identify any trends or seasonality.
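
A plot of the series might be produced along these lines. This is a sketch under stated assumptions: it uses only the 16 values listed in Step 1 so that it runs on its own, it selects a non-interactive matplotlib backend, and the output file name air_passengers.png is arbitrary.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Values for Jan 1949 to Apr 1950, taken from the table in Step 1.
values = [112, 118, 132, 129, 121, 135, 148, 148,
          136, 119, 104, 118, 115, 126, 141, 135]
index = pd.date_range(start="1949-01-01", periods=len(values), freq="MS")
series = pd.Series(values, index=index, name="Passengers", dtype=float)

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(series.index, series.values)
ax.set_title("Monthly airline passengers")
ax.set_xlabel("Month")
ax.set_ylabel("Passengers")
fig.tight_layout()
fig.savefig("air_passengers.png")  # arbitrary file name
```

With the full 1949 to 1960 series in place of the short sample, the trend and the yearly pattern become visible. In a notebook or interactive session, the matplotlib.use line can be dropped and plt.show() called instead of saving to a file.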

From the plot, we can observe that there is an upward trend and seasonality in the data.

Step 3: Differencing the dataset

To determine the value of D, we need to find how many times the series must be differenced to become stationary. First-order differencing, available through the .diff() method in pandas, removes a trend. It does not remove seasonality, which calls for seasonal differencing instead.
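
As a small illustration, first-order differencing with .diff() looks like this. The sketch uses the 16 values listed in Step 1 so that it runs on its own; the variable name diff1 is arbitrary.

```python
import pandas as pd

# Values for Jan 1949 to Apr 1950, taken from the table in Step 1.
values = [112, 118, 132, 129, 121, 135, 148, 148,
          136, 119, 104, 118, 115, 126, 141, 135]
index = pd.date_range(start="1949-01-01", periods=len(values), freq="MS")
series = pd.Series(values, index=index, name="Passengers", dtype=float)

# First-order difference: y[t] - y[t-1]. The first element has no
# predecessor and becomes NaN, so it is dropped.
diff1 = series.diff().dropna()
print(diff1.head())  # first differences: 6, 14, -3, -8, 14
```

The differenced series can then be plotted in the same way as the original one. If a trend were still visible, applying .diff() a second time would correspond to D = 2.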

In the differenced plot, the trend seems to be removed, indicating that D = 1 might be suitable.

Step 4: Determine P and Q using ACF and PACF plots

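Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots can be used to identify candidate values for P and Q: the lag after which the PACF cuts off suggests P, and the lag after which the ACF cuts off suggests Q. Let’s plot the ACF and PACF of the differenced data.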

The ACF plot cuts off after lag 1, which suggests Q = 1. The PACF plot also cuts off after lag 1, which suggests P = 1.

Step 5: Auto ARIMA

We can also determine the P, D, and Q values automatically with the auto_arima function from the pmdarima library, which searches over candidate orders and keeps the model with the best information criterion (AIC by default).

The output shows the best-fitting ARIMA model, and you can read the values of P, D, and Q from its order.

Conclusion

In this tutorial, we learned how to determine the P, D, and Q values for an ARIMA model using Python. We used differencing to choose D, ACF and PACF plots to choose P and Q, and the auto_arima function provided by the pmdarima library to select the values automatically.

Now, you can use these values to build an ARIMA model for your time series data and generate forecasts.