
When data points are indexed by time, so that each value records the magnitude of some quantity at a particular moment, the data can be considered time-series data. Examples include the units of a commodity sold on a given date, week, month or year, or the change in temperature over time. Forecasting future values from the history of a time series is one of the important domains of data science. Many forecasting models help us make such predictions and plan for what the situation demands. Examples include AR, MA, ARIMA, SARIMA, VAR and SARIMAX.

We assume that you have a basic understanding of time series analysis and of forecasting models. You can go through the articles below for more details on these topics.

These articles will help you better understand time series analysis and how the different ARIMA-family models work with different kinds of time series data.

In this article, we will first read the data and perform the preprocessing steps. After that, we will discuss the ARIMA and SARIMAX models and implement them. We will see why SARIMAX is well suited to a time series that has seasonality and is affected by exogenous factors.

Code Implementation: Reading and Preprocessing the Data

To implement the models, I am using a dataset of monthly alcohol sales from 1964 to 1972.
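The later steps work most smoothly when the month column is parsed into a pandas DatetimeIndex with an explicit monthly frequency. These steps are the decomposition, the rolling windows and the plots over time. Below is a minimal sketch on a tiny inline CSV. The column name `Month` and all values are hypothetical stand-ins, since the real file's contents are not shown; only the `Sales` column name comes from the code later in the article.

```python
import io

import pandas as pd

# Hypothetical sample in an assumed "Month,Sales" layout (not the real data).
csv_text = """Month,Sales
1964-01-01,100
1964-02-01,110
1964-03-01,120
1964-04-01,130
"""

# Use the Month column as the index and parse it into timestamps.
data = pd.read_csv(io.StringIO(csv_text), index_col="Month", parse_dates=True)

# Make the monthly frequency explicit ("MS" = month start).
data = data.asfreq("MS")
print(data.index.freqstr)
```

An explicit frequency lets tools such as `seasonal_decompose` infer the seasonal period from the index instead of needing it passed by hand.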

Here I am using Google Colab to run the code, so we need to mount Google Drive in the notebook using the following command.

Input:

from google.colab import drive
drive.mount('/content/drive')

Importing the libraries.

Input:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

Importing the data set.

Input:

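# index_col/parse_dates assume the first column holds the month; parsing it
# as a DatetimeIndex lets the later decomposition and plots use the dates
data = pd.read_csv("/content/drive/MyDrive/Yugesh/SARIMAX/data.csv",
                   index_col=0, parse_dates=True)
data.head()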

Output:

Just by looking at the plot, we can see that the data set is seasonal. The magnitude of the sales rises and falls repeatedly, with an almost identical pattern in each time interval. More specifically, sales drop sharply in the first months of every year compared with the last months of the previous year.

A time series is typically made up of four kinds of components: trend, seasonality, cyclical variation and irregular noise. When some of these components are strong, they can distort our analysis. So for this time series, we need to check in more detail which components are present.
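The additive model assumes that the series is the sum of a trend, a seasonal component and a residual. Below is a from-scratch sketch of classical additive decomposition. It is similar in spirit to what `seasonal_decompose(model="additive")` computes, but it is not that function's source. The trend is estimated with a centred 2x12 moving average. The seasonal component is the average of the detrended values for each position in the 12-month cycle. The residual is whatever is left. The synthetic series is hypothetical and built so that the true components are known.

```python
import numpy as np
import pandas as pd


def classical_additive_decompose(y: pd.Series, period: int = 12):
    """Split y into trend + seasonal + residual (classical additive method)."""
    if period % 2 == 0:
        # Centred 2 x period moving average for an even period.
        trend = y.rolling(period).mean().rolling(2).mean().shift(-(period // 2))
    else:
        trend = y.rolling(period, center=True).mean()

    detrended = y - trend
    # Average the detrended values at each position in the seasonal cycle
    # (NaNs at the ends of the trend are skipped by mean()).
    position = np.arange(len(y)) % period
    seasonal_means = detrended.groupby(position).mean()
    # Centre the seasonal effects so they sum to zero over one cycle.
    seasonal_means -= seasonal_means.mean()
    seasonal = pd.Series(seasonal_means.values[position], index=y.index)

    resid = y - trend - seasonal
    return trend, seasonal, resid


# Hypothetical series: linear trend + a fixed 12-month pattern summing to zero.
t = np.arange(48)
true_seasonal = np.array([-3, -2, -1, 0, 1, 2, 3, 2, 1, 0, -1, -2], dtype=float)
y = pd.Series(10 + 0.5 * t + true_seasonal[t % 12])

trend, seasonal, resid = classical_additive_decompose(y, period=12)
print(np.allclose(seasonal.values[:12], true_seasonal))  # True
```

The centred moving average spans exactly one year, so the seasonal pattern cancels inside it and only the trend survives. That is why the known pattern is recovered exactly here.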

Decomposing the time series.

Input:

from statsmodels.tsa.seasonal import seasonal_decompose
# period=12: one seasonal cycle per year of monthly observations
decompose_data = seasonal_decompose(data['Sales'], model="additive", period=12)
decompose_data.plot();

Output:

Checking the stationarity of the series with the Augmented Dickey-Fuller (ADF) test.

Input:

from statsmodels.tsa.stattools import adfuller
dftest = adfuller(data.Sales, autolag = 'AIC')
print("1. ADF : ",dftest[0])
print("2. P-Value : ", dftest[1])
print("3. Num Of Lags : ", dftest[2])
print("4. Num Of Observations Used For ADF Regression and Critical Values Calculation :", dftest[3])
print("5. Critical Values :")
for key, val in dftest[4].items():
    print("\t",key, ": ", val)

Output:

Here we can see that the p-value for our dataset is high. The evidence against the null hypothesis (that the series has a unit root) is therefore weak, so we cannot reject it, and we treat the time series as non-stationary. We can make the time series stationary with differencing; in this case, we go ahead with rolling-mean differencing. For other differencing methods, you can refer to this article. The suggested article mainly focuses on deseasonalizing and differencing, and it also introduces the adfuller test.
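Here is how that interpretation plays out in the underlying regression. The Dickey-Fuller test regresses the first difference on the lagged level, dy_t = alpha + gamma * y_{t-1} + e_t, and tests the null hypothesis gamma = 0, which is a unit root. If the t-statistic is not negative enough (equivalently, the p-value is high), we cannot reject non-stationarity.

The sketch below is the plain, non-augmented Dickey-Fuller regression written in NumPy. It is not statsmodels' `adfuller`, which also adds lagged differences chosen by `autolag`. The cutoff -2.86 is the commonly tabulated approximate 5% critical value for the constant-only case, and the data are simulated.

```python
import numpy as np


def dickey_fuller_tstat(y: np.ndarray) -> float:
    """t-statistic of gamma in dy_t = alpha + gamma * y_{t-1} + e_t (no lag terms)."""
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])  # constant + lagged level
    coef, _, _, _ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    sigma2 = resid @ resid / (len(dy) - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)             # OLS covariance matrix
    return coef[1] / np.sqrt(cov[1, 1])


rng = np.random.default_rng(0)
noise = rng.standard_normal(500)   # stationary white noise
random_walk = np.cumsum(noise)     # unit root: non-stationary

CRITICAL_5PCT = -2.86  # approximate 5% critical value, constant-only case
for name, series in [("white noise", noise), ("random walk", random_walk)]:
    t_stat = dickey_fuller_tstat(series)
    verdict = "reject unit root" if t_stat < CRITICAL_5PCT else "cannot reject unit root"
    print(f"{name}: t = {t_stat:.2f} -> {verdict}")
```

The statistic is compared with Dickey-Fuller critical values rather than the usual Student t table, because under the null hypothesis the statistic does not follow a t distribution.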

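Rolling-mean differencing is often used when the seasonal effect in the data is strong. A 12-month rolling mean averages out the yearly seasonal cycle, and differencing the smoothed series removes the remaining trend.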
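This can be checked on a synthetic series with a linear trend and a pattern that repeats every 12 months and sums to zero over a year. Both the trend and the pattern are hypothetical. The 12-month rolling mean cancels the seasonal pattern exactly and leaves a straight line. Differencing that line then leaves a constant equal to the slope.

```python
import numpy as np
import pandas as pd

t = np.arange(60)  # five years of monthly points
# Hypothetical yearly pattern that sums to zero over 12 months.
seasonal = np.tile([5, 3, 0, -2, -4, -6, -4, -2, 0, 2, 4, 4], 5)
sales = pd.Series(100 + 2.0 * t + seasonal)  # trend slope = 2 per month

rolling_mean = sales.rolling(window=12).mean()           # removes the 12-month cycle
rolling_mean_diff = rolling_mean - rolling_mean.shift()  # removes the linear trend

print(np.allclose(rolling_mean_diff.dropna(), 2.0))  # True
```

On real data the result is not a perfect constant. It fluctuates around the local slope, which is what the ADF test is then asked to judge.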

Applying rolling-mean differencing.

Input:

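# Smooth out the 12-month seasonal cycle, then difference away the trend
rolling_mean = data['Sales'].rolling(window=12).mean()
data['rolling_mean_diff'] = rolling_mean - rolling_mean.shift()

# Plot the transformed and the original series in two stacked panels
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 6), sharex=True)
data['rolling_mean_diff'].plot(ax=ax1, title='after rolling mean & differencing')
data['Sales'].plot(ax=ax2, title='original')
plt.tight_layout()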

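Output:

Re-running the ADF test on the differenced series, after dropping the NaN values created by the rolling window and the shift.

Input: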

dftest = adfuller(data['rolling_mean_diff'].dropna(), autolag = 'AIC')
print("1. ADF : ",dftest[0])
print("2. P-Value : ", dftest[1])
print("3. Num Of Lags : ", dftest[2])
print("4. Num Of Observations Used For ADF Regression and Critical Values Calculation :", dftest[3])
print("5. Critical Values :")
for key, val in dftest[4].items():
  print("\t",key, ": ", val)

Output:

In statistics and time series analysis, an ARIMA (autoregressive integrated moving average) model is a generalization of the ARMA (autoregressive moving average) model. ARMA consists of two main components: the autoregressive part and the moving-average part. ARIMA adds an integration step, which means it fits an ARMA model to the series after the series has been differenced d times. The ARIMA model is therefore useful when the time series is non-stationary and differencing is required to make it stationary.
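The "integrated" part can be illustrated with NumPy. With d = 1, ARIMA works on the first differences w_t = y_t - y_{t-1}. Predictions made on that scale are then cumulatively summed, or "integrated", back to the original scale. This is a minimal sketch: the helper names `difference` and `integrate` are hypothetical, and `integrate` only handles d = 1.

```python
import numpy as np


def difference(y, d=1):
    """Apply d rounds of first differencing (the 'I' step of ARIMA)."""
    y = np.asarray(y, dtype=float)
    for _ in range(d):
        y = np.diff(y)
    return y


def integrate(w, y0):
    """Undo one round of differencing, given the first original value y0."""
    return np.concatenate([[y0], y0 + np.cumsum(w)])


y = np.array([3.0, 5.0, 4.0, 8.0, 9.0])
w = difference(y, d=1)        # [ 2. -1.  4.  1.]
restored = integrate(w, y[0])
print(np.array_equal(restored, y))  # True
```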

Mathematically we can represent the formula like this.
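A standard way to write ARIMA(p, d, q) uses the following symbols:

- the lag operator L, with L y_t = y_{t-1}
- AR coefficients φ_i and MA coefficients θ_j
- a constant c
- white-noise errors ε_t

This sketch gives the usual textbook form:

```latex
\left(1 - \sum_{i=1}^{p} \phi_i L^i\right) (1 - L)^d \, y_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j L^j\right) \varepsilon_t
```

The factor (1 - L)^d applies the d rounds of differencing. The left polynomial is the AR part and the right polynomial is the MA part.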

More formally, ARIMA(1, 1, 1) denotes an ARIMA model of order (1, 1, 1). The AR order p is 1, the integration (differencing) order d is 1, and the moving-average order q is 1.
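For ARIMA(1, 1, 1), expanding (1 - φL)(1 - L) y_t = c + (1 + θL) ε_t gives

y_t = c + (1 + φ) y_{t-1} - φ y_{t-2} + ε_t + θ ε_{t-1}.

The sketch below simulates that recursion with hypothetical parameter values. It then checks that differencing the result once leaves an ARMA(1, 1) process, which is what d = 1 means.

```python
import numpy as np


def simulate_arima_111(phi, theta, c, eps):
    """Simulate ARIMA(1,1,1) from its expanded form, starting at y_0 = y_1 = 0."""
    y = np.zeros(len(eps))
    for t in range(2, len(eps)):
        y[t] = (c + (1 + phi) * y[t - 1] - phi * y[t - 2]
                + eps[t] + theta * eps[t - 1])
    return y


rng = np.random.default_rng(42)
eps = rng.standard_normal(200)
phi, theta, c = 0.5, 0.3, 0.1  # hypothetical parameters
y = simulate_arima_111(phi, theta, c, eps)

# After one difference: w_t = c + phi * w_{t-1} + e_t + theta * e_{t-1}
w = np.diff(y)                  # w[k] = y[k+1] - y[k]
lhs = w[1:]                     # w_t for t = 2 .. n-1
rhs = c + phi * w[:-1] + eps[2:] + theta * eps[1:-1]
print(np.allclose(lhs, rhs))  # True
```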

Our basic motive in this analysis is to use the ARIMA model to predict future values and compare it with the SARIMAX model. An important tool for time series analysis in Python is the statsmodels package. It provides most of the models and statistical tests under one roof, and we have already used it several times earlier in this article.

Implementing the model on the original series (without the rolling-mean differencing).

Importing the model.

Input:

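# statsmodels.tsa.arima_model.ARIMA has been removed from recent statsmodels
# releases; the current ARIMA class lives in statsmodels.tsa.arima.model
from statsmodels.tsa.arima.model import ARIMA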

Defining the model on the Sales column.

Input:

# order=(p, d, q) = (1, 1, 1); d=1 makes the model difference the series once
model = ARIMA(data['Sales'], order=(1, 1, 1))
history = model.fit()

Checking the summary of the model.

Input:

history.summary()

Output:

Here we can see that the forecast lies approximately on the observed data. Throughout this process we are predicting values that are already available, and the predictions are quite satisfying. However, the model was fitted on the original series rather than on the data we obtained after differencing. Because ARIMA has no seasonal component, the values it predicts do not account for the seasonal effect or other influencing factors. To get around this, we can use the SARIMAX model. So let's have an overview of SARIMAX.

SARIMAX

SARIMAX (Seasonal AutoRegressive Integrated Moving Average with eXogenous factors) is an extension of the ARIMA model. ARIMA combines the autoregressive, integration and moving-average components, while SARIMAX adds seasonal effects and exogenous factors on top of them. Therefore, we can say SARIMAX is a seasonal model, like SARIMA and Auto ARIMA.

Like other seasonal models, SARIMAX captures the seasonal pattern, but it can also deal with external effects, and this feature sets it apart from the other models. For example, temperature has seasonal effects: it is low in winter and high in summer. External factors matter as well: humidity can push the winter temperature up, and rain can bring the temperature down. We cannot predict such factors from the seasonal pattern alone if they do not follow a cyclic or seasonal behaviour. Models without exogenous inputs, such as ARIMA and SARIMA, cannot take this kind of information into account.

In the SARIMAX model's parameters, we need to provide two kinds of orders. The first one is the (p, d, q) order, as in the ARIMA model. The second one specifies the effect of seasonality; we call it the seasonal order, and it requires four numbers:

(P, D, Q, s) = (seasonal AR order, seasonal integration order, seasonal MA order, seasonal periodicity)
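The seasonal integration order works like d, but on the seasonal lag. With periodicity s = 12 and D = 1, the series is differenced as y_t - y_{t-12}. A small NumPy sketch shows that this removes a pattern repeating every 12 months and leaves only the year-over-year change in the trend. The helper `seasonal_difference` is hypothetical.

```python
import numpy as np


def seasonal_difference(y, period=12, D=1):
    """Apply (1 - L^period)^D: subtract the value one season earlier, D times."""
    y = np.asarray(y, dtype=float)
    for _ in range(D):
        y = y[period:] - y[:-period]
    return y


pattern = np.array([5, 3, 0, -2, -4, -6, -4, -2, 0, 2, 4, 4], dtype=float)
y = np.tile(pattern, 4) + 0.5 * np.arange(48)  # yearly pattern + linear trend

sdiff = seasonal_difference(y, period=12, D=1)
print(sdiff[:3])  # [6. 6. 6.]
```

Each seasonal difference shortens the series by one full period, which is one reason high D values are rarely used on short monthly series.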

Mathematically we can represent the model like this.
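One standard way to write the model follows the regression-with-SARIMA-errors form used by statsmodels' `SARIMAX`. It uses these symbols:

- L is the lag operator (L y_t = y_{t-1}) and s is the seasonal period.
- x_t holds the exogenous variables, with coefficients β.
- φ_i and θ_j are the non-seasonal AR and MA coefficients; Φ_k and Θ_k are the seasonal ones.
- c is a constant and ε_t is white noise.

This is a sketch of the textbook form:

```latex
\begin{aligned}
y_t &= \beta^\top x_t + u_t \\
\phi_p(L)\,\Phi_P(L^s)\,(1 - L)^d\,(1 - L^s)^D\,u_t
  &= c + \theta_q(L)\,\Theta_Q(L^s)\,\varepsilon_t \\
\phi_p(L) = 1 - \sum_{i=1}^{p} \phi_i L^i, \qquad
\Phi_P(L^s) &= 1 - \sum_{k=1}^{P} \Phi_k L^{ks} \\
\theta_q(L) = 1 + \sum_{j=1}^{q} \theta_j L^j, \qquad
\Theta_Q(L^s) &= 1 + \sum_{k=1}^{Q} \Theta_k L^{ks}
\end{aligned}
```

Setting P = D = Q = 0 and dropping x_t recovers a plain ARIMA(p, d, q) model.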

We can see that the model has predicted the values without compromising on the seasonal effects and exogenous factors, and the trend line continues much as it did in past years.

In this article, we have seen how SARIMAX becomes useful when seasonality and exogenous factors are present in a time series. In real-world scenarios, many factors affect the trend of a time series, which makes accurate prediction difficult. Therefore, I encourage you to go deeper into the model and explore how its predictions can be made more accurate.


AIM discovers new ideas and breakthroughs that create new relationships, new industries, and new ways of thinking. AIM is the crucial source of knowledge and concepts that make sense of a reality that is always changing.

Our discussions shed light on how technology is transforming many facets of our life, from business to society to culture.