Seasonal Autoregressive Integrated Moving Average (SARIMA)
SARIMA stands for Seasonal Autoregressive Integrated Moving Average. It's a powerful statistical model for forecasting time series data with seasonal patterns. In simpler terms, it helps predict future values of a series that rises and falls in a regular cycle over a fixed period, such as a month, a quarter, or a year. SARIMA extends the ARIMA (Autoregressive Integrated Moving Average) model, which does not handle seasonality on its own, by adding parameters that capture the repeating seasonal pattern on top of the overall trend. SARIMA is popular in fields such as finance (stock price prediction), business (demand forecasting), weather forecasting (temperature), and epidemiology (tracking disease outbreaks).
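SARIMA has the following components:
Seasonal (S): Adds autoregressive, differencing, and moving average terms at the seasonal lag (for example, 12 for monthly data) to capture patterns that repeat every season.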
Autoregressive (AR): Considers past values of the series to predict future values.
Integrated (I): Removes trends by differencing the data (subtracting previous values).
Moving Average (MA): Accounts for random noise by considering past error terms.
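Written out, the four components combine into one equation. The version below uses standard backshift notation, where B shifts the series back one step (B y_t = y_{t-1}). The symbol names are conventional, not taken from this article: p, d, q are the non-seasonal orders, P, D, Q the seasonal ones, s the number of periods per season, A(t) the deterministic trend, and ε_t the random error. Up to notation, this should correspond to the model form described in the statsmodels SARIMAX documentation.

```latex
\begin{aligned}
\phi_p(B)\,\Phi_P(B^s)\,(1 - B)^d\,(1 - B^s)^D\,y_t &= A(t) + \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t \\
\phi_p(B) &= 1 - \phi_1 B - \cdots - \phi_p B^p && \text{(AR)} \\
\Phi_P(B^s) &= 1 - \Phi_1 B^s - \cdots - \Phi_P B^{Ps} && \text{(seasonal AR)} \\
\theta_q(B) &= 1 + \theta_1 B + \cdots + \theta_q B^q && \text{(MA)} \\
\Theta_Q(B^s) &= 1 + \Theta_1 B^s + \cdots + \Theta_Q B^{Qs} && \text{(seasonal MA)}
\end{aligned}
```

Here (1 - B)^d is the Integrated part and (1 - B^s)^D is seasonal differencing. With P = D = Q = 0 all seasonal factors equal 1 and the model reduces to ARIMA(p, d, q).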
To fit a SARIMA model in Microsoft Excel with Python (Python in Excel), you can use the SARIMAX class from the statsmodels library. Import it with the following code:
from statsmodels.tsa.statespace.sarimax import SARIMAX
We used global temperature data from 1880 to 1999, placed in column C of our spreadsheet in the range C2:C121 (one value per year). We can import that data into a pandas DataFrame with the following Python code:
data = xl("C2:C121")
Then we create the SARIMAX model by providing three basic arguments: the data, order, and trend. The data is the DataFrame we created from the Excel range; it is passed as the first argument (named endog in statsmodels). The order argument takes three values, (p, d, q): the autoregressive order, the difference order, and the moving average order. Seasonal terms are set through a separate seasonal_order argument, which this example leaves at its default of no seasonal component. The autoregressive order (p) is the number of past values the model uses when predicting the next value of the series.
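The autoregressive step can be sketched as a weighted sum of the last p values plus a constant. The helper ar_predict and its coefficients are hypothetical (SARIMAX estimates the real coefficients during fitting); in the order=(10, 1, 0) model used here, the sum runs over the previous 10 values of the differenced series.

```python
def ar_predict(history, phi, c=0.0):
    """One-step AR(p) prediction: c + phi_1*y[t-1] + ... + phi_p*y[t-p].

    Hypothetical helper for illustration; phi holds made-up coefficients.
    """
    p = len(phi)
    if len(history) < p:
        raise ValueError("need at least p past values")
    recent = history[-p:][::-1]  # y[t-1], y[t-2], ..., y[t-p]
    return c + sum(coef * value for coef, value in zip(phi, recent))


# Hypothetical AR(2): y[t] = 0.1 + 0.6*y[t-1] + 0.3*y[t-2]
history = [1.0, 2.0, 3.0]
print(ar_predict(history, phi=[0.6, 0.3], c=0.1))
```

Only the last p values matter: the earliest value 1.0 in history is ignored by an AR(2) model.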
The difference order (d) helps make the series stationary (with a stable mean over time) by subtracting the previous value from the current one. The difference order d specifies how many times this differencing is applied.
d = 0: No differencing. Appropriate when the data are already stationary.
d = 1: First-order differencing removes a linear trend. (Seasonal patterns are removed by seasonal differencing at the seasonal lag, not by d.)
d > 1: Higher-order differencing may be needed for more complex trends; d = 2 removes a quadratic trend, and values above 2 are rarely needed.
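A quick numerical check of the list above, using numpy.diff on two made-up deterministic trends (no noise, so the effect is exact): one difference flattens a linear trend, a quadratic trend needs two, and each round of differencing shortens the series by one value.

```python
import numpy as np

t = np.arange(10, dtype=float)
linear = 2.0 + 0.5 * t          # made-up linear trend
quadratic = 1.0 + 0.2 * t**2    # made-up quadratic trend

d1_linear = np.diff(linear, n=1)         # constant 0.5: d = 1 removes the trend
d1_quadratic = np.diff(quadratic, n=1)   # 0.2*(2t + 1): still trending
d2_quadratic = np.diff(quadratic, n=2)   # constant 0.4: d = 2 removes the trend

print(d1_linear)
print(d1_quadratic)
print(d2_quadratic)
```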
However, over-differencing can remove too much information and harm the model's accuracy, so finding the right difference order is crucial for accurate forecasting. Autocorrelation function (ACF) and partial autocorrelation function (PACF) plots help identify patterns and guide the selection, and statistical tests such as the augmented Dickey-Fuller (ADF) test can confirm whether the series is stationary after differencing.
The moving average order (q) is the number of past forecast errors (residuals) the model includes when predicting the current value. The model uses these recent errors to correct its predictions; a larger q means a random shock keeps influencing the series for more periods.
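To make q concrete, the hypothetical helper simulate_ma below computes an MA(q) series directly from its definition, using made-up coefficients. Feeding it a single unit shock shows that the shock influences exactly q later values and then drops out.

```python
import numpy as np


def simulate_ma(errors, theta, mu=0.0):
    """y[t] = mu + e[t] + theta_1*e[t-1] + ... + theta_q*e[t-q].

    Errors before the start of the series are treated as zero.
    """
    q = len(theta)
    y = np.empty(len(errors))
    for t in range(len(errors)):
        y[t] = mu + errors[t]
        for j in range(1, q + 1):
            if t - j >= 0:
                y[t] += theta[j - 1] * errors[t - j]
    return y


# A single unit shock at t = 0 with hypothetical MA(2) weights
shock = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
print(simulate_ma(shock, theta=[0.5, 0.25]))
```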
Finally, the trend argument of statsmodels.tsa.statespace.sarimax.SARIMAX specifies the deterministic trend in your time series data. It can model a constant term, a linear trend over time, or both: "c" means a constant, "t" means a linear time trend (increasing or decreasing), "ct" means both, and "n" means no trend. In SARIMAX the trend term applies to the differenced series, so with d = 1 a constant behaves like a drift, that is, a linear trend in the original data.
model = SARIMAX(data, order=(10, 1, 0), trend="ct")
Now we can train the model, that is, estimate its parameters by maximum likelihood, with model.fit().
fit = model.fit()
To predict future values, use the forecast method, whose steps parameter sets the number of values to predict. Here we predict 24 values, i.e. the years 2000 to 2023.
forecast = fit.forecast(steps=24)
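You can forecast as many steps as you like. Multi-step forecasts are built recursively, with each step using the previous predictions, so predicting one value, appending it to the data, and then predicting the next value gives essentially the same result as forecasting all steps at once. Small differences appear only if the model is refitted on the extended data.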
You can convert the forecast (a pandas Series here, since the input is a DataFrame) to a list with forecast.tolist(). To display the values in the worksheet, change the Python output type in the menu next to the formula bar from Python Object to Excel Value.
Relevant YouTube tutorial: https://virtual-school.org/p?v=aUHcEc1ZBYY
Reference: https://www.statsmodels.org/devel/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html