Assumed Knowledge
In this article, I assume that you are familiar with Forecasting Time Series: AR(p) Model and Forecasting Time Series: MA(q) Model. The assumed-knowledge sections in those articles cover the basics needed for this series of articles in more depth.
What is an ARMA(p, q) Model?
ARMA(p, q) stands for Autoregressive Moving Average model of order (p, q). It is a combination of the AR(p) model and the MA(q) model from the two previous articles.
To define this model mathematically, both the AR(p) model and the MA(q) model must first be defined. The AR(p) model, as defined in a previous article, is:

$$x_t = c_1 + \sum_{i=1}^{p} \beta_i x_{t-i} + \varepsilon_t$$

The MA(q) model has the following equation:

$$x_t = c_2 + \varepsilon_t + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}$$

Now the ARMA(p, q) model can be defined by combining the two:

$$x_t = c + \sum_{i=1}^{p} \beta_i x_{t-i} + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i} + \varepsilon_t$$

Combining the two models allows us to define a new constant $c = c_1 + c_2$, which is just the sum of the constants from the previous two models. In many cases, this term is zero or very close to zero.
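As a sanity check on this definition, the same kind of process can be simulated directly with statsmodels. This is a sketch using `ArmaProcess` (the lag-polynomial sign convention is statsmodels', and the coefficient values are the ones used later in this article):

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

# statsmodels uses lag-polynomial form (1 - beta_1*L) x_t = (1 + theta_1*L) eps_t,
# so AR coefficients are passed with flipped signs.
ar = np.array([1, -0.75])   # AR(1) with beta_1 = 0.75
ma = np.array([1, -0.5])    # MA(1) with theta_1 = -0.5
process = ArmaProcess(ar, ma)

np.random.seed(4)
sample = process.generate_sample(nsample=1000)
print(sample.shape)          # (1000,)
print(process.isstationary)  # True, since |beta_1| < 1
```

This helper also exposes `isinvertible`, which is useful for checking the MA side of the model.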
Fitting an ARMA(p, q) Model
First the data must be generated, which I do with the following formula:

$$x_t = 0.75\,x_{t-1} - 0.5\,\varepsilon_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, 1)$$
Turning the formula into code, produces:
import numpy as np

N = 1000
np.random.seed(4)
norm = np.random.normal(0, 1, N)  # white noise, epsilon_t
data = np.zeros(N)
beta_1 = 0.75   # AR(1) coefficient
theta_1 = -0.5  # MA(1) coefficient
for i in range(1, N):
    data[i] = beta_1 * data[i-1] + theta_1 * norm[i-1] + norm[i]

Plotting this data produces the following graph:
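The plotting code itself is not shown in this article; a minimal matplotlib sketch (regenerating the same series, since the exact plotting code used here is an assumption on my part) could be:

```python
import numpy as np
import matplotlib.pyplot as plt

# Regenerate the same simulated ARMA(1, 1) series as above
N = 1000
np.random.seed(4)
norm = np.random.normal(0, 1, N)
data = np.zeros(N)
for i in range(1, N):
    data[i] = 0.75 * data[i-1] - 0.5 * norm[i-1] + norm[i]

plt.figure(figsize=(10, 5))
plt.plot(data)
plt.title("Simulated ARMA(1, 1) Data")
plt.xlabel("t")
plt.show()
```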

I then plot the Autocorrelation Function:
In this you can see that there are significant peaks at lags 1, 2, 3 and 4, then a sharp drop to insignificance. So now I plot the Partial Autocorrelation Function:
In this plot, you can see peaks at lags 1 and 2.
Using these two plots, we can estimate that the parameters p and q will lie in these sets:

$$p \in \{1, 2\}, \qquad q \in \{1, 2, 3, 4\}$$
To find the optimum parameters, I use the Bayesian Information Criterion (BIC), which is characterised by this equation:

$$\mathrm{BIC} = k \ln(n) - 2 \ln(\hat{L})$$

Where $\hat{L}$ is the maximised value of the likelihood function of the model, $n$ is the number of data points in the dataset, and $k$ is the number of parameters that the model estimates. Minimising the BIC across the candidate models helps avoid overfitting the model parameters to the data. I do this for each (p, q) pair using this code:
from statsmodels.tsa.arima.model import ARIMA

train_data = data[:int(N * 0.9)]
test_data = data[int(N * 0.9):]

min_bic = np.inf
min_fit = None
for p in range(1, 3):
    for q in range(1, 5):
        model = ARIMA(train_data, order=(p, 0, q))
        model_fit = model.fit()
        if model_fit.bic < min_bic:
            min_bic = model_fit.bic
            min_fit = model_fit

if min_fit is not None:
    print(min_fit.summary())

Which gives the following output:
SARIMAX Results
==============================================================================
Dep. Variable: y No. Observations: 900
Model: ARIMA(1, 0, 1) Log Likelihood -1238.906
Date: Sun, 21 Dec 2025 AIC 2485.812
Time: 22:41:34 BIC 2505.021
Sample: 0 HQIC 2493.150
- 900
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const 0.1285 0.061 2.099 0.036 0.009 0.248
ar.L1 0.7379 0.061 12.151 0.000 0.619 0.857
ma.L1 -0.4991 0.078 -6.390 0.000 -0.652 -0.346
sigma2 0.9186 0.045 20.363 0.000 0.830 1.007
===================================================================================
Ljung-Box (L1) (Q): 0.10 Jarque-Bera (JB): 0.65
Prob(Q): 0.75 Prob(JB): 0.72
Heteroskedasticity (H): 0.88 Skew: 0.01
Prob(H) (two-sided): 0.29 Kurtosis: 2.87
===================================================================================
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

So here you can see that the optimum fit was with p = 1 and q = 1, which is exactly the model used to generate the data, so this result is expected.
To test this model, we predict the future and plot it against the test data, using the following code:
import matplotlib.pyplot as plt

# Forecast with the best model found by the BIC search above
forecast_result = min_fit.get_forecast(steps=int(N * 0.1))
forecast_values = forecast_result.predicted_mean
confidence_intervals = forecast_result.conf_int()

# Plotting the results
plt.figure(figsize=(10, 5))
plt.plot(test_data, label="Actual (Test Data)")
plt.plot(forecast_values, label="Forecast", color="red")
plt.fill_between(
    range(len(test_data)),
    confidence_intervals[:, 0],
    confidence_intervals[:, 1],
    color="pink",
    alpha=0.3,
    label="95% Confidence Interval"
)
plt.title("ARMA(1, 1) Forecast vs Actual")
plt.legend()
plt.show()

Which produces this plot:
So you can see that the majority of the test data falls within the forecast's 95% confidence interval, and, just like with the other models, the predictions become less accurate the further they extend into the future.