statsmodels: SARIMAX model too large

Describe the bug

Team, I’m running SARIMAX on daily data and trying to capture annual seasonality. I used seasonal_order = (0, 1, 0, 365), but ran into the following issues:

  • The model took a few minutes to fit, roughly 10x longer than R’s forecast::auto.arima function
  • The resulting model was over 7 GB, which is also far larger than what R’s forecast::auto.arima produces

The seasonal_order I’m using is what auto.arima (in R) suggested for the data I’m feeding the model. Data is confidential but I generated a similar dataset (below) to replicate the issue and show my code.

If I’m missing something, please let me know. Otherwise, I’d welcome any suggestions on how to make SARIMAX work for this data. I’d like to capture the annual seasonality first; I can account for the weekly seasonality later.

I greatly appreciate you looking into this.

Thanks, Sam

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd
import statsmodels as sm
from statsmodels.tsa.statespace.sarimax import SARIMAX
import matplotlib.pyplot as plt

print(f"statsmodels version: {sm.version.full_version}")
# >>> statsmodels version: 0.9.0

# Variables and data
figsize = (16, 8)
freq = "D"
n = 800
np.random.seed(7)
df = pd.DataFrame(
    data={
        "ds": pd.date_range("2016-11-01", periods=n, freq=freq),
        "y": np.random.rand(n),
    }
)
df.set_index("ds", drop=False, inplace=True)
df["exo1"] = df.index.month
df["exo2"] = df.index.week

df.loc[:, "y"] = df["y"] + df.index.dayofweek
add_fact = (
    (df.index.month == 12)
    * 1
    * (df.index.week.values - pd.to_datetime("2018-11-25").week)
    * 4
)
add_fact[add_fact < 0] = 0
df["add_fact"] = add_fact
df.loc[:, "y"] += df["y"] * df["add_fact"]

df.loc[:, "y"].plot(figsize=figsize)

# Train / predict
split_date = "2018-11-30"
m = SARIMAX(
    endog=df.loc[:split_date, "y"],
    exog=df.loc[:split_date, ["exo1", "exo2"]],
    order=(1, 1, 0),
    seasonal_order=(0, 1, 0, 365),
)
m_fit = m.fit()
pred = m_fit.forecast(
    df.loc[split_date:, "y"].shape[0], exog=df.loc[split_date:, ["exo1", "exo2"]]
)

# Visualize
fig = plt.figure(figsize=figsize)
ax = fig.add_subplot(111)
df["y"].plot(figsize=figsize, ax=ax)
pred.plot(figsize=figsize, ax=ax)

Output of import statsmodels.api as sm; sm.show_versions()

statsmodels version: 0.9.0

I got this error running sm.show_versions(): AttributeError: module 'statsmodels' has no attribute 'show_versions'

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 5
  • Comments: 17 (8 by maintainers)

Most upvoted comments

Each of the three arguments to fit (method='innovations_mle', low_memory=True, and cov_type='none') is required for long seasonal models, and each implies different restrictions. In general, these restrictions are not problematic, but it depends on what you need:

  • The downside to using innovations_mle is primarily that it doesn’t support missing data. There are also differences in the way that it handles integration (the d and D specifications), although these differences are likely to be small unless you have relatively few periods.

  • The downside to using low_memory=True is that you will not have access to output from the Kalman smoother, and you can’t perform dynamic in-sample predictions or compute in-sample confidence intervals. However, you can still perform out-of-sample forecasting.

  • The downside to using cov_type='none' is that you will not get standard errors (or t-stats or p-values) associated with the model parameters.

Thanks for the report. Yes, a period of 365 is really too much for the SARIMAX model - we should at least add a warning.

In your case you’re just doing seasonal differencing, so you can use the argument simple_differencing=True and that will allow you to run the specification.

More generally, if the seasonal AR or MA terms were not zero, the model is basically unsupported (i.e. very slow to estimate) for such a large period (and the same is true of R’s arima function).