statsmodels: Bug in ARIMA predict(): ValueError: Must provide freq argument if no data is supplied
I am using a series with a datetimeindex to forecast. I get the error “ValueError: Must provide freq argument if no data is supplied” if I try to predict out of sample for some of my variables. I can’t tell what it is that triggers the difference.
Here’s an abbreviated view of my data, note the datetime index. It does have a freq argument.
date
1982-09-30 11.994166
1982-10-31 11.978513
1982-11-30 11.964179
1982-12-31 11.994691
1983-01-31 12.021939
1983-02-28 11.987369
1983-03-31 12.015082
...
2016-05-31 13.026502
2016-06-30 13.008816
2016-07-31 13.022302
2016-08-31 13.025976
2016-09-30 13.032235
2016-10-31 13.059713
2016-11-30 13.036359
2016-12-31 13.049509
Freq: M, Name: data, dtype: float64
If I predict up to the end of the series (20161231), no error. If I go out of sample (20170131), error.
If I pass an explicit freq argument to my ARIMA, I get a different error that I think is an existing bug (ValueError: Wrong number of items passed 434, placement implies 435), but I’m more concerned with figuring out this one.
This only happens for SOME of my data, so I’m not sure what to provide to replicate it. The data comes from the same source, has the same datetimeindex, but sometimes it throws an error about the frequency and can’t recover.
EDIT: Here is a Series that does not throw an error. Note the same datetimeindex.
date
1982-09-30 10.1
1982-10-31 10.4
1982-11-30 10.8
1982-12-31 10.8
1983-01-31 10.4
1983-02-28 10.4
1983-03-31 10.3
...
2016-05-31 4.7
2016-06-30 4.9
2016-07-31 4.9
2016-08-31 4.9
2016-09-30 4.9
2016-10-31 4.8
2016-11-30 4.6
2016-12-31 4.7
Freq: M, Name: data2, dtype: float64
About this issue
- State: closed
- Created 7 years ago
- Reactions: 4
- Comments: 62 (25 by maintainers)
The underlying problem here is that your data doesn’t have an index with an associated frequency, because your data skips days (for example going from 2016/2/5 to 2016/2/14). That means that for forecasting, you won’t be able to use dates, because we generically can’t tell what the “next” date ought to be if there isn’t an associated frequency.
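To see what "no associated frequency" means in practice, here is a minimal sketch using pandas only. The dates are made up to mimic the skip from 2016/2/5 to 2016/2/14 mentioned above; the variable names are illustrative and not from the thread.

```python
import pandas as pd

# Hypothetical dates that skip from 2016-02-05 to 2016-02-14.
gappy = pd.DatetimeIndex(
    ["2016-02-03", "2016-02-04", "2016-02-05", "2016-02-14", "2016-02-15"]
)

# The same number of dates, but evenly spaced one day apart.
regular = pd.date_range("2016-02-03", periods=5, freq="D")

# pandas cannot infer a frequency for the gappy index, so there is no
# well-defined "next" date to forecast onto.
gappy_freq = pd.infer_freq(gappy)
regular_freq = pd.infer_freq(regular)

print("gappy:", gappy.freq, gappy_freq)
print("regular:", regular.freq, regular_freq)
```

An index built from a list of irregular dates carries no freq attribute, and inference fails as well, which is the situation described above.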
If you update Statsmodels to v0.9, it will give you better warnings / error messages to this effect (like the warning I just mentioned), but it won’t solve your problem because the underlying issue is what I mentioned above.
You have two options:
- You can call results.predict using integers for start and end (e.g. results.predict(start=results.nobs, end=results.nobs + 10)) and then attach whatever dates you like to the resulting forecasts.
- You can reindex your data to have a date series with daily frequency. For example:
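A minimal sketch of such a reindexing, assuming a hypothetical Series y whose dates skip some days. The data and names here are illustrative and not taken from the thread.

```python
import pandas as pd

# Hypothetical daily observations with a gap (2016-02-06 through 2016-02-13).
dates = pd.to_datetime(
    ["2016-02-03", "2016-02-04", "2016-02-05", "2016-02-14", "2016-02-15"]
)
y = pd.Series([1.0, 1.2, 1.1, 1.4, 1.3], index=dates, name="data")

# Build the complete daily calendar spanning the data, then reindex onto it.
# Days that were absent from the original series become NaN.
full_index = pd.date_range(start=y.index.min(), end=y.index.max(), freq="D")
y_daily = y.reindex(full_index)

print(y_daily)
```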
This will mean your new time series will have NaNs in it, but that’s not a problem for SARIMAX. In fact, it should give you better results, since simply removing missing observations is not the right way to deal with missing observations in models like ARIMA where today’s value depends on yesterday’s value.
What version of statsmodels are you using? And please reformat using triple backticks around fixed-width text, like the exception.
Resolved.
Digging in the data, there is in fact a missing value in the middle that is causing the issue.
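A check along these lines can surface that kind of gap. This is a sketch on made-up data, not the poster's series, and it assumes month-end dates like those shown in the question. It looks for both absent dates and NaN values, since either could be described as a missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical month-end series: 1983-01-31 is absent, and one value is NaN.
idx = pd.to_datetime(
    ["1982-09-30", "1982-10-31", "1982-11-30",
     "1982-12-31", "1983-02-28", "1983-03-31"]
)
s = pd.Series([11.99, 11.98, np.nan, 11.99, 11.99, 12.02], index=idx)

# Dates a complete month-end calendar would contain but the index lacks.
expected = pd.date_range(
    s.index.min(), s.index.max(), freq=pd.offsets.MonthEnd()
)
missing_dates = expected.difference(s.index)

# Dates that are present but hold a NaN value.
nan_dates = s.index[s.isna()]

print("missing dates:", list(missing_dates))
print("NaN values at:", list(nan_dates))
```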