prophet: Logistic Floor/Cap not being respected?

Pardon me in advance for being new to Prophet (and forecasting in general). I am trying to fit a curve to approximately 2 years’ worth of datetime data posted here, and to forecast a year’s worth of 5-minute intervals.

Whether I do a linear or logistic fitting, I do get some outrageous values that are larger or smaller than any of the inputted values (linear even produces negative values). I tried to leverage a logistic model to specify a floor and cap, but that didn’t stop deviant numbers either.

library(prophet)
library(dplyr)

df <- read.csv('http://bit.ly/2po0xPJ') %>% mutate(y=log(y))

#floor and cap
lower = quantile(df$y, .05)
upper = quantile(df$y, .95)

df <- df %>% mutate(floor = lower, cap = upper)

# modeling
m <- prophet(df, 
             changepoint.prior.scale=0.01, 
             growth = 'logistic')

future <- make_future_dataframe(m, periods = (24*12*365), 
                                freq = 60 * 5, 
                                include_history = FALSE)  %>% mutate(floor = lower, cap = upper)
# forecast every 5 minutes
forecast <- predict(m,future)
#prophet_plot_components(m, forecast)

write.csv(forecast %>% 
            select(ds, yhat_lower, yhat_upper, yhat) %>% 
            mutate(floor = exp(lower), 
                   cap = exp(upper),
                   ln_yhat_lower = yhat_lower,
                   ln_yhat_upper = yhat_upper, 
                   ln_yhat = yhat,
                   ln_floor = lower, 
                   ln_cap = upper,
                   yhat_lower = exp(yhat_lower), 
                   yhat_upper = exp(yhat_upper), 
                   yhat = exp(yhat)
                   ), 'problem_output.csv')

For instance, of the 105120 forecasted records outputted, 25131 are lower than the specified floor. Can someone please tell me what I’m doing wrong? Or if there is an expected behavior I’m not interpreting correctly?

About this issue

State: closed
Created 6 years ago
Reactions: 7
Comments: 19 (7 by maintainers)

Most upvoted comments

Hi, I’m trying to forecast daily transactions and was wondering if there’s a way to put a minimum threshold on seasonalities, or if that makes sense at all. I fit the model on two years of data using logistic growth with floor = 0 and am predicting the next year. I’m getting a positive trend; however, because of negative seasonalities I’m still getting negative forecasts for transactions:

Thanks a lot for your help in advance! P.S. Is it possible/sensible to put a floor on the confidence bands?

shaidams64 on Jun 14, 2018

Thanks for the clean repro. This is a case of bad model fit due to the daily seasonality overfitting. If you plot the forecast with plot(m, forecast) you can see that it looks like this:

[image: prophet_plot]

You can see that the in-sample fit seems pretty reasonable, but then the forecasted values are bad. You can see what is happening if you look at the components plot, with prophet_plot_components(m, forecast):

[image: prophet_components1]

You can see that the daily seasonality has enormous swings of +/- 2 in the afternoon, which is what is messing up the forecast. This happens because df$ds contains no data with a time later than 12:59:00. With no data in the afternoon, the daily seasonality is fit poorly there. There’s some description of this happening with monthly data in the documentation here: https://facebook.github.io/prophet/docs/non-daily_data.html .
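A quick way to check for this kind of gap is to tabulate the hour of day in the history. The sketch below uses a small made-up vector of timestamps in place of df$ds; the values are illustrative and are not the data from this issue.

```r
# Toy stand-in for df$ds (assumption): every timestamp falls before 13:00.
ds <- as.POSIXct(c("2018-01-01 00:05:00", "2018-01-01 06:30:00",
                   "2018-01-01 12:55:00", "2018-01-02 03:10:00",
                   "2018-01-02 12:59:00"), tz = "UTC")

# Hour of day (0-23) for each observation.
hours <- as.integer(format(ds, "%H", tz = "UTC"))

# Count observations per hour; hours with a zero count have no data at all,
# so a daily seasonality is unconstrained there.
coverage <- table(factor(hours, levels = 0:23))
missing_hours <- as.integer(names(coverage)[coverage == 0])
print(missing_hours)
```

If every hour from 13 to 23 shows up in missing_hours, the afternoon part of the daily seasonality is being extrapolated rather than fit.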

I’m wondering if this is an artifact of a bad conversion to 24-hour time. But if there really are only times up to 12:59:00, then there are three things you can do to resolve this issue with the daily seasonality:

1. Only make predictions for the parts of the seasonal cycle where you have data; that is, filter any times >12:59:00 out of the future dataframe that you make predictions on.
2. Remove the daily seasonality: m <- prophet(df, changepoint.prior.scale=0.01, growth = 'logistic', daily.seasonality = FALSE).
3. Use add_seasonality to add a daily seasonality with a stronger prior (smaller prior.scale).
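The three options above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the data from the issue: the floor/cap values, the seasonality name daily_regularized, the Fourier order of 4, and the prior.scale of 0.1 are all assumptions chosen for the example and would need tuning on real data.

```r
library(prophet)

# Synthetic stand-in for the issue's data (assumption): two weeks of
# 5-minute readings with nothing recorded from 13:00 onward.
set.seed(1)
ds <- seq(as.POSIXct("2018-01-01 00:00:00", tz = "UTC"),
          as.POSIXct("2018-01-14 23:55:00", tz = "UTC"), by = 300)
ds <- ds[as.integer(format(ds, "%H", tz = "UTC")) < 13]
df <- data.frame(ds = ds, y = 5 + rnorm(length(ds), sd = 0.1),
                 floor = 4, cap = 6)

# Option 2: drop the daily seasonality entirely.
m_no_daily <- prophet(df, growth = 'logistic',
                      changepoint.prior.scale = 0.01,
                      daily.seasonality = FALSE)

# Option 3: replace the default daily seasonality with a more strongly
# regularized one (smaller prior.scale). Seasonalities must be added
# before fitting, so the model is created without data first.
m_weak_daily <- prophet(growth = 'logistic',
                        changepoint.prior.scale = 0.01,
                        daily.seasonality = FALSE)
m_weak_daily <- add_seasonality(m_weak_daily, name = 'daily_regularized',
                                period = 1, fourier.order = 4,
                                prior.scale = 0.1)
m_weak_daily <- fit.prophet(m_weak_daily, df)

# Option 1: only predict for times of day that exist in the history.
future <- make_future_dataframe(m_weak_daily, periods = 24 * 12 * 7,
                                freq = 60 * 5, include_history = FALSE)
keep <- as.integer(format(future$ds, "%H", tz = "UTC")) < 13
future <- future[keep, , drop = FALSE]
future$floor <- 4
future$cap <- 6

forecast <- predict(m_weak_daily, future)
```

The options can be combined, as here, or used separately. Comparing prophet_plot_components(m, forecast) before and after is a reasonable way to see whether the afternoon swings have gone away.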

I can imagine this issue coming up more frequently with sub-daily data; we should add better documentation of this behavior.

bletham on Apr 10, 2018

As a side note, the reason the forecast goes outside the upper and lower bounds with this bad seasonality is that those bounds apply only to the trend; seasonal fluctuations can push the forecast value outside them.
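A small numeric illustration of that point, using made-up numbers rather than output from the model: with additive seasonality the forecast is the trend plus the seasonal term, so a trend that respects floor/cap can still yield values outside them.

```r
# Illustrative numbers only (assumption): a trend bounded by floor/cap plus
# an additive seasonal term with swings of +/- 2.
floor_val <- 4
cap_val <- 6
trend <- c(4.5, 5.0, 5.5)     # always inside [floor, cap]
seasonal <- c(-2, 0, 2)       # additive seasonality is not constrained
yhat <- trend + seasonal      # additive model: yhat = trend + seasonality

inside_trend <- trend >= floor_val & trend <= cap_val
inside_yhat <- yhat >= floor_val & yhat <= cap_val
print(data.frame(trend, seasonal, yhat, inside_trend, inside_yhat))
```

Here every trend value is within the bounds, but the first and last yhat values (2.5 and 7.5) are not.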

bletham on Apr 10, 2018

@deniznoah You might want to read this https://math.stackexchange.com/questions/2687851/what-does-ln-accomplish-on-a-regression-input/2688642#2688642

And this if you need a better understanding of Euler’s number and natural logarithms. https://www.youtube.com/watch?v=m2MIpDrF7Es

thomasnield on May 30, 2018

Well, like I said we can get fluctuations outside of lower/upper with the seasonality. But it would be nice to have some check for this type of situation where a big part of the seasonality has no data. At the very least there should be documentation about this, so I’m going to leave this task open for that.

bletham on Apr 13, 2018

@shaidams64 The floor/cap is just for the trend, and as you’ve observed the seasonality can push the forecast outside that.

As @deniznoah suggested, you could fit the model to the log of your data (with no floor) and then take the exp() of yhat, which would ensure positivity. It does, however, induce a different trend model, and the exp() can sometimes be a bit sensitive to small changes in the history. So it might work or might not; you’d just have to see.
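To make both halves of that concrete, here is a minimal sketch with hypothetical log-scale forecast values (not model output): exp() maps any real number to a positive one, and a small shift on the log scale becomes a proportional change on the original scale.

```r
# Hypothetical log-scale forecasts (assumption, chosen for illustration).
yhat_log <- c(-3, 0, 2.5)

# Back-transform to the original scale; exp() of any real number is positive.
yhat <- exp(yhat_log)
print(all(yhat > 0))

# Sensitivity: a shift of 0.2 on the log scale multiplies the
# back-transformed value by exp(0.2), i.e. a change of roughly 22%.
ratio <- exp(5.2) / exp(5.0)
print(ratio)
```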

Multiplicative seasonality would also drive the seasonality to 0 as the trend goes to 0.
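A toy illustration of why, again with made-up numbers: under multiplicative seasonality the forecast is trend * (1 + seasonal), so the absolute size of the seasonal swing shrinks with the trend. In Prophet's R interface this mode is selected with seasonality.mode = 'multiplicative'; the relative effects below are assumptions for the example.

```r
# Illustrative values (assumption): relative seasonal effects of +/- 50%.
seasonal <- c(-0.5, 0, 0.5)
trend <- c(10, 1, 0.01)        # trend approaching 0

# Seasonal contribution at each trend level: trend * seasonal.
contribution <- outer(trend, seasonal)

# Multiplicative forecast: trend * (1 + seasonal).
yhat_mult <- outer(trend, 1 + seasonal)

print(contribution)
print(yhat_mult)
```

The forecast stays positive here only because every relative effect is greater than -1; a fitted seasonal term below -1 would still produce negative values.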

bletham on Jun 15, 2018

Documentation now describes this issue in https://facebook.github.io/prophet/docs/non-daily_data.html , so I’ll go ahead and close.

bletham on Jun 7, 2018

@deniznoah But in Python and R, log() is precisely the inverse of e^x. On these platforms, log() defaults to base e, i.e. the natural logarithm.

thomasnield on May 30, 2018