covid19-sir: [Fix] [Forecast] Predicted cases do not increase monotonically - Forecasted Rt negative

Summary

I ran the following complete example for Greece to produce a forecast:

import covsirphy as cs

data_loader = cs.DataLoader(directory="kaggle/input")
jhu_data = data_loader.jhu()
population_data = data_loader.population()
pcr_data = data_loader.pcr()
oxcgrt_data = data_loader.oxcgrt()
vaccine_data = data_loader.vaccine()

gre_scenario = cs.Scenario(jhu_data, population_data, "Greece")
gre_scenario.interactive = True
# Variables and colors shared by the records plot and the simulation plots
variables = ["Confirmed", "Infected", "Fatal", "Recovered"]
color_dict = {"Confirmed": "blue", "Infected": "orange", "Fatal": "red", "Recovered": "green"}
# Plot the observed records
gre_records = gre_scenario.records(variables=variables, color_dict=color_dict, bbox_to_anchor=(0.5, -0.15))
pcr_data.positive_rate("Greece")
# Separate the phases and estimate the SIR-F parameters
_ = gre_scenario.trend()
gre_scenario.estimate(cs.SIRF)
# Add future phase to main scenario
gre_scenario.add(name="Main_30days", days=30)
# Forecast the parameters with the OxCGRT data and show the summary
gre_scenario.fit_predict(oxcgrt_data=oxcgrt_data, name="Forecast").summary(name="Forecast")
# History of each variable and of Rt
_ = gre_scenario.history(target="Confirmed")
_ = gre_scenario.history(target="Infected")
_ = gre_scenario.history(target="Fatal")
_ = gre_scenario.history(target="Recovered")
_ = gre_scenario.history(target="Rt")
# Simulation of the number of cases
gre_sim_df = gre_scenario.simulate(variables=variables, color_dict=color_dict, name="Main_30days")
gre_sim_df = gre_scenario.simulate(variables=variables, color_dict=color_dict, name="Forecast")

The simulated cases based on the forecasted parameters (i.e. the predicted cases) are:

We can clearly see that the total confirmed cases decrease in the future phase (from 17Mar21 onward), which is incorrect behavior because the confirmed count is cumulative and should never decrease. (In another execution, the recovered cases decreased as well.) I don’t know whether this happens for other countries. It would be good practice, during development, to plot the confirmed cases in the simulations alongside the other three categories, or even better, to plot them by default.
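A check like this can also be automated instead of relying on the plot. The following is a minimal sketch, assuming the simulated cumulative confirmed counts are available as a plain sequence (for example, taken from the "Confirmed" column of gre_sim_df, which is an assumption about the returned dataframe). The helper first_decrease and the sample numbers are hypothetical and not part of CovsirPhy.

```python
from typing import Optional, Sequence


def first_decrease(cumulative: Sequence[float]) -> Optional[int]:
    """Return the index of the first value lower than its predecessor, or None."""
    for i in range(1, len(cumulative)):
        if cumulative[i] < cumulative[i - 1]:
            return i
    return None


# Made-up numbers for illustration only (not real output for Greece)
confirmed = [100, 120, 150, 149, 140]
print(first_decrease(confirmed))
```

A return value of None means the series never decreases; any integer points at the first offending date index.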

The summary is:

Rt is negative.
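A negative Rt suggests that at least one forecasted parameter is outside its valid range. The sketch below assumes the SIR-F definition Rt = rho * (1 - theta) / (sigma + kappa) as I understand it from the CovsirPhy documentation; the function sirf_rt and all parameter values are made up for illustration.

```python
def sirf_rt(theta: float, kappa: float, rho: float, sigma: float) -> float:
    """Reproduction number of the SIR-F model (assumed definition)."""
    return rho * (1 - theta) / (sigma + kappa)


# All parameters inside (0, 1): Rt is positive
valid = sirf_rt(theta=0.02, kappa=0.001, rho=0.1, sigma=0.05)
# A forecasted rho below zero makes Rt negative
broken = sirf_rt(theta=0.02, kappa=0.001, rho=-0.1, sigma=0.05)
print(valid > 0, broken < 0)
```

With every parameter in (0, 1), both the numerator and the denominator are positive, so Rt cannot be negative.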

If I understand correctly, the current forecast implementation trains on data in the “parameters-OxCGRT index” plane. I think this may lead to problematic behavior, such as the one presented above, because the training data are only partially objective. The parameter set is the one estimated by Scenario.estimate(), which obviously contains fitting error (trend/phase separation error plus estimator fitting error); this error accumulates and inevitably propagates forward during .fit_predict().

Wouldn’t it be better to use only observed variables? Could we use a linear combination of confirmed/fatal/recovered cases (or the daily counts) instead, so that we ultimately train on data in the “C/F/R cases-index” plane?

This suggestion is essentially the opposite of the current implementation: predict the cases (depending on the OxCGRT index) and fit the model to them in order to estimate the future parameter set (forecasted parameters). What we currently do instead is forecast the parameter set (depending on the OxCGRT index) in order to simulate the future cases (predicted cases).

Sorry if I don’t remember correctly, but that is what we currently do, right? Do you think this solution could work and lead to improved results? Should I create a new issue?

Environment

CovsirPhy version: v2.17.0-eta
Python version: 3.8.5
Installation: conda/pip
System: Windows

About this issue

State: closed
Created 3 years ago
Comments: 22 (14 by maintainers)

Most upvoted comments

I created pull request #712 to add “Indicators(n)/Indicators(n-1) -> Parameters(n)/Parameters(n-1) with Elastic Net” approach to forecasting.

Now we have two approaches, and the best one will be selected by test score. Approaches that lead to unexpected parameter values (i.e. not in the range (0, 1)) will not be selected.
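The selection rule can be sketched roughly as follows. This is not the code from the pull request: the function names, the approach labels, the scores and the assumption that a higher test score is better are all hypothetical and used only for illustration.

```python
from typing import Dict, Optional, Tuple

# (test score, predicted parameters); labels and numbers are hypothetical
Candidate = Tuple[float, Dict[str, float]]


def in_range(params: Dict[str, float]) -> bool:
    """True when every parameter lies strictly inside (0, 1)."""
    return all(0 < value < 1 for value in params.values())


def select_approach(candidates: Dict[str, Candidate]) -> Optional[str]:
    """Pick the valid approach with the highest test score, or None if none is valid."""
    valid = {name: score for name, (score, params) in candidates.items() if in_range(params)}
    return max(valid, key=valid.get) if valid else None


candidates = {
    # Better score, but rho is negative, so this approach is rejected
    "indicators_to_parameters": (0.9, {"rho": -0.1, "sigma": 0.05}),
    "ratio_to_ratio": (0.7, {"rho": 0.08, "sigma": 0.05}),
}
print(select_approach(candidates))
```

Filtering before ranking means an approach with a good score but invalid parameters can never win.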

lisphilar on Apr 10, 2021