sktime: [BUG] ForecastX is returning wrong predictions
Describe the bug
ForecastX returns unexpected predictions. When I recreate the composition manually, I get different results; see below.
To Reproduce
from sktime.datasets import load_longley
from sktime.forecasting.arima import ARIMA
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.compose import ForecastX
from sktime.forecasting.var import VAR
from sktime.forecasting.model_selection import temporal_train_test_split
import pandas as pd
y, X = load_longley()
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X, test_size=3)
fh = ForecastingHorizon([1, 2, 3])
columns = ["ARMED", "POP"]
# ForecastX
pipe = ForecastX(
    forecaster_X=VAR(),
    forecaster_y=ARIMA(),
    columns=columns,
)
pipe = pipe.fit(y_train, X=X_train, fh=fh)
# dropping ["ARMED", "POP"] as those are the columns where we expect not to have future values
y_pred = pipe.predict(fh=fh, X=X_test.drop(columns=columns))
y_pred.to_frame()
Output (column 0):
-285606.156664
-276594.299226
-267325.354023
Expected behavior
The expected result, recreated manually, is the following:
# fit y forecaster
arima = ARIMA().fit(y_train, X=X_train)
# fit and predict X forecaster
var = VAR()
var.fit(X_train[columns])
var_pred = var.predict(fh)
# predict y forecaster with predictions from VAR
X_pred = pd.concat([X_test.drop(columns=columns), var_pred], axis=1)
arima.predict(fh=fh, X=X_pred).to_frame()
Output (column 0):
70085.541848
70073.339809
72665.367272
This makes more sense if we look into what plain ARIMA is forecasting:
arima = ARIMA()
arima.fit(y=y_train, X=X_train, fh=fh)
arima.predict(X=X_test).to_frame()
Output (column 0):
70372.810378
70873.888217
73678.534405
Additional context
I am wondering how I can use ForecastX in evaluate. X will always be given with all columns, but I want to be sure that ForecastX overwrites the given columns of X in predict, so that forecaster_y receives the predictions of forecaster_X together with the other columns as input (see above, where I reproduced the ForecastX composition manually and joined the two parts with pd.concat). So even if X is passed to predict with all columns, the columns specified in ForecastX(columns=[...]) should be overwritten. Is that already the case?
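To make the question concrete, here is a minimal sketch of the overwrite behaviour I have in mind, using plain pandas only. The helper name overwrite_columns and the toy values are made up for illustration; this is not sktime code and may not match what ForecastX does internally.

```python
import pandas as pd


def overwrite_columns(X, X_forecast):
    """Return a copy of X in which the columns of X_forecast are replaced.

    Hypothetical helper, not part of sktime: it only illustrates the
    behaviour asked about above.
    """
    columns = list(X_forecast.columns)
    # drop the columns to be forecast; ignore them if they are already absent
    X_rest = X.drop(columns=columns, errors="ignore")
    # join the remaining columns with the forecasts, aligned on the index
    X_new = pd.concat([X_rest, X_forecast], axis=1)
    # keep the column order of X, then add any columns X did not have
    ordered = [c for c in X.columns if c in X_new.columns]
    ordered += [c for c in X_new.columns if c not in ordered]
    return X_new[ordered]


# toy data standing in for X_test and the forecaster_X predictions (made-up values)
index = pd.Index([1, 2, 3])
X_full = pd.DataFrame(
    {
        "GNP": [1.0, 2.0, 3.0],
        "ARMED": [10.0, 20.0, 30.0],
        "POP": [100.0, 200.0, 300.0],
    },
    index=index,
)
X_forecast = pd.DataFrame(
    {"ARMED": [11.0, 21.0, 31.0], "POP": [101.0, 201.0, 301.0]},
    index=index,
)

# X passed with all columns, as evaluate would do
X_from_full = overwrite_columns(X_full, X_forecast)
# X passed without the forecast columns, as in the reproduction above
X_from_partial = overwrite_columns(X_full.drop(columns=["ARMED", "POP"]), X_forecast)
```

With this behaviour, forecaster_y would see the same X whether or not the caller already supplied ARMED and POP, which is what I would expect from ForecastX inside evaluate.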
Versions
System:
python: 3.8.10 (default, Jun 22 2022, 20:18:18) [GCC 9.4.0]
executable: /local_disk0/.ephemeral_nfs/envs/pythonEnv-8a6aed3a-d844-4ca9-bc34-194086d0db32/bin/python
machine: Linux-5.4.0-1090-azure-x86_64-with-glibc2.29
Python dependencies:
pip: 21.0.1
setuptools: 52.0.0
sklearn: 0.24.1
sktime: 0.15.0
statsmodels: 0.13.5
numpy: 1.22.4
scipy: 1.6.2
pandas: 1.2.4
matplotlib: 3.4.2
joblib: 1.0.1
numba: 0.56.4
pmdarima: 2.0.2
tsfresh: None
About this issue
- State: closed
- Created 2 years ago
- Comments: 16 (7 by maintainers)
let’s not have this discussion here, this is going off-topic.