sktime: [BUG] ForecastX is returning wrong predictions

Describe the bug

ForecastX returns weird predictions. When I recreate the composition manually, I get different results; see below.

To Reproduce

from sktime.datasets import load_longley
from sktime.forecasting.arima import ARIMA
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.compose import ForecastX
from sktime.forecasting.var import VAR
from sktime.forecasting.model_selection import temporal_train_test_split
import pandas as pd

y, X = load_longley()
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X, test_size=3)
fh = ForecastingHorizon([1, 2, 3])
columns = ["ARMED", "POP"]

# ForecastX
pipe = ForecastX(
    forecaster_X=VAR(),
    forecaster_y=ARIMA(),
    columns=columns,
)
pipe = pipe.fit(y_train, X=X_train, fh=fh)
# dropping ["ARMED", "POP"] as those are the columns where we expect not to have future values
y_pred = pipe.predict(fh=fh, X=X_test.drop(columns=columns))
y_pred.to_frame()

Output (column 0): -285606.156664, -276594.299226, -267325.354023

Expected behavior

The expected result is the following, recreated manually:

# fit y forecaster
arima = ARIMA().fit(y_train, X=X_train)

# fit and predict X forecaster
var = VAR()
var.fit(X_train[columns])
var_pred = var.predict(fh)

# predict y forecaster with predictions from VAR
X_pred = pd.concat([X_test.drop(columns=columns), var_pred], axis=1)
arima.predict(fh=fh, X=X_pred).to_frame()

Output (column 0): 70085.541848, 70073.339809, 72665.367272

This makes more sense when compared with what plain ARIMA forecasts using the true X_test:

arima = ARIMA()
arima.fit(y=y_train, X=X_train, fh=fh)
arima.predict(X=X_test).to_frame()

Output (column 0): 70372.810378, 70873.888217, 73678.534405
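
To put numbers on the discrepancy, the three outputs reported above can be compared directly. This is only a sketch built from the values pasted in this issue; the names `reported` and `rel_diff` are made up for illustration, and the index 1..3 stands for the forecast steps in fh rather than the original time index.

```python
import pandas as pd

# Values copied from the outputs in this issue; index = forecast step in fh.
reported = pd.DataFrame(
    {
        "forecastx": [-285606.156664, -276594.299226, -267325.354023],
        "manual": [70085.541848, 70073.339809, 72665.367272],
        "plain_arima": [70372.810378, 70873.888217, 73678.534405],
    },
    index=[1, 2, 3],
)

# Relative deviation of each variant from plain ARIMA fitted with the true X_test.
rel_diff = reported[["forecastx", "manual"]].sub(reported["plain_arima"], axis=0)
rel_diff = rel_diff.div(reported["plain_arima"], axis=0)
print(rel_diff)
```

The manual composition stays within about 2% of plain ARIMA at every step, while the ForecastX output has the wrong sign and is roughly four times larger in magnitude.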

Additional context

I am wondering how I can use ForecastX in evaluate. X will always be passed with all columns, but I want to be sure that ForecastX overwrites the given columns in X in predict, so that forecaster_y receives the predictions of forecaster_X combined with the other columns as input (see above, where I reproduced the ForecastX composition manually using pd.concat). So even if X is passed to predict with all columns, the columns specified in ForecastX(columns=[...]) should be overwritten. Is that already the case?
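
The overwrite behavior asked about here can be sketched in plain pandas. This is not a description of what ForecastX does internally; `overwrite_columns` is a hypothetical helper, and the toy frames only stand in for X_test and the VAR predictions.

```python
import pandas as pd


def overwrite_columns(X, X_forecast, columns):
    """Return X with `columns` replaced by the values in X_forecast."""
    # Drop the columns to be forecast, ignoring any that are already absent.
    X_rest = X.drop(columns=columns, errors="ignore")
    # Append the forecasts, aligned on the index.
    X_new = pd.concat([X_rest, X_forecast[columns]], axis=1)
    # Restore the column order of X if it contained all columns to begin with.
    if set(columns) <= set(X.columns):
        X_new = X_new[list(X.columns)]
    return X_new


# Toy data standing in for X_test and the forecaster_X predictions.
X_future = pd.DataFrame(
    {"GNP": [1.0, 2.0], "ARMED": [10.0, 20.0], "POP": [100.0, 200.0]},
    index=[1960, 1961],
)
X_forecast = pd.DataFrame(
    {"ARMED": [11.0, 21.0], "POP": [101.0, 201.0]}, index=[1960, 1961]
)

X_for_y = overwrite_columns(X_future, X_forecast, ["ARMED", "POP"])
print(X_for_y)
```

With this helper the result is the same whether X arrives with all columns or with the forecast columns already dropped, which is the property needed when evaluate always passes the full X.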

Versions

System:
    python: 3.8.10 (default, Jun 22 2022, 20:18:18) [GCC 9.4.0]
    executable: /local_disk0/.ephemeral_nfs/envs/pythonEnv-8a6aed3a-d844-4ca9-bc34-194086d0db32/bin/python
    machine: Linux-5.4.0-1090-azure-x86_64-with-glibc2.29

Python dependencies:
    pip: 21.0.1
    setuptools: 52.0.0
    sklearn: 0.24.1
    sktime: 0.15.0
    statsmodels: 0.13.5
    numpy: 1.22.4
    scipy: 1.6.2
    pandas: 1.2.4
    matplotlib: 3.4.2
    joblib: 1.0.1
    numba: 0.56.4
    pmdarima: 2.0.2
    tsfresh: None

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 16 (7 by maintainers)

Most upvoted comments

let’s not have this discussion here, this is going off-topic.