sktime: [BUG] UnobservedComponents throws error during predict() call, when wrapped under TransformedTargetForecaster and ForecastingPipeline
Describe the bug
Initially identified issue:
A new version of sktime throws an error for UnobservedComponents when wrapped under ForecastingGridSearchCV with a TransformedTargetForecaster pipe.
Updated description:
From sktime v0.11.1 onwards, up to the current version of main, UnobservedComponents throws an error during the predict method call when it is piped under TransformedTargetForecaster(). The same construct works with all the other models I could test, for example ARIMA, AutoETS, etc. So the issue is in how TransformedTargetForecaster calls predict and cascades that call to the UnobservedComponents class.
To Reproduce
The code example was taken from the ForecastingGridSearchCV documentation (the advanced example). The only change was using UnobservedComponents instead of ExponentialSmoothing at the end of the param_grid argument in gscv.
from sktime.datasets import load_shampoo_sales
from sktime.forecasting.exp_smoothing import ExponentialSmoothing
from sktime.forecasting.naive import NaiveForecaster
from sktime.forecasting.model_selection import ExpandingWindowSplitter
from sktime.forecasting.model_selection import ForecastingGridSearchCV
from sktime.forecasting.compose import TransformedTargetForecaster
from sktime.forecasting.theta import ThetaForecaster
from sktime.transformations.series.impute import Imputer
from sktime.forecasting.structural import UnobservedComponents
y = load_shampoo_sales()
fh = [1,2,3]
pipe = TransformedTargetForecaster(steps=[
    ("imputer", Imputer()),
    ("forecaster", UnobservedComponents())])
cv = ExpandingWindowSplitter(
    initial_window=24,
    step_length=12,
    start_with_window=True,
    fh=[1,2,3])
gscv = ForecastingGridSearchCV(
    forecaster=pipe,
    param_grid=[{
        "forecaster": [NaiveForecaster(sp=12)],
        "forecaster__strategy": ["drift", "last", "mean"],
        },
        {
        "imputer__method": ["mean", "drift"],
        "forecaster": [ThetaForecaster(sp=12)],
        },
        {
        "imputer__method": ["mean", "last"],
        "forecaster": [UnobservedComponents()],
        "forecaster__seasonal": [12],
        },
    ],
    cv=cv,
    n_jobs=-1)
gscv.fit(y)
y_pred = gscv.predict(fh)
Expected behavior
Additional context
Error report:
TypeError Traceback (most recent call last)
~/work/chronos/chronos/pipeline/tests/test_pipeline.py in <module>
37 cv=cv,
38 n_jobs=-1)
---> 39 gscv.fit(y)
40
41 y_pred = gscv.predict(fh)
~/work/sktime/sktime/forecasting/base/_base.py in fit(self, y, X, fh)
262 # we call the ordinary _fit if no looping/vectorization needed
263 if not vectorization_needed:
--> 264 self._fit(y=y_inner, X=X_inner, fh=fh)
265 else:
266 # otherwise we call the vectorized version of fit
~/work/sktime/sktime/forecasting/model_selection/_tune.py in _fit(self, y, X, fh)
279
280 # Run grid-search cross-validation.
--> 281 results = self._run_search(evaluate_candidates)
282
283 results = pd.DataFrame(results)
~/work/sktime/sktime/forecasting/model_selection/_tune.py in _run_search(self, evaluate_candidates)
487 """Search all candidates in param_grid."""
488 _check_param_grid(self.param_grid)
--> 489 return evaluate_candidates(ParameterGrid(self.param_grid))
490
491 @classmethod
~/work/sktime/sktime/forecasting/model_selection/_tune.py in evaluate_candidates(candidate_params)
266
267 out = parallel(
--> 268 delayed(_fit_and_score)(params) for params in candidate_params
269 )
270
~/opt/miniconda3/envs/chronos_dev2/lib/python3.7/site-packages/joblib/parallel.py in __call__(self, iterable)
1054
1055 with self._backend.retrieval_context():
-> 1056 self.retrieve()
1057 # Make sure that we get a last message telling us we are done
1058 elapsed_time = time.time() - self._start_time
~/opt/miniconda3/envs/chronos_dev2/lib/python3.7/site-packages/joblib/parallel.py in retrieve(self)
933 try:
934 if getattr(self._backend, 'supports_timeout', False):
--> 935 self._output.extend(job.get(timeout=self.timeout))
936 else:
937 self._output.extend(job.get())
~/opt/miniconda3/envs/chronos_dev2/lib/python3.7/site-packages/joblib/_parallel_backends.py in wrap_future_result(future, timeout)
540 AsyncResults.get from multiprocessing."""
541 try:
--> 542 return future.result(timeout=timeout)
543 except CfTimeoutError as e:
544 raise TimeoutError from e
~/opt/miniconda3/envs/chronos_dev2/lib/python3.7/concurrent/futures/_base.py in result(self, timeout)
433 raise CancelledError()
434 elif self._state == FINISHED:
--> 435 return self.__get_result()
436 else:
437 raise TimeoutError()
~/opt/miniconda3/envs/chronos_dev2/lib/python3.7/concurrent/futures/_base.py in __get_result(self)
382 def __get_result(self):
383 if self._exception:
--> 384 raise self._exception
385 else:
386 return self._result
TypeError: No valid mtype could be identified
Versions
System:
python: 3.7.13 (default, Mar 28 2022, 07:24:34) [Clang 12.0.0 ]
executable: …/miniconda3/envs/chronos_dev2/bin/python
machine: Darwin-21.4.0-x86_64-i386-64bit
Python dependencies:
pip: 21.2.2
setuptools: 58.0.4
sklearn: 1.0.2
sktime: 0.11.3
statsmodels: 0.12.1
numpy: 1.21.5
scipy: 1.7.3
pandas: 1.3.5
matplotlib: 3.5.1
joblib: 1.1.0
numba: 0.55.1
pmdarima: 1.8.5
tsfresh: None
About this issue
- State: closed
- Created 2 years ago
- Comments: 20 (3 by maintainers)
found the problem!
Combination of two issues:
- pd.DataFrame constructor. When called as pd.DataFrame(my_series, columns=["col_name"]), this will produce an empty data frame if my_series has a name, otherwise a pd.DataFrame with column name "col_name"
- UnobservedComponents._predict produces a non-conformant pd.Series as a return, in that it has a name (predicted_mean), as opposed to not having a name
I think we need to do two things:
- pd.Series to pd.DataFrame even when the series has a name
- predict, and only for certain scenarios.
Hey! I think I joined the party a bit late. This “issue” (unexpected behaviour) of the pd.DataFrame constructor was a surprise for me as well 🙈! With respect to the zero values, this is because of the comment above by @indinewton. Here is an example:
This is consistent with the statsmodels implementation:
@fkiraly yeah, in the example above it should produce 0 as forecasts, because no parameters were given to initialise UC. And I just ran the same using version 0.11.0 of sktime and it generated this
Note how the name of the column has been changed to “predicted_mean”.
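The constructor behaviour described in this thread can be reproduced with pandas alone. The snippet below is a minimal sketch, not code from sktime: the variable names and sample values are made up for illustration, and the `to_frame` call is only one possible name-independent conversion, not necessarily the fix that was adopted.

```python
import pandas as pd

# A named series, mimicking the "predicted_mean" name mentioned in the thread.
named = pd.Series([1.0, 2.0, 3.0], name="predicted_mean")
# The same values without a name.
unnamed = pd.Series([1.0, 2.0, 3.0])

# Named series: the requested column label does not match the series name,
# so no data is selected and the resulting frame has no rows.
named_df = pd.DataFrame(named, columns=["col_name"])

# Unnamed series: the values are kept and the column gets the requested label.
unnamed_df = pd.DataFrame(unnamed, columns=["col_name"])

# Assumption for illustration: to_frame(name=...) sets the column label
# regardless of whether the series already carries a name.
safe_df = named.to_frame(name="col_name")

print(named_df.shape, unnamed_df.shape, safe_df.shape)
```

Comparing the three shapes shows why a forecaster that returns a named series can end up with an empty frame after conversion, while an unnamed one converts as expected.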
it seems to produce zero in all cases where it previously broke.
But the problem is not with the conversion; I tested that, and it’s already coming out as zero from the inner _predict.
I would hence assume that somewhere within UnobservedComponents there is a potential issue of a similar kind, but it’s a different bug.
@juanitorduz, can you help, perhaps?
so, two questions:
- pd.Index created?
The place the error is raised is at the very end, i.e., just before we would return the output of the TransformedTargetForecaster, i.e., all predicts and transforms internally have been executed and have raised no similar error.