sktime: [BUG] use of LabelEncoder leads to failure of scitype
Reproducible Example
import numpy
import pandas
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sktime.forecasting.arima import ARIMA
from sktime.transformations.series.subset import ColumnSelect

# Two series ("A" and "B"), each observed daily over the same date range.
different_series = ["A", "B"]
different_dates = pandas.date_range(start="2000-01-01", end="2000-03-31", freq="D")
number_of_series = len(different_series)
number_of_timesteps = len(different_dates)
random_generator = numpy.random.default_rng(seed=0)
different_values = [1, 2, 3]

# P is the target; Q (drawn from {1, 2, 3}) and R are exogenous columns.
sample_data = {
    "series": numpy.repeat(different_series, number_of_timesteps),
    "dates": numpy.tile(different_dates, number_of_series),
    "P": random_generator.standard_normal(size=number_of_series * number_of_timesteps),
    "Q": random_generator.choice(
        different_values, size=number_of_series * number_of_timesteps, replace=True
    ),
    "R": random_generator.standard_exponential(size=number_of_series * number_of_timesteps),
}
sample_dataset = pandas.DataFrame(data=sample_data)
# Index by (series, dates) so the frame is a panel of time series.
sample_dataset = sample_dataset.set_index(["series", "dates"])

# Left of `**`: transformers applied to X; right of `**`: StandardScaler chained with ARIMA.
pipeline = (ColumnSelect(columns=["Q"]) * LabelEncoder()) ** (StandardScaler() * ARIMA())
# This call raises the ValueError shown under "Error".
pipeline.fit(sample_dataset[["P"]], X=sample_dataset[["Q", "R"]])
Error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/to/environment/lib/python3.10/site-packages/sktime/forecasting/base/_base.py", line 380, in fit
    self._fit(y=y_inner, X=X_inner, fh=fh)
  File "/path/to/environment/lib/python3.10/site-packages/sktime/forecasting/compose/_pipeline.py", line 494, in _fit
    X = t.fit_transform(X=X, y=y)
  File "/path/to/environment/lib/python3.10/site-packages/sktime/transformations/base.py", line 658, in fit_transform
    return self.fit(X, y).transform(X, y)
  File "/path/to/environment/lib/python3.10/site-packages/sktime/transformations/base.py", line 458, in fit
    X_inner, y_inner = self._check_X_y(X=X, y=y)
  File "/path/to/environment/lib/python3.10/site-packages/sktime/transformations/base.py", line 1115, in _check_X_y
    iterate_y = _most_complex_scitype(y_inner_scitype, y_scitype)
  File "/path/to/environment/lib/python3.10/site-packages/sktime/transformations/base.py", line 942, in _most_complex_scitype
    return _most_complex_scitype(scitypes)
  File "/path/to/environment/lib/python3.10/site-packages/sktime/transformations/base.py", line 944, in _most_complex_scitype
    raise ValueError("no series scitypes supported, bug in estimator")
ValueError: no series scitypes supported, bug in estimator
Expectation
I expected ARIMA to be fitted with two regressors, Q and R, where Q is the encoded version of the original Q.
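For context on what "encoded version of Q" could look like, here is a minimal sketch using scikit-learn only, independent of sktime. It does not reproduce the error above and is not presented as the fix for this issue. The toy array `q_column` is a hypothetical stand-in for the selected column Q. `LabelEncoder` is documented by scikit-learn as an encoder for 1-D target labels, while `OrdinalEncoder` works on 2-D feature arrays; whether swapping one for the other avoids the sktime error is an untested assumption.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical stand-in for the exogenous column "Q" (values from {1, 2, 3}),
# shaped (n_samples, 1) like a single selected column.
q_column = np.array([[1], [3], [2], [3], [1]])

# LabelEncoder works on 1-D label arrays, so the column is flattened first.
labels_1d = LabelEncoder().fit_transform(q_column.ravel())

# OrdinalEncoder works on 2-D feature arrays and keeps the column shape.
codes_2d = OrdinalEncoder().fit_transform(q_column)

print(labels_1d.shape, codes_2d.shape)
```

Both encoders map the sorted distinct values to consecutive integer codes; the difference is the expected input shape (1-D labels versus a 2-D feature matrix) and the output dtype.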
Version
Operating System: macOS Sonoma 14.3
Python: 3.10.12
sktime: 0.25.1
About this issue
- Original URL
- State: open
- Created 5 months ago
- Comments: 24
Commits related to this issue
- [ENH] improved output type checking error messages in `BaseTransformer.transform` (#5921) Improves output type checking error messages in `BaseTransformer.transform`, using idiomatic `check_is_error... — committed to sktime/sktime by fkiraly 4 months ago
- [ENH] improved output type checking error messages in `BaseTransformer.transform` (#5921) Improves output type checking error messages in `BaseTransformer.transform`, using idiomatic `check_is_error... — committed to tiloye/sktime by fkiraly 4 months ago
So, please go ahead! Remove the check and add your case as a test case, and perhaps a few more.