sktime: [BUG] use of LabelEncoder leads to failure of scitype

Reproducible Example

import numpy
import pandas
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sktime.forecasting.arima import ARIMA
from sktime.transformations.series.subset import ColumnSelect

different_series = ["A", "B"]
different_dates = pandas.date_range(start="2000-01-01", end="2000-03-31", freq="D")

number_of_series = len(different_series)
number_of_timesteps = len(different_dates)

random_generator = numpy.random.default_rng(seed=0)

different_values = [1, 2, 3]

sample_data = {
    "series": numpy.repeat(different_series, number_of_timesteps),
    "dates": numpy.tile(different_dates, number_of_series),
    "P": random_generator.standard_normal(size=number_of_series * number_of_timesteps),
    "Q": random_generator.choice(
        different_values, size=number_of_series * number_of_timesteps, replace=True
    ),
    "R": random_generator.standard_exponential(size=number_of_series * number_of_timesteps),
}

sample_dataset = pandas.DataFrame(data=sample_data)
sample_dataset = sample_dataset.set_index(["series", "dates"])

pipeline = (ColumnSelect(columns=["Q"]) * LabelEncoder()) ** (StandardScaler() * ARIMA())
pipeline.fit(sample_dataset[["P"]], X=sample_dataset[["Q", "R"]])
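The two scikit-learn transformers in the reproducer do not share the same `fit` interface. The small check below, which uses only `inspect` and scikit-learn and does not touch sktime, lists the parameters of each bound `fit` method. `LabelEncoder.fit` accepts only `y`, while `StandardScaler.fit` follows the usual `fit(X, y=None, ...)` convention. This is presumably the mismatch the pipeline has to bridge when `LabelEncoder` is placed on the `X` side. The helper name `fit_parameters` is hypothetical and exists only for illustration.

```python
import inspect

from sklearn.preprocessing import LabelEncoder, StandardScaler


def fit_parameters(estimator):
    """Hypothetical helper: parameter names of an estimator's bound ``fit``."""
    # inspect.signature on a bound method omits ``self``.
    return list(inspect.signature(estimator.fit).parameters)


# LabelEncoder is meant for target labels, so its fit takes only ``y``.
label_params = fit_parameters(LabelEncoder())

# StandardScaler is a feature transformer, so its fit starts with ``X``.
scaler_params = fit_parameters(StandardScaler())

print("LabelEncoder.fit parameters:", label_params)
print("StandardScaler.fit parameters:", scaler_params)
```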

Error

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/to/environment/lib/python3.10/site-packages/sktime/forecasting/base/_base.py", line 380, in fit
    self._fit(y=y_inner, X=X_inner, fh=fh)
  File "/path/to/environment/lib/python3.10/site-packages/sktime/forecasting/compose/_pipeline.py", line 494, in _fit
    X = t.fit_transform(X=X, y=y)
  File "/path/to/environment/lib/python3.10/site-packages/sktime/transformations/base.py", line 658, in fit_transform
    return self.fit(X, y).transform(X, y)
  File "/path/to/environment/lib/python3.10/site-packages/sktime/transformations/base.py", line 458, in fit
    X_inner, y_inner = self._check_X_y(X=X, y=y)
  File "/path/to/environment/lib/python3.10/site-packages/sktime/transformations/base.py", line 1115, in _check_X_y
    iterate_y = _most_complex_scitype(y_inner_scitype, y_scitype)
  File "/path/to/environment/lib/python3.10/site-packages/sktime/transformations/base.py", line 942, in _most_complex_scitype
    return _most_complex_scitype(scitypes)
  File "/path/to/environment/lib/python3.10/site-packages/sktime/transformations/base.py", line 944, in _most_complex_scitype
    raise ValueError("no series scitypes supported, bug in estimator")
ValueError: no series scitypes supported, bug in estimator
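The error is raised while choosing a scitype for `y`. The sketch below is a hypothetical, simplified re-implementation written only to illustrate the message, not sktime's actual source. It assumes the series scitypes are ranked `Series < Panel < Hierarchical`. It picks the most complex scitype the estimator declares, bounded by the input's scitype when one is given, and falls back to an unbounded choice otherwise, mirroring the recursive call visible in the traceback. If the estimator declares no series scitype for `y` at all, there is nothing to pick, and the `ValueError` above is the result.

```python
# Hypothetical, simplified sketch; NOT sktime's actual implementation.
SERIES_SCITYPES = ["Series", "Panel", "Hierarchical"]  # least to most complex


def most_complex_scitype(scitypes, smaller_equal_than=None):
    """Return the most complex series scitype among ``scitypes``.

    If ``smaller_equal_than`` is given, only scitypes no more complex than it
    are considered first; if none qualify, retry without the bound.
    """
    candidates = [s for s in scitypes if s in SERIES_SCITYPES]
    if smaller_equal_than is not None:
        limit = SERIES_SCITYPES.index(smaller_equal_than)
        bounded = [s for s in candidates if SERIES_SCITYPES.index(s) <= limit]
        if bounded:
            return max(bounded, key=SERIES_SCITYPES.index)
        # Fallback without the bound, like the recursive call in the traceback.
        return most_complex_scitype(scitypes)
    if candidates:
        return max(candidates, key=SERIES_SCITYPES.index)
    # No series scitype declared at all: the situation reported in this issue.
    raise ValueError("no series scitypes supported, bug in estimator")


print(most_complex_scitype(["Series", "Panel"]))
print(most_complex_scitype(["Series", "Hierarchical"], "Panel"))
```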

Expectation

I expected ARIMA to be fitted with two regressors, Q and R, where Q is the label-encoded version of the original Q.
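scikit-learn documents `LabelEncoder` as intended for encoding target values (`y`), not input features (`X`). Its feature-side counterpart is `OrdinalEncoder`, which follows the `fit(X, y=None)` convention and accepts a 2D column. The sketch below shows only the scikit-learn side on a small stand-in for `sample_dataset`. The toy values are made up for illustration. Whether substituting `OrdinalEncoder()` for `LabelEncoder()` in the sktime pipeline avoids the error has not been verified here; treat it as a possible workaround, not a confirmed fix.

```python
import pandas
from sklearn.preprocessing import OrdinalEncoder

# Toy stand-in for the issue's data: a (series, dates) MultiIndex with an
# integer-coded categorical column "Q" and a numeric column "R".
index = pandas.MultiIndex.from_product(
    [["A", "B"], pandas.date_range("2000-01-01", periods=3, freq="D")],
    names=["series", "dates"],
)
frame = pandas.DataFrame(
    {"Q": [3, 1, 2, 2, 3, 1], "R": [0.5, 1.2, 0.3, 2.0, 0.7, 1.1]},
    index=index,
)

# OrdinalEncoder takes a 2D X, so select Q with double brackets.
encoder = OrdinalEncoder()
encoded = encoder.fit_transform(frame[["Q"]])  # numpy array of shape (6, 1)

# Replace Q with its codes (categories sorted: 1 -> 0.0, 2 -> 1.0, 3 -> 2.0),
# keeping the original MultiIndex and the R column unchanged.
frame_encoded = frame.assign(Q=encoded[:, 0])
print(frame_encoded)
```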

Version

Operating System: macOS Sonoma 14.3
Python: 3.10.12
sktime: 0.25.1

About this issue

State: open
Created 5 months ago
Comments: 24

Commits related to this issue

[ENH] improved output type checking error messages in `BaseTransformer.transform` (#5921) Improves output type checking error messages in `BaseTransformer.transform`, using idiomatic `check_is_error... — committed to sktime/sktime by fkiraly 4 months ago

Most upvoted comments

I’d definitely want to add this support.

So, please go ahead! Remove the check and add your case as a test case, perhaps more.

fkiraly on Jan 31, 2024