sktime: [ENH] conditional forecaster multiplexing by instance property, e.g., slow or fast moving goods

Is your feature request related to a problem? Please describe.

Here is the logic that I would like to use with sktime:

  • I have some panel data, where the data is grouped into instances by ‘store’ and ‘product’, and the target value is ‘Sales’
  • I have a special rule that does a different type of forecast based on the product and past sales (it predicts all zeros when a heuristic indicates that a store no longer stocks a product)

Now, I have my custom forecaster working well, and its fit method is called with X for each instance, but there’s no way for me to tell which instance I’m currently looking at (that is, which store/product I’m looking at) so I can’t apply any product-specific logic.

Describe the solution you’d like

I’ve looked at the code and can’t see how it could be done easily. In the code linked here, group_name would need to be passed through to the estimator method, but of course that would raise an error if the method does not expect it. You could inspect the function signature to see whether it is expected. https://github.com/sktime/sktime/blob/f94a4bf4ed6e7ec1520496ec00ae987c57cc9703/sktime/datatypes/_vectorize.py#L572-L584
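As a rough illustration of the "inspect the function" idea, here is a minimal sketch using only the standard library. It is not sktime's vectorization code: `call_with_optional_group`, `fit_plain`, and `fit_aware` are hypothetical names, and the only assumption is that the instance key would be offered as a keyword argument called `group_name`.

```python
import inspect


def call_with_optional_group(method, *args, group_name=None, **kwargs):
    """Call `method`, passing `group_name` only if its signature declares it.

    Hypothetical helper for illustration; not part of sktime.
    """
    params = inspect.signature(method).parameters
    if "group_name" in params:
        return method(*args, group_name=group_name, **kwargs)
    # Methods that do not declare group_name are called exactly as before.
    return method(*args, **kwargs)


def fit_plain(X):
    # Stand-in for an estimator method that knows nothing about instances.
    return "plain"


def fit_aware(X, group_name=None):
    # Stand-in for an estimator method that wants the instance key.
    return f"aware:{group_name}"


print(call_with_optional_group(fit_plain, [1, 2], group_name=("store_1", "pens")))
print(call_with_optional_group(fit_aware, [1, 2], group_name=("store_1", "pens")))
```

Checking the signature keeps existing estimators working unchanged, at the cost of making the behaviour depend on an argument name.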

Describe alternatives you’ve considered

A few workarounds that work just fine:

  • Add a column to X, e.g. IsProductSomething, so the information is in X
  • Put all the logic for whether or not to do a particular type of forecast outside the sktime pipeline, e.g. compute a boolean column JustPredictZero and read it in the forecaster.

Both of these spread out the conditional forecasting logic, which isn’t ideal.
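For reference, the first workaround might look roughly like the sketch below. The toy data, the `Price` column, and the `school_supplies` set are made up for illustration; the only assumption taken from the issue is that the panel is indexed by store, product, and time.

```python
import pandas as pd

# Toy panel: one store, two products, three time points each (made-up data).
index = pd.MultiIndex.from_product(
    [["store_1"], ["pens", "milk"], range(3)],
    names=["store", "product", "time"],
)
X = pd.DataFrame({"Price": [1.0, 1.1, 1.2, 0.9, 0.9, 1.0]}, index=index)

# Hypothetical product group; derive the flag from the index level so the
# information travels inside X rather than only in the index.
school_supplies = {"pens"}
X["IsSchoolSupplies"] = X.index.get_level_values("product").isin(school_supplies)

print(X)
```

A boolean flag avoids the string-to-category conversion, but the product grouping now lives outside the forecaster.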

I could also put the Product column in X (in addition to being in the index), but it’s a string, so I would need to mess around with categories or integer representations, or drop the column before passing to fit. It would also need a different name (otherwise it clashes with the index), so that approach is a bit of a mess.

Additional context

This is probably quite a niche problem, and I’d be perfectly happy for you to close this. But if at some point the opportunity arises to make this happen, that would be neat.

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 18 (9 by maintainers)

Most upvoted comments

yes, exactly!

In addition, a nice bit of syntactic sugar would be to make the * dunder between a parameter estimator and a forecaster default to constructing a PluginParamsForecaster, but that’s a minor thing imo (and a separate PR/issue).

@fkiraly Happy to contribute this. I think I like your ParamSelector suggestion. Do you mean something like this:

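from typing import Callable

import pandas as pd
from sktime.param_est.base import BaseParamFitter


class ParamSelector(BaseParamFitter):
    """Store the output of a user-supplied selector as a fitted parameter."""

    def __init__(self, param: str, selector: Callable[[pd.DataFrame], str]):
        self.param = param
        self.selector = selector
        super().__init__()

    def _fit(self, X: pd.DataFrame) -> BaseParamFitter:
        # by sktime convention, fitted attributes end in "_", so param should too
        setattr(self, self.param, self.selector(X))
        return self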

and then:

PluginParamsForecaster(
    param_est=ParamSelector(
        param="selected_forecaster_",
        selector=lambda X: (
            "PredictZero"
            if X.squeeze().tail(30).sum() < 4 and not X.IsSchoolSupplies.iloc[0]
            else "LagYIntoX"
        ),
    ),
    forecaster=MultiplexForecaster(
        [
            ("PredictZero", PredictZero()),
            ("LagYIntoX", LagYIntoX(...)),
        ]
    ),
)

Looks nice and pretty generalisable to me! 🚀
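The selector lambda above could also be written as a named function, which is easier to test on its own. This is only a sketch: `select_forecaster` and its `window` and `min_sales` parameters are hypothetical names, and it assumes the frame passed to the selector has a `Sales` column and an `IsSchoolSupplies` flag column, which may not match what the parameter estimator actually receives.

```python
import pandas as pd


def select_forecaster(X: pd.DataFrame, window: int = 30, min_sales: float = 4) -> str:
    """Return the name of the forecaster to use for one instance.

    Assumes columns "Sales" and "IsSchoolSupplies" (illustrative names).
    """
    recent_sales = X["Sales"].tail(window).sum()
    is_school_supplies = bool(X["IsSchoolSupplies"].iloc[0])
    # Few recent sales and not a seasonal product: treat as no longer stocked.
    if recent_sales < min_sales and not is_school_supplies:
        return "PredictZero"
    return "LagYIntoX"


# Made-up instances to exercise the three cases.
dead = pd.DataFrame({"Sales": [0] * 40, "IsSchoolSupplies": [False] * 40})
seasonal = pd.DataFrame({"Sales": [0] * 40, "IsSchoolSupplies": [True] * 40})
active = pd.DataFrame({"Sales": [5] * 40, "IsSchoolSupplies": [False] * 40})

print(select_forecaster(dead), select_forecaster(seasonal), select_forecaster(active))
```

The returned strings match the estimator names given to MultiplexForecaster in the snippet above.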