scikit-learn: API Freezing estimators

TODO: motivate freezing: pipeline components, calibration, transfer/semi-supervised learning

This should probably be a SLEP, but I just want it saved somewhere.

Features required for estimator freezing:

  • clone must short-circuit with if isinstance(obj, FrozenModel): return obj (or do so via class polymorphism / singledispatch)
  • FrozenModel delegates all attribute access (get, set, del) to its wrapped estimator (except where specified)
    • hence the wrapped estimator cannot be accessible as FrozenModel().estimator, but only under some munged name.
  • FrozenModel has def fit(self, *args, **kwargs): return self
  • FrozenModel has def fit_transform(self, *args, **kwargs): return self.fit(*args, **kwargs).transform(args[0]) (and similarly for fit_predict?)
  • isinstance(freeze(obj), type(obj)) == True and isinstance(freeze(obj), FrozenModel) == True
    • since this is determined from type(freeze(obj)) (excluding __instancecheck__, which seems irrelevant), this appears to be the hardest criterion to fulfill
    • this seems to entail a mixin, a class created in a closure, setting __class__ (!), and overloading __reduce__, help! I think I’ve gone down the wrong path!! (see the sketch after this list)
  • must behave nicely with pickle and copy.[deep]copy
  • freeze(some_list) will freeze every element of the list
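
To make the list above concrete, here is a minimal sketch of the dynamic-subclass idea. The names FrozenModel and freeze are the ones proposed in this issue, but everything else (the munged attribute name, the frozen_aware_clone helper) is an illustrative assumption rather than a settled design; set/del delegation and the __reduce__ work needed to pickle the closure-created class are deliberately omitted.

from sklearn.base import clone as _sk_clone

class FrozenModel:
    def __init__(self, estimator):
        # Store under a munged name so that attribute delegation below
        # cannot shadow the wrapped estimator's own attributes.
        object.__setattr__(self, '_frozen_estimator', estimator)

    def __getattr__(self, name):
        # Invoked only when normal lookup fails, so fitted attributes
        # such as coef_ fall through to the wrapped estimator.
        return getattr(object.__getattribute__(self, '_frozen_estimator'), name)

    def fit(self, *args, **kwargs):
        return self  # frozen: fitting is a no-op

    def fit_transform(self, *args, **kwargs):
        return self.fit(*args, **kwargs).transform(args[0])

def freeze(obj):
    if isinstance(obj, list):
        return [freeze(o) for o in obj]
    # Subclass both FrozenModel and the estimator's own class so that
    # isinstance(frozen, FrozenModel) and isinstance(frozen, type(obj))
    # both hold. Pickling this dynamically created class is the hard part.
    cls = type('Frozen' + type(obj).__name__, (FrozenModel, type(obj)), {})
    return cls(obj)

def frozen_aware_clone(obj, safe=True):
    # What clone itself would need to do: return frozen estimators unchanged.
    if isinstance(obj, FrozenModel):
        return obj
    return _sk_clone(obj, safe=safe)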

Most upvoted comments

I also have a potential use case for this. I work with forecasts from weather models. For training models I have access to historic weather data (wind speed, say) to use as features. However, when making a real-time forecast that information is not available; I only have forecasts of those features. I want to be able to fit an estimator on the historical features, freeze it, and then use it within a pipeline to make predictions without it being refit.

Regarding a static transformer (option 4 referenced above), maybe something like this?

import joblib
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class StaticTransformer(TransformerMixin, BaseEstimator):
    """Predict using a pre-fitted model, acting as a transformer.

    No refitting is done.
    """

    def __init__(self, base_model):
        # base_model is the path to a joblib-dumped, already-fitted model.
        # Only the path is a constructor parameter, so clone() rebuilds
        # the transformer by reloading the same file.
        self.base_model = base_model
        self.__base_model_object = joblib.load(self.base_model)

    def fit(self, X, y=None):
        # Deliberately a no-op: the wrapped model is never refit.
        return self

    def transform(self, X):
        # Use the frozen model's predictions as a single feature column.
        base_preds = self.__base_model_object.predict(X)
        return np.expand_dims(base_preds, axis=1)

The base model that is loaded (the frozen model) does not change, and it seems to survive a clone: clone only re-calls __init__ with the stored base_model path, which reloads the same fitted model from disk.
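
As a usage sketch (assuming the StaticTransformer above; the model.joblib path and the toy data are made up for illustration):

import joblib
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.RandomState(0)
X_hist, y_hist = rng.rand(20, 3), rng.rand(20)

# Persist a model fitted on the historical features.
joblib.dump(LinearRegression().fit(X_hist, y_hist), 'model.joblib')

pipe = make_pipeline(StaticTransformer('model.joblib'), LinearRegression())
clone(pipe).fit(X_hist, y_hist)  # refits only the downstream regressor;
                                 # the frozen base model is only reloaded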

I mean that being an instance of the frozen type is not necessarily important as long as it has all attributes of the frozen type.

This is awesome. I often train a vectorizer on a bigger corpus than the labeled training set, so setting trainable=False for the pipeline's vectorization step would be very helpful. Thank you!