scikit-learn: API Freezing estimators

TODO: motivate freezing: pipeline components, calibration, transfer/semi-supervised learning

This should probably be a SLEP, but I just want it saved somewhere.

Features required for estimator freezing:

  • clone must short-circuit with if isinstance(obj, FrozenModel): return obj (or do so via class polymorphism / singledispatch)
  • FrozenModel delegates all attribute access (get, set, del) to its wrapped estimator (except where specified)
    • hence the wrapped estimator cannot be accessible as FrozenModel().estimator, but only under some munged name.
  • FrozenModel has def fit(self, *args, **kwargs): return self
  • FrozenModel has def fit_transform(self, *args, **kwargs): return self.fit(*args, **kwargs).transform(args[0]) (and similarly for fit_predict?)
  • isinstance(freeze(obj), type(obj)) == True and isinstance(freeze(obj), FrozenModel) == True
    • since this is determined from type(freeze(obj)) (excluding __instancecheck__, which seems irrelevant), this appears to be the hardest criterion to fulfill
    • this seems to entail a mixin, a class created in a closure, setting __class__ (!), and overloading __reduce__, help! I think I’ve gone down the wrong path!! (see the sketch after this list)
  • must behave nicely with pickle and copy.[deep]copy
  • freeze(some_list) will freeze every element of the list
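
To make the list above concrete, here is a minimal sketch of the dynamic-subclass idea. The names FrozenModel and freeze are the ones proposed in this issue, but everything else (the munged attribute name, the frozen_aware_clone helper) is an illustrative assumption rather than a settled design; set/del delegation and the __reduce__ work needed to pickle the closure-created class are deliberately omitted.

from sklearn.base import clone as _sk_clone

class FrozenModel:
    def __init__(self, estimator):
        # Store under a munged name so that attribute delegation below
        # cannot shadow the wrapped estimator's own attributes.
        object.__setattr__(self, '_frozen_estimator', estimator)

    def __getattr__(self, name):
        # Invoked only when normal lookup fails, so fitted attributes
        # such as coef_ fall through to the wrapped estimator.
        return getattr(object.__getattribute__(self, '_frozen_estimator'), name)

    def fit(self, *args, **kwargs):
        return self  # frozen: fitting is a no-op

    def fit_transform(self, *args, **kwargs):
        return self.fit(*args, **kwargs).transform(args[0])

def freeze(obj):
    if isinstance(obj, list):
        return [freeze(o) for o in obj]
    # Subclass both FrozenModel and the estimator's own class so that
    # isinstance(frozen, FrozenModel) and isinstance(frozen, type(obj))
    # both hold. Pickling this dynamically created class is the hard part.
    cls = type('Frozen' + type(obj).__name__, (FrozenModel, type(obj)), {})
    return cls(obj)

def frozen_aware_clone(obj, safe=True):
    # What clone itself would need to do: return frozen estimators unchanged.
    if isinstance(obj, FrozenModel):
        return obj
    return _sk_clone(obj, safe=safe)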

Most upvoted comments

I also have a potential use case for this. I work with forecasts from weather models. For training models I have access to historic weather data (wind speed, say) to use as features. However, when making a real-time forecast that information is not available; I only have forecasts of those features. I want to be able to fit an estimator on the historical features, freeze it, and then use it within a pipeline to make predictions without it being refit.

Regarding a static transformer (option 4 referenced above), maybe something like this?

import joblib
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class StaticTransformer(TransformerMixin, BaseEstimator):
    """Predict using a pre-fitted model, acting as a transformer.

    No refitting is done.
    """

    def __init__(self, base_model):
        # base_model is the path to a joblib-dumped, already-fitted model.
        # Only the path is a constructor parameter, so clone() rebuilds
        # the transformer by reloading the same file.
        self.base_model = base_model
        self.__base_model_object = joblib.load(self.base_model)

    def fit(self, X, y=None):
        # Deliberately a no-op: the wrapped model is never refit.
        return self

    def transform(self, X):
        # Use the frozen model's predictions as a single feature column.
        base_preds = self.__base_model_object.predict(X)
        return np.expand_dims(base_preds, axis=1)

The base model that is loaded (the frozen model) does not change, and it seems to survive a clone: clone only re-calls __init__ with the stored base_model path, which reloads the same fitted model from disk.
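
As a usage sketch (assuming the StaticTransformer above; the model.joblib path and the toy data are made up for illustration):

import joblib
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.RandomState(0)
X_hist, y_hist = rng.rand(20, 3), rng.rand(20)

# Persist a model fitted on the historical features.
joblib.dump(LinearRegression().fit(X_hist, y_hist), 'model.joblib')

pipe = make_pipeline(StaticTransformer('model.joblib'), LinearRegression())
clone(pipe).fit(X_hist, y_hist)  # refits only the downstream regressor;
                                 # the frozen base model is only reloaded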

I mean that being an instance of the frozen type is not necessarily important as long as it has all attributes of the frozen type.

This is awesome. I often train a vectorizer on a bigger corpus than the labeled training set, so setting trainable=False for the pipeline's vectorization step would be very helpful. Thank you!