scikit-learn: API Freezing estimators
TODO: motivate freezing: pipeline components, calibration, transfer/semisupervised learning
This should probably be a SLEP, but I just want it saved somewhere.
Features required for estimator freezing:
- `clone` must have `if isinstance(obj, FrozenModel): return obj` (or do so via class polymorphism / singledispatch)
- `FrozenModel` delegates all attribute access (get, set, del) to its wrapped estimator (except where specified)
  - hence its estimator cannot be accessible at `FrozenModel().estimator` but at some more munged name.
- `FrozenModel` has `def fit(self, *args, **kwargs): return self`
- `FrozenModel` has `def fit_transform(self, *args, **kwargs): return self.fit(*args, **kwargs).transform(args[0])` (and similar for `fit_predict`?)
- `isinstance(freeze(obj), type(obj)) == True` and `isinstance(freeze(obj), FrozenModel) == True`
  - since this is determined from `type(freeze(obj))` (excluding `__instancecheck__`, which seems irrelevant), this appears to be the hardest criterion to fulfill; it seems to entail use of a mixin, class created in closure, setting `__class__` (!), overloading of `__reduce__`, help! I think I've gone down the wrong path!!
- must behave nicely with `pickle` and `copy.[deep]copy`
- `freeze(some_list)` will freeze every element of the list
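For illustration, here is a minimal sketch of the delegation-based wrapper the list above describes. It is not an agreed design: `MeanCenterer` is a made-up toy estimator, `_frozen_wrapped` is an arbitrary choice of munged name, the `clone` special case is not shown, and this `freeze` deliberately ignores the `isinstance(freeze(obj), type(obj))` criterion. Refusing dunder names in `__getattr__` is my assumption about what keeps `copy.deepcopy` (and, I believe, `pickle`) on their default code paths.

```python
import copy


class MeanCenterer:
    """Toy transformer standing in for a real estimator (hypothetical)."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


class FrozenModel:
    """Wrapper whose fit is a no-op; other attributes go to the wrapped estimator."""

    _SLOT = "_frozen_wrapped"  # munged name, so `.estimator` is not shadowed

    def __init__(self, estimator):
        # Bypass our own __setattr__, which would write to the wrapped object.
        object.__setattr__(self, self._SLOT, estimator)

    def fit(self, *args, **kwargs):
        return self  # never refit

    def fit_transform(self, *args, **kwargs):
        return self.fit(*args, **kwargs).transform(args[0])

    def __getattr__(self, name):
        # Called only when normal lookup fails. Dunder names are refused so
        # that copy falls back to its default protocol instead of recursing.
        if name.startswith("__") and name.endswith("__"):
            raise AttributeError(name)
        return getattr(object.__getattribute__(self, self._SLOT), name)

    def __setattr__(self, name, value):
        setattr(object.__getattribute__(self, self._SLOT), name, value)

    def __delattr__(self, name):
        delattr(object.__getattribute__(self, self._SLOT), name)


def freeze(obj):
    """Freeze an estimator, or every element of a list of estimators."""
    if isinstance(obj, list):
        return [freeze(item) for item in obj]
    if isinstance(obj, FrozenModel):
        return obj  # already frozen
    return FrozenModel(obj)


frozen = freeze(MeanCenterer().fit([1.0, 2.0, 3.0]))
frozen.fit([100.0, 200.0])                   # no-op: mean_ is unchanged
centered = frozen.fit_transform([4.0, 6.0])  # uses the original mean_
duplicate = copy.deepcopy(frozen)            # still a FrozenModel
```

Because the wrapper is a separate class, `isinstance(frozen, MeanCenterer)` is false here, which is exactly the criterion flagged above as the hardest one.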
About this issue
- Original URL
- State: open
- Created 7 years ago
- Reactions: 5
- Comments: 31 (23 by maintainers)
I also have a potential use case for this. I work with forecasts from weather models. For training models, I have access to historical weather data (wind speed, say) to use as features. However, when making real-time forecasts, that information is not available; I only have forecasts of those features. I want to be able to fit an estimator on the historical features, freeze it, and then use it within a pipeline to make predictions without it being refit.
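A rough sketch of that workflow, in plain Python so it runs without scikit-learn. Everything here is hypothetical: `MeanScaler`, `Frozen` and `fit_steps` are illustrative names, `fit_steps` only mimics the way a pipeline refits each of its steps, and the wind speeds are made-up numbers.

```python
class MeanScaler:
    """Toy scaler (hypothetical stand-in for a real transformer)."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


class Frozen:
    """Minimal freeze wrapper: fit does nothing, transform is forwarded."""

    def __init__(self, fitted):
        self.fitted = fitted

    def fit(self, X, y=None):
        return self  # keep whatever the wrapped object learned earlier

    def transform(self, X):
        return self.fitted.transform(X)


def fit_steps(steps, X):
    """Mimic a pipeline's fit: fit each step, then pass transformed data on."""
    for step in steps:
        X = step.fit(X).transform(X)
    return X


historical_wind = [4.0, 6.0, 8.0]  # made-up observed wind speeds
forecast_wind = [10.0, 14.0]       # made-up forecast wind speeds

scaler = MeanScaler().fit(historical_wind)

# An unfrozen step is refit on the forecast data and learns a new mean.
refit = fit_steps([MeanScaler()], forecast_wind)

# The frozen step keeps the mean learned from the historical data.
kept = fit_steps([Frozen(scaler)], forecast_wind)
```

With the frozen step, the forecast features are centered using the historical mean rather than one recomputed from the forecasts.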
Regarding a static transformer (option 4 referenced above), maybe something like this?
The base model that is loaded (frozen model) does not change, and it seems to survive a `clone`.
I mean that being an instance of the frozen type is not necessarily important as long as it has all attributes of the frozen type.
It is awesome. I often train a vectorizer on a bigger corpus than the labeled training set, so setting `trainable=False` for the pipeline vectorization step will be very helpful. Thank you!