scikit-learn: RuntimeError: "Cannot clone object ..." when using clone with an implementation of BaseEstimator that copies objects in get_params method
Description
A RuntimeError is thrown when using sklearn.base.clone with estimators whose implementation of get_params returns copies instead of references.
The sanity check at the end of the clone function fails when the estimator's implementation copies parameters during initialisation or in the get_params method.
Either the documentation of __init__ and get_params in BaseEstimator should indicate that parameters should never be copied, or the sanity check in clone should be more flexible.
If estimator implementations should not copy parameters, an issue should be created in the Keras project regarding the get_params method of the classes used for integration with scikit-learn.
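The failure mode described above can be sketched without Keras or scikit-learn installed. This is a simplified stand-in for the check, written only from the traceback and description in this issue; clone_sketch, ReferenceParams and CopyingParams are hypothetical names, not part of either library.

```python
import copy


def clone_sketch(estimator):
    """Simplified stand-in for sklearn.base.clone (not the real implementation)."""
    params = estimator.get_params()
    # Copy each parameter value before handing it to the constructor.
    params = {name: copy.deepcopy(value) for name, value in params.items()}
    new_object = type(estimator)(**params)
    params_set = new_object.get_params()
    # Sanity check: get_params must return the exact objects the constructor got.
    for name, passed in params.items():
        if passed is not params_set[name]:
            raise RuntimeError(
                "Cannot clone object %r, as the constructor either does not "
                "set or modifies parameter %s" % (estimator, name)
            )
    return new_object


class ReferenceParams:
    """Hypothetical estimator whose get_params returns the stored objects."""

    def __init__(self, n_classes=2, class_weight=None):
        self.n_classes = n_classes
        self.class_weight = class_weight

    def get_params(self):
        return {"n_classes": self.n_classes, "class_weight": self.class_weight}


class CopyingParams(ReferenceParams):
    """Same estimator, but get_params returns deep copies."""

    def get_params(self):
        return copy.deepcopy(super().get_params())


clone_sketch(ReferenceParams(class_weight={0: 1, 1: 3}))  # passes the check
try:
    clone_sketch(CopyingParams(class_weight={0: 1, 1: 3}))
except RuntimeError as exc:
    print(exc)  # the message names class_weight
```

Note that n_classes=2 alone does not trigger the error in this sketch, because copy.deepcopy returns immutable atoms such as small integers and None unchanged. Only the mutable dict becomes a distinct object, which matches the parameter named in the traceback below.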
Steps/Code to Reproduce
from keras.wrappers.scikit_learn import KerasClassifier
from keras.models import Sequential
from keras.layers import Dense
from sklearn.base import clone


def create_keras_classifier_model(n_classes):
    """Keras multinomial logistic regression creation model

    Args:
        n_classes(int): Number of classes to be classified

    Returns:
        Compiled keras model
    """
    # create model
    model = Sequential()
    model.add(Dense(n_classes, activation="softmax"))
    # Compile model
    model.compile(
        loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"]
    )
    return model


estimator = KerasClassifier(build_fn=create_keras_classifier_model, n_classes=2, class_weight={0: 1, 1: 3})
clone(estimator)
Expected Results
No error is thrown.
Actual Results
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-1-520f4ee6e745> in <module>
26 estimator = KerasClassifier(build_fn=create_keras_classifier_model, n_classes=2, class_weight={0: 1, 1:3})
27
---> 28 clone(estimator)
/usr/local/anaconda/envs/ivan/lib/python3.6/site-packages/sklearn/base.py in clone(estimator, safe)
73 raise RuntimeError('Cannot clone object %s, as the constructor '
74 'either does not set or modifies parameter %s' %
---> 75 (estimator, name))
76 return new_object
77
RuntimeError: Cannot clone object <keras.wrappers.scikit_learn.KerasClassifier object at 0x7f7504148f28>, as the constructor either does not set or modifies parameter class_weight
Versions
System:
    python: 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31) [GCC 7.3.0]
    executable: /usr/local/anaconda/envs/ivan/bin/python
    machine: Linux-3.10.0-957.21.3.el7.x86_64-x86_64-with-redhat-7.6-Maipo
BLAS:
    macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
    lib_dirs: /usr/local/anaconda/envs/ivan/lib
    cblas_libs: blas, cblas, lapack, pthread, blas, cblas, lapack, blas, cblas, lapack
Python deps:
    pip: 19.2.3
    setuptools: 41.4.0
    sklearn: 0.20.3
    numpy: 1.17.3
    scipy: 1.3.1
    Cython: 0.29.13
    pandas: 0.24.2
About this issue
- State: closed
- Created 5 years ago
- Comments: 24 (9 by maintainers)
This is a sklearn bug. You should downgrade sklearn:
conda install scikit-learn==0.21.2
@cmarmo I’ve confirmed it’s still an issue in 1.0.2, opened #22857 with an example.
This is still an issue (not with Keras specifically). Related: #15371
For context, I am currently working in a codebase where we rely on the get_params() interface to get details about custom estimators. Many of these estimators take in parameter objects, and we make copies of the parameter objects because we would like to remove the unneeded parameters before returning the objects in get_params() to make it clear what is actually being used. In addition, in some cases we need to mutate these parameters externally, so making a copy avoids problems if multiple estimators reference the same parameter object.
I am attempting to work around this via various methods (which is not always an easy process due to how our codebase is structured), but I agree with the OP in this regard:
If I clone an estimator and then mutate one of the parameters for that clone, my expectation would be that it will not affect the original, which would be the case with the strict check.
What is the rationale behind having the equality check at all?
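One possible workaround for the use case in this comment, sketched under assumptions: keep get_params returning the stored object so an identity check can pass, and expose the trimmed, safe-to-mutate view through a separate method. The class, the _used_keys attribute and the describe_params method are all hypothetical names for illustration, not scikit-learn API.

```python
import copy


class TrimmedParamsEstimator:
    """Hypothetical estimator that takes a parameter object (here, a dict)."""

    # Keys this estimator actually uses (an assumption for illustration).
    _used_keys = ("alpha",)

    def __init__(self, config=None):
        # Store the argument untouched so an identity check on it can pass.
        self.config = config

    def get_params(self, deep=True):
        # Return the stored object itself, not a copy.
        return {"config": self.config}

    def describe_params(self):
        # Separate reporting method: returns a trimmed deep copy that callers
        # may mutate without affecting the estimator.
        config = self.config or {}
        trimmed = {k: config[k] for k in self._used_keys if k in config}
        return copy.deepcopy(trimmed)


est = TrimmedParamsEstimator({"alpha": 0.1, "unused": [1, 2]})
report = est.describe_params()
report["alpha"] = 99  # does not touch est.config
```

This does not address estimators that share one parameter object; callers who need isolation would still have to copy before constructing each estimator.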
We used to check for equality, but made it more strict in part to simplify the code. Should we consider reverting that for the sake of permissiveness, @amueller, or should we ask/help Keras get with the program?
I didn’t make the decision to return a deep copy. I was just using the classes that the Keras package provides for integration with scikit-learn, and I was getting the error I describe in the issue. I was unsure whether this error was due to the sanity check of the clone function being too strict or to an error in the implementation of the Keras classes. I understand that the deep copy is meant to avoid errors caused by later modification of the set of params returned by the method.
And I agree that the sentence you mention might give a hint about get_params not deep copying parameters, but the relation is a bit vague. A person wanting to implement an estimator is quite likely to copy parameters in the initialisation and in get_params to protect the parameters of the estimator. This was probably what the developer of that specific class in the Keras package wanted. For this reason I think the documentation should discourage doing so.
Is there a reason for just checking parameters by reference and not by value?
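To make the "by reference" versus "by value" distinction concrete, here is a small sketch in plain Python. The ElementwiseEq class is a toy stand-in for array-like parameters whose == does not return a single bool. It illustrates one plausible reason a value check is harder to get right in general, which is an assumption and not something stated in this thread.

```python
import copy

original = {0: 1, 1: 3}
copied = copy.deepcopy(original)

# Identity ("by reference"): True only if both names refer to the same object.
same_object = copied is original

# Equality ("by value"): True if the contents compare equal.
same_value = copied == original


class ElementwiseEq:
    """Toy array-like whose == returns a list of per-element results."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        return [a == b for a, b in zip(self.values, other.values)]


# The result is a non-empty list, which is truthy even though the elements
# differ, so a naive `if a == b:` check would be misleading here.
elementwise = ElementwiseEq([1, 2]) == ElementwiseEq([1, 5])

print(same_object, same_value, elementwise)
```

A deep-copied dict therefore passes a value check but fails an identity check, which is the situation reported for class_weight.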
If there is a reason for maintaining the current behaviour, I still think it would be a good idea to explicitly say that init params shouldn’t be copied or deep copied in the get_params method. This information should go in the Cloning section.
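A sketch of what such guidance could look like in practice, assuming the goal is to protect parameters from later mutation without copying them in __init__ or get_params: store the argument as passed, and take a private snapshot at fit time instead. WeightedEstimator is a hypothetical class, and the trailing-underscore attribute follows the usual scikit-learn naming for fitted state.

```python
import copy


class WeightedEstimator:
    """Hypothetical estimator following the store-by-reference convention."""

    def __init__(self, class_weight=None):
        # No copying or validation here: keep exactly what was passed in.
        self.class_weight = class_weight

    def get_params(self, deep=True):
        # Return the stored object itself.
        return {"class_weight": self.class_weight}

    def fit(self, X, y):
        # Snapshot the parameter at fit time; later changes to the caller's
        # dict no longer affect the fitted state.
        self.class_weight_ = copy.deepcopy(self.class_weight)
        return self


weights = {0: 1, 1: 3}
est = WeightedEstimator(class_weight=weights).fit([[0], [1]], [0, 1])
weights[1] = 10  # mutating the caller's dict after fitting
print(est.class_weight_)  # the fitted snapshot still holds the old weights
```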