scikit-learn: RuntimeError: "Cannot clone object ..." when using clone with an implementation of BaseEstimator that copies objects in get_params method

Description

A RuntimeError is thrown when using sklearn.base.clone with estimators whose implementation of get_params returns copies instead of references.

The sanity check at the end of the clone function fails when the estimator's implementation copies parameters during its initialisation or in the get_params method.
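To make the failure mode concrete, here is a minimal standard-library sketch of the kind of identity check described above. It is a simplified stand-in, not scikit-learn's actual code: simplified_clone, ByReference and ByCopy are hypothetical names introduced only for illustration, and the error text is shortened.

```python
import copy


def simplified_clone(estimator):
    """Simplified stand-in for sklearn.base.clone (not the library's code)."""
    # Copy each constructor parameter, then rebuild the estimator from the copies.
    params = {k: copy.deepcopy(v) for k, v in estimator.get_params().items()}
    new_object = type(estimator)(**params)
    # Sanity check: get_params must hand back the very objects that were passed in.
    for name, passed in params.items():
        if new_object.get_params()[name] is not passed:
            raise RuntimeError(
                "constructor either does not set or modifies parameter %s" % name
            )
    return new_object


class ByReference:
    """Hypothetical estimator that stores and returns the parameter object itself."""

    def __init__(self, class_weight=None):
        self.class_weight = class_weight

    def get_params(self):
        return {"class_weight": self.class_weight}


class ByCopy(ByReference):
    """Hypothetical estimator whose get_params returns a deep copy."""

    def get_params(self):
        return copy.deepcopy(super().get_params())


weights = {0: 1, 1: 3}
reference_clone = simplified_clone(ByReference(class_weight=weights))

try:
    simplified_clone(ByCopy(class_weight=weights))
    copy_result = "cloned"
except RuntimeError as exc:
    copy_result = str(exc)
```

The copied dict is equal in value to the one passed to the constructor, but it is a different object, so an identity comparison rejects it. Immutable singletons such as None would slip through either way, which is why the problem only shows up with parameters like the class_weight dict.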

Either the documentation of __init__ and get_params in BaseEstimator should indicate that parameters should never be copied or the sanity check in clone should be more flexible.

If estimator implementations should not copy parameters, an issue should be created in the Keras project regarding the get_params method of the classes used for integration with scikit-learn.

Steps/Code to Reproduce

from keras.wrappers.scikit_learn import KerasClassifier
from keras.models import Sequential
from keras.layers import Dense
 
from sklearn.base import clone
 
def create_keras_classifier_model(n_classes):
    """Keras multinomial logistic regression creation model
 
    Args:
        n_classes(int): Number of classes to be classified
 
    Returns:
        Compiled keras model
 
    """
    # create model
    model = Sequential()
    model.add(Dense(n_classes, activation="softmax"))
    # Compile model
    model.compile(
        loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"]
    )
    return model
 
estimator = KerasClassifier(build_fn=create_keras_classifier_model, n_classes=2, class_weight={0: 1, 1:3})
 
clone(estimator)

Expected Results

No error is thrown.

Actual Results

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-1-520f4ee6e745> in <module>
     26 estimator = KerasClassifier(build_fn=create_keras_classifier_model, n_classes=2, class_weight={0: 1, 1:3})
     27
---> 28 clone(estimator)
 
/usr/local/anaconda/envs/ivan/lib/python3.6/site-packages/sklearn/base.py in clone(estimator, safe)
     73             raise RuntimeError('Cannot clone object %s, as the constructor '
     74                                'either does not set or modifies parameter %s' %
---> 75                                (estimator, name))
     76     return new_object
     77
 
RuntimeError: Cannot clone object <keras.wrappers.scikit_learn.KerasClassifier object at 0x7f7504148f28>, as the constructor either does not set or modifies parameter class_weight

Versions

System:
    python: 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31) [GCC 7.3.0]
    executable: /usr/local/anaconda/envs/ivan/bin/python
    machine: Linux-3.10.0-957.21.3.el7.x86_64-x86_64-with-redhat-7.6-Maipo

BLAS:
    macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
    lib_dirs: /usr/local/anaconda/envs/ivan/lib
    cblas_libs: blas, cblas, lapack, pthread, blas, cblas, lapack, blas, cblas, lapack

Python deps:
    pip: 19.2.3
    setuptools: 41.4.0
    sklearn: 0.20.3
    numpy: 1.17.3
    scipy: 1.3.1
    Cython: 0.29.13
    pandas: 0.24.2

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 24 (9 by maintainers)

Most upvoted comments

This is a sklearn bug. You should downgrade sklearn:

conda install scikit-learn==0.21.2

@cmarmo I’ve confirmed it’s still an issue in 1.0.2, opened #22857 with an example.

This is still an issue (not with Keras specifically). Related: #15371
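A Keras-free reproduction might look like the sketch below. CopyingEstimator is a hypothetical class written for illustration, and the sketch assumes a scikit-learn version whose clone compares parameters by identity, as in the traceback above.

```python
import copy

from sklearn.base import BaseEstimator, clone


class CopyingEstimator(BaseEstimator):
    """Hypothetical estimator whose get_params returns deep copies."""

    def __init__(self, class_weight=None):
        self.class_weight = class_weight

    def get_params(self, deep=True):
        # Copying here means callers never see the object stored on the instance.
        return copy.deepcopy(super().get_params(deep=deep))


try:
    clone(CopyingEstimator(class_weight={0: 1, 1: 3}))
    outcome = "cloned"
except RuntimeError as exc:
    outcome = "RuntimeError: " + str(exc)

print(outcome)
```

Only the get_params override differs from a plain BaseEstimator subclass, which suggests the behaviour comes from the copy rather than from anything Keras-specific.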

For context, I am currently working in a codebase where we rely on the get_params() interface to get details about custom estimators. Many of these estimators take in parameter objects, and we make copies of the parameter objects because we would like to remove the unneeded parameters before returning the objects in get_params() to make it clear what is actually being used. In addition, in some cases we need to mutate these parameters externally and so making a copy avoids problems if multiple estimators reference the same parameter object.

I am attempting to work around this via various methods (which is not always an easy process due to how our codebase is structured), but I agree with the OP in this regard:

"Either the documentation of __init__ and get_params in BaseEstimator should indicate that parameters should never be copied or the sanity check in clone should be more flexible."

If I clone an estimator and then mutate one of the parameters for that clone, my expectation would be that it will not affect the original, which would be the case with the strict check.
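One possible way to get both behaviours is sketched below: keep __init__ and get_params by-reference so clone's check passes, and expose the pruned copy through a separate method. PrunedViewEstimator and effective_options are hypothetical names, not part of scikit-learn. The sketch relies on clone deep-copying non-estimator parameters before constructing the new object.

```python
import copy

from sklearn.base import BaseEstimator, clone


class PrunedViewEstimator(BaseEstimator):
    """Hypothetical estimator: get_params stays by-reference, copies live elsewhere."""

    def __init__(self, options=None):
        # Store the argument untouched so clone's check can find the same object.
        self.options = options

    def effective_options(self):
        """Hypothetical helper: a pruned deep copy for inspection or mutation."""
        used = {k: v for k, v in (self.options or {}).items() if v is not None}
        return copy.deepcopy(used)


original = PrunedViewEstimator(options={"alpha": 0.1, "unused": None})
cloned = clone(original)

# clone deep-copies non-estimator parameters, so the clone can be mutated freely.
cloned.options["alpha"] = 0.5
```

With this layout, mutating the clone's options leaves the original untouched, and callers that want the trimmed view call effective_options instead of get_params.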

What is the rationale behind having the equality check at all?

We used to check for equality, but made it more strict in part to simplify the code. Should we consider reverting that for the sake of permissiveness, @amueller, or should we ask/help Keras get with the program?
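As an illustration of why a value-based check can need more code than an identity check, the standard-library sketch below shows a few cases where == and is disagree. This is not the maintainers' stated rationale, only a set of general Python facts; Opaque is a hypothetical parameter type.

```python
import copy

nan_param = float("nan")
copied_weights = copy.deepcopy({0: 1, 1: 3})


class Opaque:
    """Hypothetical parameter type that does not define __eq__."""


opaque = Opaque()

checks = {
    # NaN is never equal to itself, so an equality check would reject it.
    "nan_equal": nan_param == nan_param,
    "nan_identical": nan_param is nan_param,
    # A copied dict is equal by value but is a different object.
    "copy_equal": copied_weights == {0: 1, 1: 3},
    "copy_identical": copied_weights is copy.deepcopy(copied_weights),
    # Without __eq__, equality silently falls back to identity.
    "opaque_copy_equal": copy.deepcopy(opaque) == opaque,
}
```

An equality-based check would accept the copied dict, but it would also need special handling for values such as NaN and for objects that do not implement value comparison.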

The decision to return a deep copy was not mine. I was just using the classes that the Keras package provides for integration with scikit-learn, and I was getting the error I describe in the issue. I was unsure whether this error was due to the sanity check of the clone function being too strict or to an error in the implementation of the classes in the Keras package. As I understand it, the deep copy is meant to avoid errors caused by later modification of the params returned by the method.

And I agree that the sentence you mention might hint that get_params should not deep copy parameters, but the connection is a bit vague. A person wanting to implement an estimator is quite likely to copy parameters in the initialisation and in get_params to protect the parameters of the estimator. This was probably what the developer of that specific class in the Keras package wanted. For this reason I think the documentation should discourage doing so.

Is there a reason for just checking parameters by reference and not by value?

If there is a reason for maintaining the current behaviour, I still think it would be a good idea to state explicitly that init params shouldn’t be copied or deep copied in the get_params method. This information should go in the Cloning section.