scikit-learn: All tests in `check_estimator` are disabled if `X_types` does not include `"2darray"`

Description

All tests in `check_estimator` are disabled if `X_types` does not include `"2darray"`. Some tests, such as `check_get_params_invariance` or `check_set_params`, do not require a fit step and should be run no matter what.

I am currently building a library that uses its own data types but tries to be scikit-learn compatible. I therefore try to write estimators with the same API as scikit-learn, but with different input and output types. It would be useful to check API conformance by running `check_estimator` on our estimator objects.
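To illustrate why such checks need no data at all, here is a minimal sketch of fit-free API checks written by hand. It does not call scikit-learn's own check functions (their signatures differ between releases); `FunctionalMean` and the `sketch_check_*` helpers are hypothetical names invented for this example, and the checks only approximate the spirit of the real ones.

```python
import inspect


class FunctionalMean:
    """Hypothetical estimator that would be fitted on a custom, non-array data type."""

    def __init__(self, n_basis=5, smoothing=0.0):
        # Like scikit-learn estimators, __init__ only stores its arguments.
        self.n_basis = n_basis
        self.smoothing = smoothing

    def get_params(self, deep=True):
        # Read parameter names from the __init__ signature.
        names = inspect.signature(type(self).__init__).parameters
        return {name: getattr(self, name) for name in names if name != "self"}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        return self


def sketch_check_get_params_invariance(estimator):
    # Shallow parameters must be a subset of deep parameters.
    shallow = estimator.get_params(deep=False)
    deep = estimator.get_params(deep=True)
    assert all(item in deep.items() for item in shallow.items())


def sketch_check_set_params(estimator):
    # Setting the current parameters back must leave them unchanged,
    # and set_params must return the estimator itself.
    before = estimator.get_params()
    assert estimator.set_params(**before) is estimator
    assert estimator.get_params() == before


def sketch_check_no_attributes_set_in_init(estimator):
    # Right after construction, the only attributes are the __init__ arguments.
    assert set(vars(estimator)) == set(estimator.get_params())


# None of these checks ever calls fit, so no input data is needed.
est = FunctionalMean(n_basis=7)
for check in (
    sketch_check_get_params_invariance,
    sketch_check_set_params,
    sketch_check_no_attributes_set_in_init,
):
    check(est)
```

Because nothing here touches `X`, the same checks would apply whatever data type the estimator consumes.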

About this issue

State: open
Created 5 years ago
Comments: 19 (19 by maintainers)

Most upvoted comments

It would be nice if you could go through all the tests we have, see what data we create, and compare that to what you could generate.

adrinjalali on Apr 1, 2020

Well, that would definitely be a pretty convenient way for our use-case. Should I list the different factories that would be required?

rtavenar on Apr 1, 2020

#16756 is one of the places we’re discussing it. Still a bit in brainstorming stage, but will be working on it soon.

adrinjalali on Mar 31, 2020

Hi there,

Not sure if I should post that here or in #6715 …

As mentioned by @rth before, we have a similar context in tslearn: our input data (X) are 3d arrays (time series datasets of shape (n_time_series, n_timestamps, n_features)). We want to be sklearn-compatible for the same reasons as @vnmabus.

At the moment, our solution consists of monkey-patching sklearn checks in our library (e.g. check_clustering). One typical issue we face is that some of the tests performed by sklearn use hard-coded datasets. For example, check_clustering currently checks whether the estimator can cluster data generated by make_blobs, and requires a minimum level of accuracy (adjusted Rand score greater than 0.4). It would be very beneficial for packages that depend on sklearn to have a way to define the datasets used for checking.

rtavenar on Mar 29, 2020
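The idea of letting a downstream package supply its own dataset could look roughly like the sketch below. This is not sklearn's API: `sketch_check_clustering`, `make_time_series_blobs` and `MeanThresholdClusterer` are hypothetical names, the data are plain nested lists rather than arrays, and a plain (unadjusted) Rand index is swapped in for the adjusted Rand score to keep the example dependency-free.

```python
import random
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Plain (unadjusted) Rand index: fraction of sample pairs on which two labelings agree."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def make_time_series_blobs(n_series=20, n_timestamps=8, seed=0):
    """Hypothetical factory: two well-separated groups of univariate time series.

    Returns X as nested lists of shape (n_series, n_timestamps, 1) and labels y.
    """
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_series):
        label = i % 2
        level = 10.0 * label  # group 0 sits around 0, group 1 around 10
        X.append([[level + rng.gauss(0.0, 1.0)] for _ in range(n_timestamps)])
        y.append(label)
    return X, y


class MeanThresholdClusterer:
    """Toy two-cluster estimator: splits series by their mean level."""

    def fit(self, X, y=None):
        means = [sum(step[0] for step in series) / len(series) for series in X]
        threshold = (min(means) + max(means)) / 2.0
        self.labels_ = [int(m > threshold) for m in means]
        return self


def sketch_check_clustering(estimator, make_dataset, min_score=0.4):
    """Clustering check whose dataset comes from a caller-supplied factory."""
    X, y = make_dataset()
    estimator.fit(X)
    assert len(estimator.labels_) == len(X)
    assert rand_index(y, estimator.labels_) > min_score


# The check itself no longer knows what shape or type the data have.
sketch_check_clustering(MeanThresholdClusterer(), make_time_series_blobs)
```

With this shape, the check keeps its logic (fit, then compare against a score threshold) while the dataset and its type are decided by the package being tested.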

I think the following common tests should run for any X_types:

check_parameters_default_constructible
check_no_attributes_set_in_init
check_estimators_pickle (might be difficult if we don't know how to fit the estimator)
check_get_params_invariance
check_set_params

rth on Nov 4, 2019
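The selection behaviour suggested by the list above can be sketched as follows. This is only an illustration of the proposal, not how scikit-learn selects checks: `select_checks` and the `can_generate_data` flag are hypothetical, and treating `check_clustering` as an array-only check is an assumption drawn from the discussion of hard-coded datasets.

```python
# Checks from the list above that never need to fit the estimator.
ALWAYS_RUN = [
    "check_parameters_default_constructible",
    "check_no_attributes_set_in_init",
    "check_get_params_invariance",
    "check_set_params",
]

# Assumption: an example of a check that relies on generated 2D array data.
ARRAY_ONLY = ["check_clustering"]


def select_checks(x_types, can_generate_data=False):
    """Return the names of the checks to run for an estimator declaring `x_types`.

    `can_generate_data` is a hypothetical flag meaning the caller can supply
    data of the right type, so that a fit-dependent check such as pickling
    becomes feasible.
    """
    selected = list(ALWAYS_RUN)
    if "2darray" in x_types or can_generate_data:
        # Pickling is only meaningful once we know how to fit the estimator.
        selected.append("check_estimators_pickle")
    if "2darray" in x_types:
        selected.extend(ARRAY_ONLY)
    return selected


# An estimator that only accepts 3D input still gets the fit-free checks.
print(select_checks(["3darray"]))
```

The point of the split is that an unsupported input type narrows the set of checks instead of emptying it.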