scikit-learn: All tests in `check_estimator` are disabled if `X_types` does not include `"2darray"`
Description
All tests in `check_estimator` are disabled if `X_types` does not include `"2darray"`. Some tests, such as `check_get_params_invariance` or `check_set_params`, do not require a `fit` step and should be run no matter what.
I am currently building a library that uses its own data types, but tries to be Scikit-learn compatible. Thus, I try to make estimators with the same API as Scikit-learn, but different input and output types. It would be useful to check API conformance using `check_estimator` with our estimator objects.
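To illustrate why such checks need no input data, here is a minimal sketch in plain Python. `ToyEstimator`, `check_set_params_roundtrip` and `run_fit_free_checks` are hypothetical names made up for this example, and the two checks are simplified stand-ins for the idea, not scikit-learn's actual implementations.

```python
import inspect


class ToyEstimator:
    """Hypothetical estimator following the get_params/set_params convention."""

    def __init__(self, alpha=1.0, inner=None):
        self.alpha = alpha
        self.inner = inner

    def get_params(self, deep=True):
        # Constructor arguments define the parameter names.
        names = [
            p for p in inspect.signature(type(self).__init__).parameters
            if p != "self"
        ]
        params = {name: getattr(self, name) for name in names}
        if deep:
            # Expose parameters of nested estimators as "<name>__<sub>".
            for name in names:
                value = params[name]
                if hasattr(value, "get_params"):
                    for sub, v in value.get_params(deep=True).items():
                        params[f"{name}__{sub}"] = v
        return params

    def set_params(self, **params):
        for key, value in params.items():
            name, _, sub = key.partition("__")
            if sub:
                getattr(self, name).set_params(**{sub: value})
            else:
                setattr(self, name, value)
        return self


def check_get_params_invariance(estimator):
    # Simplified: shallow parameters must be a subset of deep parameters.
    shallow = estimator.get_params(deep=False)
    deep = estimator.get_params(deep=True)
    assert all(item in deep.items() for item in shallow.items())


def check_set_params_roundtrip(estimator):
    # Simplified: setting the current parameters must change nothing
    # and must return the estimator itself.
    before = estimator.get_params(deep=False)
    result = estimator.set_params(**before)
    assert result is estimator
    assert estimator.get_params(deep=False) == before


# Neither check calls fit, so neither depends on the input data type.
FIT_FREE_CHECKS = [check_get_params_invariance, check_set_params_roundtrip]


def run_fit_free_checks(estimator):
    """Run every fit-free check and return the names of those that passed."""
    passed = []
    for check in FIT_FREE_CHECKS:
        check(estimator)
        passed.append(check.__name__)
    return passed
```

Because these checks only touch the constructor and the parameter accessors, they would apply equally to an estimator whose `fit` expects a custom data type.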
About this issue
- State: open
- Created 5 years ago
- Comments: 19 (19 by maintainers)
It would be nice if you could go through all the tests we have, see what we create, and compare that with what you could generate.
Well, that would definitely be a pretty convenient way for our use-case. Should I list the different factories that would be required?
#16756 is one of the places we’re discussing it. Still a bit in brainstorming stage, but will be working on it soon.
Hi there,
Not sure if I should post that here or in #6715 …
As mentioned by @rth before, we have a similar context in `tslearn`: our input data (X) are 3D arrays (time series datasets of shape `(n_time_series, n_timestamps, n_features)`). We are willing to be sklearn-compatible for the same reasons as @vnmabus.
At the moment, our solution consists in monkey-patching sklearn checks in our lib (e.g. `check_clustering`). One typical issue we are facing is that some of the tests performed by sklearn use hard-coded datasets. As an example, at the moment, `check_clustering` checks whether the estimator can actually cluster data that are generated by `make_blobs` and ensures a minimum level of accuracy (adjusted Rand score greater than .4). It could be very beneficial for packages that depend on sklearn to have a way to define the datasets to be used for checking.
I think the following common tests should run for any `X_types`: