scikit-learn: Meta-issue: accelerate the slowest running tests

When running the current test suite on Azure, pytest reports the following 20 slowest tests:

  • 15.40s call ensemble/tests/test_voting.py::test_gridsearch #21422
  • 11.31s call tests/test_common.py::test_estimators[SequentialFeatureSelector(estimator=LogisticRegression(C=1))-check_estimator_sparse_data] #21515
  • 8.96s call svm/tests/test_svm.py::test_svc_ovr_tie_breaking[NuSVC] #21443
  • 7.40s call utils/tests/test_estimator_checks.py::test_check_estimator_clones #21498
  • 6.49s call ensemble/tests/test_bagging.py::test_classification #21476
  • 5.52s call ensemble/tests/test_common.py::test_ensemble_heterogeneous_estimators_behavior[stacking-classifier] #21562
  • 5.19s call ensemble/tests/test_common.py::test_ensemble_heterogeneous_estimators_behavior[stacking-regressor] #21562
  • 4.41s call linear_model/tests/test_quantile.py::test_asymmetric_error[0.2] #21546
  • 4.12s call ensemble/tests/test_gradient_boosting.py::test_gradient_boosting_early_stopping #21903
  • 4.12s call linear_model/tests/test_quantile.py::test_asymmetric_error[0.8] #21546
  • 3.91s call ensemble/tests/test_bagging.py::test_oob_score_removed_on_warm_start #21892
  • 3.86s call tests/test_common.py::test_estimators[RFECV(estimator=LogisticRegression(C=1))-check_estimator_sparse_data] #21515
  • 3.80s call linear_model/tests/test_quantile.py::test_asymmetric_error[0.5] #21546
  • 3.80s call experimental/tests/test_enable_successive_halving.py::test_imports_strategies cannot easily be optimized
  • 3.36s call ensemble/tests/test_gradient_boosting.py::test_regression_dataset[0.5-huber] #21984
  • 3.27s call feature_selection/tests/test_sequential.py::test_nan_support #21823
  • 3.06s call model_selection/tests/test_split.py::test_nested_cv #21551
  • 3.02s call feature_selection/tests/test_sequential.py::test_unsupervised_model_fit[4] https://github.com/scikit-learn/scikit-learn/pull/22045
  • 3.01s call decomposition/tests/test_kernel_pca.py::test_kernel_pca_solvers_equivalence[20] #21746
  • 2.97s call ensemble/tests/test_bagging.py::test_parallel_classification #21896

On another machine I found the following slow tests:

  • 30.13s call sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_models_cv_fit_for_all_backends[MultiTaskElasticNetCV-threading] #21918
  • 21.48s call sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_models_cv_fit_for_all_backends[MultiTaskLassoCV-threading] #21918
  • 9.89s call sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_models_cv_fit_for_all_backends[MultiTaskElasticNetCV-loky] #21918
  • 8.05s call sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_models_cv_fit_for_all_backends[MultiTaskLassoCV-loky] #21918

This list can probably be updated once the above have been dealt with.

Ideally, each of these tests should run in less than 1 s, and preferably in less than 10 ms when possible: scikit-learn has more than 20,000 tests, so we strive to keep each of them as fast as possible while still exercising the interesting behaviors and catching as many potential regressions as we can. We need to exercise judgment to strike a good balance between fast test execution (which benefits the contribution workflow and ease of maintenance) and sufficiently exhaustive coverage of nominal code paths and interesting edge cases.
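For example, here is a hedged, purely illustrative sketch (not an actual scikit-learn test) of how parametrizing over a few small cases can keep edge-case coverage while each individual run stays in the millisecond range:

import numpy as np
import pytest
from sklearn.linear_model import Ridge


@pytest.mark.parametrize("alpha", [1e-3, 1.0, 1e3])
def test_ridge_extreme_regularization(alpha):
    # Each parametrized case uses a tiny dataset and fits in milliseconds,
    # yet together the cases cover nominal and extreme regularization strengths.
    rng = np.random.RandomState(0)
    X, y = rng.randn(30, 5), rng.randn(30)
    model = Ridge(alpha=alpha).fit(X, y)
    assert model.coef_.shape == (5,)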

The goal of this issue is to track the progress of individual PRs, one per test (typically changing only one file at a time), that rewrite them with the following in mind:

  • read and understand the purpose of the original test, possibly referring to the scikit-learn documentation when necessary;
  • try to tweak the test (smaller dataset, different hyperparameters, fewer iterations…) to make it run faster while preserving its original purpose (see the sketch after this list);
  • if you think it’s not possible to improve the speed of a given slow test in this list after analysis, please explain why in a comment on this issue;
  • if acceleration is possible, open a PR with the updated test and link to this issue in the PR description by stating "Towards #21407".
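As an illustration, here is a hedged sketch of the kind of change we are after; the test name, dataset sizes and estimator are hypothetical and not taken from the actual suite. The assertion and intent are preserved while the data and the model are shrunk:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def test_forest_beats_chance():
    # Before (hypothetical): n_samples=5000 and n_estimators=200 made this
    # test take several seconds.
    # After: a much smaller problem exercises the same code path and keeps
    # the same assertion, but runs in a few tens of milliseconds.
    X, y = make_classification(n_samples=100, n_features=10, random_state=0)
    clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
    assert clf.score(X, y) > 0.8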

Before pushing commits in a PR, please run the tests locally with the following command line (for instance, for the first test in this list):

pytest -v --durations=20 -k test_gridsearch ensemble/tests/test_voting.py

For parametrized tests, whose names contain [] and (), pytest's -k option will refuse to select them as-is. Instead, you can combine several expressions to select a specific parametrized test. For instance, for the second test:

pytest -v --durations=20 sklearn/tests/test_common.py -k "test_estimators and SequentialFeatureSelector and LogisticRegression and check_estimator_sparse_data"

If this is the first time you contribute to scikit-learn, please have a look at the contributor’s guide first (in particular to learn how to build the main dev branch of scikit-learn from source and how to run the tests locally).

Note: in your PR, please report the test duration measured on your local machine before and after your changes.

Note 2: aim for low-hanging fruit: some tests cannot be significantly accelerated without changing their core intention, while others can be accelerated by a factor of 100x while preserving it. If you cannot prune at least 50% of the original runtime, do not spend too much time on that test and try another one instead.

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 27 (27 by maintainers)

Most upvoted comments

Thanks @norbusan. Then I think you should check the box above to avoid confusion for others. What do you think?

I’m going to work on another test acceleration. 3.36s call ensemble/tests/test_gradient_boosting.py::test_regression_dataset[0.5-huber]

Can I take this item? 3.80s call experimental/tests/test_enable_successive_halving.py::test_imports_strategies

@ogrisel Are there concerns with using n_jobs=-1 when possible, or has the joblib.Parallel backend context already been set when running CI? Or is the parallelism a concern for debugging? Thanks in advance.

Yes, using n_jobs=-1 on an overloaded machine with many CPU cores will cause the creation of many Python subprocesses that will further overload the machine.

The recommendation is to only use n_jobs=2 max in the scikit-learn test suite.
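For illustration, here is a hedged sketch (the estimator, dataset and threshold below are arbitrary, not a specific scikit-learn test) of what this recommendation looks like in practice:

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier


def test_bagging_parallel_fit():
    X, y = make_classification(n_samples=100, n_features=10, random_state=0)
    # n_jobs=2 is enough to exercise the parallel code path without spawning
    # one worker per CPU core on a busy CI machine.
    clf = BaggingClassifier(n_estimators=4, n_jobs=2, random_state=0).fit(X, y)
    assert clf.score(X, y) > 0.5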