scikit-learn: Accelerate slow examples

These examples take quite a long time to run, and they frequently make our documentation CI fail with timeouts. It’d be nice to speed them up a little.

To contributors: if you want to work on an example, first have a look at it. If you’re comfortable working on it and have found a potential way to speed up its execution time while preserving its educational message, please mention which one you’re working on in the comments below.

Please open a dedicated PR for each example you have found a fix for (with a new git branch off of main for each one) to make reviews faster.

Please focus on the longest running examples first (e.g. 30s or more). Examples that run in less than 15s are probably fine.

Please also keep in mind that we want to keep the example code as simple as possible for educational reasons, while keeping the main points expressed in the example’s text valid and well illustrated by the result of the execution (plots or text outputs).

Finally, we expect that some examples cannot really be accelerated while preserving their educational value (the integrity of the message and the simplicity of the code). In such cases, we might decide to keep them as they are, provided they run in less than 60s.

To maintainers: I’m running a script which automatically updates the following list with connected PRs and “done” checkboxes; no need to update them manually.

Examples to Update

  • …/examples/linear_model/plot_poisson_regression_non_normal_loss.py: 60.41 sec #21787
  • …/examples/impute/plot_missing_values.py: 26.37 sec #21792
  • …/examples/miscellaneous/plot_johnson_lindenstrauss_bound.py: 19.42 sec #21795
  • …/examples/linear_model/plot_sgd_early_stopping.py: 91.61 sec #21627
  • …/examples/kernel_approximation/plot_scalable_poly_kernels.py: 42.52 sec #22903
  • …/examples/ensemble/plot_stack_predictors.py: 32.45 sec #21726
  • …/examples/decomposition/plot_image_denoising.py: 29.42 sec #21799
  • …/examples/applications/plot_model_complexity_influence.py: 28.06 sec #21963
  • …/examples/impute/plot_iterative_imputer_variants_comparison.py: 27.26 sec #21748
  • …/examples/inspection/plot_partial_dependence.py: 21.99 sec #21768
  • …/examples/neighbors/plot_nca_classification.py: 21.13 sec #21771
  • …/examples/miscellaneous/plot_kernel_ridge_regression.py: 18.07 sec #21794 #21791
  • …/examples/linear_model/plot_sparse_logistic_regression_20newsgroups.py: 18.05 sec #21773
  • …/examples/neural_networks/plot_mnist_filters.py: 76.16 sec #21647
  • …/examples/ensemble/plot_gradient_boosting_quantile.py: 60.39 sec #21666
  • …/examples/semi_supervised/plot_semi_supervised_newsgroups.py: 55.99 sec #21673
  • …/examples/ensemble/plot_gradient_boosting_early_stopping.py: 51.35 sec #21609
  • …/examples/manifold/plot_lle_digits.py: 44.89 sec #21736
  • …/examples/svm/plot_svm_scale_c.py: 40.61 sec #21625
  • …/examples/cluster/plot_cluster_comparison.py: 39.24 sec #21624
  • …/examples/compose/plot_digits_pipe.py: 37.29 sec #21728
  • …/examples/model_selection/plot_multi_metric_evaluation.py: 32.78 sec #21626
  • …/examples/ensemble/plot_gradient_boosting_regularization.py: 28.18 sec #21611
  • …/examples/applications/plot_face_recognition.py: 24.58 sec #21725
  • …/examples/linear_model/plot_sgd_comparison.py: 24.05 sec #21610
  • …/examples/ensemble/plot_ensemble_oob.py: 20.69 sec #21730
  • …/examples/feature_selection/plot_select_from_model_diabetes.py: 18.98 sec #21738
  • …/examples/ensemble/plot_gradient_boosting_categorical.py: 18.68 sec #21634
  • …/examples/manifold/plot_compare_methods.py: 14.77 sec #21635
  • …/examples/model_selection/plot_successive_halving_iterations.py: 14.16 sec #21612
  • …/examples/model_selection/plot_randomized_search.py: 253.02 sec #21637
  • …/examples/model_selection/plot_permutation_tests_for_classification.py: 39.82 sec #21649
  • …/examples/cluster/plot_digits_linkage.py: 39.15 sec #21678 #21737
  • …/examples/neural_networks/plot_mlp_alpha.py: 34.27 sec #21648
  • …/examples/preprocessing/plot_discretization_classification.py: 34.11 sec #21661
  • …/examples/manifold/plot_t_sne_perplexity.py: 24.81 sec #21636
  • …/examples/model_selection/plot_validation_curve.py: 15.32 sec #21638
  • …/examples/ensemble/plot_adaboost_multiclass.py: 14.90 sec #21651
  • …/examples/decomposition/plot_pca_vs_fa_model_selection.py: 12.14 sec #21671
  • …/examples/cluster/plot_birch_vs_minibatchkmeans.py: 11.75 sec #21703
  • …/examples/model_selection/plot_learning_curve.py: 10.50 sec #21628

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 63 (44 by maintainers)

Most upvoted comments

@hhnnhh or @marenwestermann may be interested in this.

Could you please change your script to remove examples that run in less than 20s or 15s from the list, to avoid incentivizing PRs with a small added-value to review-time ratio?

@ogrisel done!

I’ll try …/examples/feature_selection/plot_select_from_model_diabetes.py. Is it okay to change the example (use a different dataset) in order to achieve a speedup?

In general, what matters most is the quality of the pedagogical message. It always comes first, and runtime comes second (assuming it’s less than a few minutes). So if you are confident that you can craft an enlightening example that teaches the same concepts with a different dataset, why not. But in general I’m not sure it’s easy or worth it.

@norbusan and I are working on …/examples/ensemble/plot_stack_predictors.py

For instance, you could switch from the digits dataset to the iris dataset in the first and slowest example and speed it up almost 100-fold. The question is then whether that still illustrates the benefit of RandomizedSearchCV. Or you could try HistGradientBoostingClassifier instead of SGDClassifier and see if it runs much faster. Then open a PR, and through discussion we’ll figure out the best choice.
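To make the first suggestion concrete, here is a hedged sketch of the dataset swap. The parameter grid below is illustrative only and does not claim to match the actual `plot_randomized_search` example; the point is that replacing `load_digits` with `load_iris` shrinks the data roughly tenfold in samples and sixteenfold in features, so every candidate fit becomes much cheaper:

```python
# Illustrative sketch only: a small RandomizedSearchCV setup where swapping
# load_digits (1797 samples, 64 features) for load_iris (150 samples,
# 4 features) makes every candidate fit far cheaper. The search space is
# hypothetical, not copied from the real example.
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)  # was: load_digits(return_X_y=True)

clf = SGDClassifier(loss="hinge", penalty="elasticnet", fit_intercept=True)

param_dist = {
    "alpha": loguniform(1e-4, 1e0),
    "average": [True, False],
    "l1_ratio": [0.0, 0.25, 0.5, 0.75, 1.0],
}

search = RandomizedSearchCV(
    clf, param_distributions=param_dist, n_iter=15, random_state=0
)
search.fit(X, y)
print(search.best_params_)
```

Whether the smaller dataset still makes the randomized-vs-grid comparison compelling is exactly the kind of question to settle in the PR discussion.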

@cakiki ideally you’d be able to speed them up just by changing some parameters or reducing the size of the data while still presenting the same outcome, but changing the examples a bit is not necessarily out of scope either, if required.
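The two cheapest levers mentioned above can be sketched as follows. This is a generic, hypothetical snippet rather than a patch to any specific example: it stratified-subsamples the digits dataset and loosens the solver settings where the plot or message does not depend on full convergence:

```python
# Hypothetical sketch of the two cheapest speedups: (1) a stratified
# subsample of the data, (2) looser solver settings (fewer iterations,
# larger tolerance). Neither is taken from a specific example.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)

# 1) Subsample: 500 stratified samples often tell the same story as 1797.
X_small, _, y_small, _ = train_test_split(
    X, y, train_size=500, stratify=y, random_state=0
)

# 2) Loosen parameters where exact convergence does not change the message.
clf = LogisticRegression(max_iter=50, tol=1e-2)
clf.fit(X_small, y_small)
print(f"training accuracy: {clf.score(X_small, y_small):.2f}")
```

The review question for each PR is then whether the subsampled run still produces plots and scores that support the example’s text.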