scikit-learn: Accelerate slow examples
These examples take quite a long time to run, and they frequently cause our documentation CI to fail with a timeout. It’d be nice to speed them up a little bit.
To contributors: if you want to work on an example, first have a look at it. If you think you’re comfortable working on it and have found a potential way to speed up execution time while preserving the educational message of the example, please mention which one you’re working on in the comments below.
Please open a dedicated PR for each individual example you have found a fix for (with a new git branch branched off of main for each example) to make the review faster.
Please focus on the longest running examples first (e.g. 30s or more). Examples that run in less than 15s are probably fine.
Please also keep in mind that we want to keep the example code as simple as possible for educational reasons, while making sure the main points expressed in the text of the example remain valid and well illustrated by the result of the execution (plots or text outputs).
Finally, we expect that some examples cannot really be accelerated while preserving their educational value (integrity of the message and the simplicity of the code). In this case, we might decide to keep them as they are if they last less than 60s.
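One way to check where an example stands before and after a change is to time it locally. The snippet below is only a minimal sketch: `time_example` and `triage` are hypothetical helper names, the priority labels are this sketch's own wording for the 15s and 30s figures mentioned above, and the tiny temporary script stands in for a real example file. Timings on your machine will differ from those on the documentation CI, and for plotting examples you would probably want a non-interactive matplotlib backend.

```python
import runpy
import tempfile
import time
from pathlib import Path


def time_example(path):
    """Run a Python script as __main__ and return its wall-clock time in seconds."""
    start = time.perf_counter()
    runpy.run_path(str(path), run_name="__main__")
    return time.perf_counter() - start


def triage(seconds):
    """Map a runtime to a rough priority (labels are this sketch's own wording)."""
    if seconds >= 30:
        return "high priority"
    if seconds >= 15:
        return "worth a look"
    return "probably fine"


# Stand-in for a real example file: a tiny script written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "plot_dummy.py"
    script.write_text("total = sum(i * i for i in range(100_000))\n")
    elapsed = time_example(script)

print(f"{script.name}: {elapsed:.2f} sec -> {triage(elapsed)}")
```

To time a real example, pass its path to `time_example` instead of the temporary script.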
To maintainers: I’m running a script which automatically updates the following list with connected PRs and “done” checkboxes, so there is no need to update them manually.
Examples to Update
- …/examples/linear_model/plot_poisson_regression_non_normal_loss.py: 60.41 sec #21787
- …/examples/impute/plot_missing_values.py: 26.37 sec #21792
- …/examples/miscellaneous/plot_johnson_lindenstrauss_bound.py: 19.42 sec #21795
- …/examples/linear_model/plot_sgd_early_stopping.py: 91.61 sec #21627
- …/examples/kernel_approximation/plot_scalable_poly_kernels.py: 42.52 sec #22903
- …/examples/ensemble/plot_stack_predictors.py: 32.45 sec #21726
- …/examples/decomposition/plot_image_denoising.py: 29.42 sec #21799
- …/examples/applications/plot_model_complexity_influence.py: 28.06 sec #21963
- …/examples/impute/plot_iterative_imputer_variants_comparison.py: 27.26 sec #21748
- …/examples/inspection/plot_partial_dependence.py: 21.99 sec #21768
- …/examples/neighbors/plot_nca_classification.py: 21.13 sec #21771
- …/examples/miscellaneous/plot_kernel_ridge_regression.py: 18.07 sec #21794 #21791
- …/examples/linear_model/plot_sparse_logistic_regression_20newsgroups.py: 18.05 sec #21773
- …/examples/neural_networks/plot_mnist_filters.py: 76.16 sec #21647
- …/examples/ensemble/plot_gradient_boosting_quantile.py: 60.39 sec #21666
- …/examples/semi_supervised/plot_semi_supervised_newsgroups.py: 55.99 sec #21673
- …/examples/ensemble/plot_gradient_boosting_early_stopping.py: 51.35 sec #21609
- …/examples/manifold/plot_lle_digits.py: 44.89 sec #21736
- …/examples/svm/plot_svm_scale_c.py: 40.61 sec #21625
- …/examples/cluster/plot_cluster_comparison.py: 39.24 sec #21624
- …/examples/compose/plot_digits_pipe.py: 37.29 sec #21728
- …/examples/model_selection/plot_multi_metric_evaluation.py: 32.78 sec #21626
- …/examples/ensemble/plot_gradient_boosting_regularization.py: 28.18 sec #21611
- …/examples/applications/plot_face_recognition.py: 24.58 sec #21725
- …/examples/linear_model/plot_sgd_comparison.py: 24.05 sec #21610
- …/examples/ensemble/plot_ensemble_oob.py: 20.69 sec #21730
- …/examples/feature_selection/plot_select_from_model_diabetes.py: 18.98 sec #21738
- …/examples/ensemble/plot_gradient_boosting_categorical.py: 18.68 sec #21634
- …/examples/manifold/plot_compare_methods.py: 14.77 sec #21635
- …/examples/model_selection/plot_successive_halving_iterations.py: 14.16 sec #21612
- …/examples/model_selection/plot_randomized_search.py: 253.02 sec #21637
- …/examples/model_selection/plot_permutation_tests_for_classification.py: 39.82 sec #21649
- …/examples/cluster/plot_digits_linkage.py: 39.15 sec #21678 #21737
- …/examples/neural_networks/plot_mlp_alpha.py: 34.27 sec #21648
- …/examples/preprocessing/plot_discretization_classification.py: 34.11 sec #21661
- …/examples/manifold/plot_t_sne_perplexity.py: 24.81 sec #21636
- …/examples/model_selection/plot_validation_curve.py: 15.32 sec #21638
- …/examples/ensemble/plot_adaboost_multiclass.py: 14.90 sec #21651
- …/examples/decomposition/plot_pca_vs_fa_model_selection.py: 12.14 sec #21671
- …/examples/cluster/plot_birch_vs_minibatchkmeans.py: 11.75 sec #21703
- …/examples/model_selection/plot_learning_curve.py: 10.50 sec #21628
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 1
- Comments: 63 (44 by maintainers)
Commits related to this issue
- accelerate plot_gradient_boosting_early_stopping.py example #21598 — committed to sply88/scikit-learn by sply88 3 years ago
- accelerate plot_gradient_boosting_regularization.py example #21598 — committed to sply88/scikit-learn by sply88 3 years ago
- accelerate plot_successive_halving_iterations.py example #21598 — committed to sply88/scikit-learn by sply88 3 years ago
- accelerate plot_randomized_search.py example #21598 — committed to sply88/scikit-learn by sply88 3 years ago
- DOC Speed up plot_digits_linkage.py example #21598 (#21678) * Reduce num of samples in plot-digit-linkage example * Remove unnecessary random_state * Remove nudge_images * Address PR comment... — committed to scikit-learn/scikit-learn by yarkhinephyo 3 years ago
- DOC Speed up plot_digits_linkage.py example #21598 (#21678) * Reduce num of samples in plot-digit-linkage example * Remove unnecessary random_state * Remove nudge_images * Address PR comment... — committed to glemaitre/scikit-learn by yarkhinephyo 3 years ago
- DOC accelerate plot_successive_halving_iterations.py example #21598 (#21612) * accelerate plot_successive_halving_iterations.py example #21598 * n_estimators back to 20 — committed to scikit-learn/scikit-learn by sply88 3 years ago
- DOC accelerate plot_gradient_boosting_regularization.py example #21598 (#21611) * accelerate plot_gradient_boosting_regularization.py example #21598 * speed up by less samples and less trees * ... — committed to scikit-learn/scikit-learn by sply88 3 years ago
- DOC accelerate plot_successive_halving_iterations.py example #21598 (#21612) * accelerate plot_successive_halving_iterations.py example #21598 * n_estimators back to 20 — committed to glemaitre/scikit-learn by sply88 3 years ago
- DOC Speed up plot_digits_linkage.py example #21598 (#21678) * Reduce num of samples in plot-digit-linkage example * Remove unnecessary random_state * Remove nudge_images * Address PR comment... — committed to samronsin/scikit-learn by yarkhinephyo 3 years ago
- DOC accelerate plot_successive_halving_iterations.py example #21598 (#21612) * accelerate plot_successive_halving_iterations.py example #21598 * n_estimators back to 20 — committed to samronsin/scikit-learn by sply88 3 years ago
- DOC accelerate plot_gradient_boosting_regularization.py example #21598 (#21611) * accelerate plot_gradient_boosting_regularization.py example #21598 * speed up by less samples and less trees * ... — committed to samronsin/scikit-learn by sply88 3 years ago
- DOC accelerate plot_gradient_boosting_regularization.py example #21598 (#21611) * accelerate plot_gradient_boosting_regularization.py example #21598 * speed up by less samples and less trees * ... — committed to glemaitre/scikit-learn by sply88 3 years ago
- DOC Speed up plot_digits_linkage.py example #21598 (#21678) * Reduce num of samples in plot-digit-linkage example * Remove unnecessary random_state * Remove nudge_images * Address PR comment... — committed to mathijs02/scikit-learn by yarkhinephyo 3 years ago
@hhnnhh or @marenwestermann may be interested in this.
@ogrisel done!
In general, what matters most is the quality of the pedagogical message. It always comes first and runtime is second (assuming it’s less than a few minutes). So if you are confident that you can craft an enlightening example that teaches the same concepts with a different dataset, why not. But in general I am not sure it’s easy or worth it.
@norbusan and I are working on ../examples/ensemble/plot_stack_predictors.py
For instance, you can switch from the digits dataset to the iris dataset in the first and slowest example, and speed it up almost 100-fold. The question is then whether that still demonstrates the benefit of RandomizedSearchCV. Or you could try using HistGradientBoostingClassifier instead of SGDClassifier and see if it runs much faster. Then open a PR and through discussion we’ll figure out what the best choice is.
@cakiki ideally you’d be able to speed them up by just changing some parameters or reducing the size of the data, while still presenting the same outcome, but changing the examples a bit is not necessarily out of scope if it’s required.