scikit-learn: Run more examples that do not start with plot_ on CircleCI
From https://github.com/scikit-learn/scikit-learn/pull/8847#issuecomment-300015905:
and we should have a CI test for non-plotted examples or convert as many as possible to plots
My proposal is to have a convention like run_ for examples that do not produce any plots. sphinx-gallery lets you specify a regex that selects which examples to run. It could be something like plot_|run_. See the doc for more details.
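To make the proposal concrete, here is a minimal sketch of what the configuration could look like. It assumes the sphinx-gallery option is filename_pattern and that it is applied as a regex search against the example's file path; the directory values and the renamed run_ files are hypothetical and only there for illustration.

```python
import re

# Sketch of a Sphinx conf.py fragment. "filename_pattern" is the
# sphinx-gallery option holding the regex; the directory values and the
# run_ prefix are assumptions for illustration, not the project's settings.
sphinx_gallery_conf = {
    "examples_dirs": "../examples",
    "gallery_dirs": "auto_examples",
    # Anchor on the path separator so only the file-name prefix counts.
    "filename_pattern": r"/(plot_|run_)",
}


def would_run(path):
    """Return True if the path matches the configured pattern."""
    pattern = sphinx_gallery_conf["filename_pattern"]
    return re.search(pattern, path) is not None


# The run_ file names below are hypothetical renames.
candidates = [
    "examples/text/run_document_clustering.py",
    "examples/run_missing_values.py",
    "examples/applications/svm_gui.py",
    "examples/model_selection/grid_search_digits.py",
]
for path in candidates:
    print(path, would_run(path))
```

The leading "/" keeps the pattern from matching run_ or plot_ in the middle of a file name. On Windows the separator differs, so a real configuration would probably need to escape os.sep instead of hard-coding "/".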
I looked at the examples whose filenames do not start with plot_. Timings are in seconds, in increasing order.
examples/feature_selection/feature_selection_pipeline.py 1.39
examples/exercises/digits_classification_exercise.py 1.47
examples/applications/svm_gui.py 1.86
examples/missing_values.py 2.01
examples/model_selection/randomized_search.py 2.02
examples/feature_stacker.py 2.14
examples/text/document_clustering.py 3.21
examples/linear_model/lasso_dense_vs_sparse_data.py 3.98
examples/text/hashing_vs_dict_vectorizer.py 4.78
examples/model_selection/grid_search_digits.py 8.29
examples/text/document_classification_20newsgroups.py 8.93
examples/applications/topics_extraction_with_nmf_lda.py 10.53
examples/applications/face_recognition.py 25.02
examples/bicluster/bicluster_newsgroups.py 25.72
examples/hetero_feature_union.py 116.22
examples/applications/wikipedia_principal_eigenvector.py 139.77
examples/model_selection/grid_search_text_feature_extraction.py 156.86
With this in mind I would be in favour of running all the examples except svm_gui.py and the last three examples.
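As a rough check of what this choice costs, the timings listed above can be summed with and without the excluded examples. The sketch below only redoes the arithmetic on the numbers in the list; the variable names are mine, and timings on CircleCI would likely differ from these local measurements.

```python
# Timings in seconds, copied from the list above.
timings = {
    "examples/feature_selection/feature_selection_pipeline.py": 1.39,
    "examples/exercises/digits_classification_exercise.py": 1.47,
    "examples/applications/svm_gui.py": 1.86,
    "examples/missing_values.py": 2.01,
    "examples/model_selection/randomized_search.py": 2.02,
    "examples/feature_stacker.py": 2.14,
    "examples/text/document_clustering.py": 3.21,
    "examples/linear_model/lasso_dense_vs_sparse_data.py": 3.98,
    "examples/text/hashing_vs_dict_vectorizer.py": 4.78,
    "examples/model_selection/grid_search_digits.py": 8.29,
    "examples/text/document_classification_20newsgroups.py": 8.93,
    "examples/applications/topics_extraction_with_nmf_lda.py": 10.53,
    "examples/applications/face_recognition.py": 25.02,
    "examples/bicluster/bicluster_newsgroups.py": 25.72,
    "examples/hetero_feature_union.py": 116.22,
    "examples/applications/wikipedia_principal_eigenvector.py": 139.77,
    "examples/model_selection/grid_search_text_feature_extraction.py": 156.86,
}

# Skip the GUI example and the three slowest examples.
slowest_three = sorted(timings, key=timings.get)[-3:]
skipped = {"examples/applications/svm_gui.py", *slowest_three}

total = sum(timings.values())
kept_total = sum(t for path, t in timings.items() if path not in skipped)

print(f"all examples:  {total:.2f} s")
print(f"kept examples: {kept_total:.2f} s ({len(timings) - len(skipped)} files)")
```

On these numbers the 13 kept examples add up to roughly 99.5 seconds, out of about 514 seconds for all 17, so the three slowest examples account for most of the total.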
More details:
svm_gui.py pops up a GUI, so it should probably not be run. Whether we should run wikipedia_principal_eigenvector.py and grid_search_text_feature_extraction.py, which each take more than 2 minutes, is up for debate. On top of that, some of them may require a data download that does not use the typical ~/scikit_learn_data directory (e.g. the Wikipedia one). If that is the case, these examples would not benefit from the CircleCI cache.
About this issue
- State: open
- Created 7 years ago
- Reactions: 1
- Comments: 17 (17 by maintainers)
There are still some examples that we don’t run (i.e. that don’t start with plot_): python-annoy and nmslib). This was part of https://github.com/scikit-learn/scikit-learn/pull/10482 if more context is needed. This example takes ~3.5 minutes on my machine so maybe a bit too long to run in the CI …
For more context why this matters (at least a little bit):
Often, examples are not named “plot_*” because they take a long time to run or require a large download. Back when we created them, we considered that we did not have enough horsepower in the CI to run them. Maybe we should indeed reconsider this decision, but first we need to evaluate our computing power in the CI.