transformers: [testing] the test suite is many times slower than 2 weeks ago

We will get CI-side run-time reports once https://github.com/huggingface/transformers/pull/7884 is merged, but we can already start looking at what caused a 4-5x slowdown in the test suite about 10 days ago. I’m not sure of the exact moment, but I checked a few reports and the change appears to have happened around Oct 8th, give or take a few days. For example, before: https://app.circleci.com/pipelines/github/huggingface/transformers/13323/workflows/5984ea0e-e280-4a41-bc4a-b4a3d72fc411/jobs/95699 and after: https://app.circleci.com/pipelines/github/huggingface/transformers/13521/workflows/d235c864-66fa-4408-a787-2efab850a781/jobs/97329

@sshleifer suggested diagnosing this by adding the pytest --durations=N flag. However, if the cause is a missing @slow, it won’t show up on my machine, because I already have all the models pre-downloaded, so the timings below reflect only slow execution.

Here is the report from my machine, running all tests normally:

$ pytest -n 3 --durations=0 tests
[...]
76.92s call     tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_train_pipeline_custom_model
54.38s call     tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_graph_mode
49.85s call     tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_compile_tf_model
48.98s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_save_pretrained
44.11s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_compile_tf_model
38.42s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_graph_mode
35.94s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_tokenization_python_rust_equals
35.86s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_create_token_type_ids
35.81s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_embeded_special_tokens
35.58s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_max_length_equal
35.54s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_padding
35.36s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_is_fast
35.14s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_pretokenized_inputs
35.10s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_special_tokens_map_equal
35.07s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_num_special_tokens_to_add_equal
35.02s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_build_inputs_with_special_tokens
34.94s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_prepare_for_model
31.60s call     tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_compile_tf_model
31.03s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_train_pipeline_custom_model
29.11s call     tests/test_modeling_tf_bert.py::TFBertModelTest::test_compile_tf_model
29.10s call     tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_train_pipeline_custom_model
27.62s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_pt_tf_model_equivalence
26.36s call     tests/test_modeling_tf_electra.py::TFElectraModelTest::test_compile_tf_model
25.12s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_save_pretrained
24.85s call     tests/test_modeling_tf_funnel.py::TFFunnelBaseModelTest::test_graph_mode
24.66s call     tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_compile_tf_model
24.04s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_tokenization_python_rust_equals
23.15s call     tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_from_pretrained
23.10s call     tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_graph_mode
23.08s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_padding
22.99s call     tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_compile_tf_model
22.78s call     tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_train_pipeline_custom_model
22.69s call     tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_keras_save_load
22.67s call     tests/test_modeling_tf_funnel.py::TFFunnelBaseModelTest::test_train_pipeline_custom_model
22.43s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_pretokenized_inputs
22.38s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_create_token_type_ids
22.35s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_prepare_for_model
22.28s call     tests/test_modeling_mobilebert.py::MobileBertModelTest::test_torchscript_output_attentions
22.25s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_max_length_equal
22.19s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_embeded_special_tokens
22.06s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_special_tokens_map_equal
21.95s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_is_fast
21.92s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_build_inputs_with_special_tokens
21.85s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_num_special_tokens_to_add_equal
21.61s call     tests/test_modeling_tf_bert.py::TFBertModelTest::test_graph_mode
21.49s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_add_special_tokens
21.32s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_add_tokens
21.21s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_alignement_methods
21.09s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_batch_encode_dynamic_overflowing
21.06s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_fast_only_inputs
20.95s call     tests/test_modeling_mobilebert.py::MobileBertModelTest::test_torchscript
20.86s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_offsets_mapping
20.06s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_model_outputs_equivalence
20.01s call     tests/test_modeling_tf_albert.py::TFAlbertModelTest::test_train_pipeline_custom_model
19.62s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_attention_outputs
19.39s call     tests/test_modeling_flaubert.py::FlaubertModelTest::test_torchscript_output_attentions
18.78s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_save_pretrained
18.63s call     tests/test_modeling_tf_lxmert.py::TFLxmertModelTest::test_train_pipeline_custom_model
18.36s call     tests/test_modeling_tf_roberta.py::TFRobertaModelTest::test_compile_tf_model
18.08s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_tokenization_python_rust_equals
17.85s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_save_load
17.54s call     tests/test_modeling_tf_bert.py::TFBertModelTest::test_train_pipeline_custom_model
17.39s call     tests/test_modeling_tf_gpt2.py::TFGPT2ModelTest::test_train_pipeline_custom_model
17.28s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_embeded_special_tokens
17.25s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_special_tokens_map_equal
16.88s call     tests/test_modeling_tf_electra.py::TFElectraModelTest::test_train_pipeline_custom_model
16.84s call     tests/test_modeling_tf_electra.py::TFElectraModelTest::test_graph_mode
16.74s call     tests/test_modeling_electra.py::ElectraModelTest::test_torchscript
16.73s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_padding
16.63s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_max_length_equal
16.56s call     tests/test_modeling_fsmt.py::FSMTModelTest::test_lm_head_model_random_beam_search_generate
16.55s call     tests/test_modeling_mobilebert.py::MobileBertModelTest::test_torchscript_output_hidden_state
16.53s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_build_inputs_with_special_tokens
16.53s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_is_fast
16.49s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_create_token_type_ids
16.45s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_num_special_tokens_to_add_equal
16.43s call     tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_graph_mode
16.42s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_prepare_for_model
16.09s call     tests/test_tokenization_albert.py::AlbertTokenizationTest::test_pretokenized_inputs
16.02s call     tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_model_outputs_equivalence
15.80s call     tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_train_pipeline_custom_model
15.50s call     tests/test_pipelines.py::MonoColumnInputTestCase::test_torch_translation
15.30s call     tests/test_modeling_tf_ctrl.py::TFCTRLModelTest::test_train_pipeline_custom_model
15.00s call     tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_graph_mode
14.96s call     tests/test_modeling_tf_lxmert.py::TFLxmertModelTest::test_compile_tf_model
14.07s call     tests/test_modeling_tf_openai.py::TFOpenAIGPTModelTest::test_train_pipeline_custom_model
14.03s call     tests/test_modeling_tf_bert.py::TFBertModelTest::test_pt_tf_model_equivalence
13.77s call     tests/test_modeling_rag.py::RagDPRT5Test::test_model_generate
13.29s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_hidden_states_output
13.10s call     tests/test_modeling_gpt2.py::GPT2ModelTest::test_model_outputs_equivalence
12.69s call     tests/test_modeling_tf_bert.py::TFBertModelTest::test_model_outputs_equivalence
12.43s call     tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_save_pretrained
11.76s call     tests/test_modeling_tf_openai.py::TFOpenAIGPTModelTest::test_compile_tf_model
11.73s call     tests/test_modeling_tf_roberta.py::TFRobertaModelTest::test_graph_mode
11.66s call     tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_pt_tf_model_equivalence
11.63s call     tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_special_tokens_map_equal
11.60s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_batch_encode_dynamic_overflowing
11.51s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_add_special_tokens
11.50s call     tests/test_modeling_tf_ctrl.py::TFCTRLModelTest::test_compile_tf_model
11.36s call     tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_prepare_for_model
11.34s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_add_tokens
11.23s call     tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_tokenization_python_rust_equals
11.19s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_fast_only_inputs
11.17s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_offsets_mapping
11.09s call     tests/test_benchmark_tf.py::TFBenchmarkTest::test_inference_no_configs_xla
11.05s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_alignement_methods
11.04s call     tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_is_fast
10.95s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_offsets_with_special_characters
10.81s call     tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_embeded_special_tokens
10.71s call     tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_max_length_equal
10.59s call     tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_build_inputs_with_special_tokens
10.59s call     tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_num_special_tokens_to_add_equal
10.56s call     tests/test_modeling_tf_electra.py::TFElectraModelTest::test_pt_tf_model_equivalence
10.42s call     tests/test_modeling_bert.py::BertModelTest::test_torchscript_output_hidden_state
10.39s call     tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_create_token_type_ids
10.36s call     tests/test_modeling_tf_funnel.py::TFFunnelBaseModelTest::test_compile_tf_model
10.34s call     tests/test_modeling_tf_bert.py::TFBertModelTest::test_attention_outputs
10.31s call     tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_model_outputs_equivalence
10.25s call     tests/test_modeling_funnel.py::FunnelModelTest::test_torchscript
10.15s call     tests/test_modeling_bert.py::BertModelTest::test_torchscript_output_attentions
9.78s call     tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_pt_tf_model_equivalence
9.76s call     tests/test_benchmark_tf.py::TFBenchmarkTest::test_train_with_configs
9.76s call     tests/test_modeling_bert.py::BertModelTest::test_model_outputs_equivalence
9.72s call     tests/test_modeling_flaubert.py::FlaubertModelTest::test_torchscript
9.50s call     tests/test_modeling_funnel.py::FunnelModelTest::test_torchscript_output_hidden_state
9.37s call     tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_pt_tf_model_equivalence
9.31s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_batch_encode_dynamic_overflowing
9.31s call     tests/test_modeling_albert.py::AlbertModelTest::test_model_outputs_equivalence
9.30s call     tests/test_modeling_tf_bert.py::TFBertModelTest::test_save_load
9.10s call     tests/test_modeling_tf_electra.py::TFElectraModelTest::test_model_outputs_equivalence
9.01s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_keras_save_load
8.88s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_add_tokens
8.81s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_offsets_mapping
8.80s call     tests/test_modeling_tf_funnel.py::TFFunnelBaseModelTest::test_keras_save_load
8.73s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_add_special_tokens
8.66s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_fast_only_inputs
8.60s call     tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_pt_tf_model_equivalence
8.59s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_alignement_methods
8.57s call     tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_model_outputs_equivalence
8.50s call     tests/test_modeling_tf_gpt2.py::TFGPT2ModelTest::test_compile_tf_model
8.34s call     tests/test_pipelines.py::MonoColumnInputTestCase::test_torch_summarization
8.33s call     tests/test_modeling_tf_ctrl.py::TFCTRLModelTest::test_lm_head_model_random_beam_search_generate
8.17s call     tests/test_modeling_albert.py::AlbertModelTest::test_torchscript
8.01s call     tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_model_outputs_equivalence
7.96s call     tests/test_modeling_mobilebert.py::MobileBertModelTest::test_model_outputs_equivalence
7.88s call     tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_attention_outputs
7.87s call     tests/test_modeling_tf_gpt2.py::TFGPT2ModelTest::test_graph_mode
7.84s call     tests/test_modeling_tf_electra.py::TFElectraModelTest::test_attention_outputs
7.84s call     tests/test_modeling_gpt2.py::GPT2ModelTest::test_torchscript
7.58s call     tests/test_modeling_albert.py::AlbertModelTest::test_torchscript_output_attentions
7.57s call     tests/test_modeling_tf_transfo_xl.py::TFTransfoXLModelTest::test_train_pipeline_custom_model
7.51s call     tests/test_modeling_tf_electra.py::TFElectraModelTest::test_save_load
7.47s call     tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_attention_outputs
7.46s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_keyword_and_dict_args
7.36s call     tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_save_load
7.36s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_determinism
7.32s call     tests/test_modeling_tf_openai.py::TFOpenAIGPTModelTest::test_graph_mode
7.23s call     tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_attention_outputs
7.16s call     tests/test_pipelines.py::MonoColumnInputTestCase::test_tf_text_generation
7.05s call     tests/test_modeling_electra.py::ElectraModelTest::test_torchscript_output_attentions
7.04s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_resize_token_embeddings
7.02s call     tests/test_modeling_tf_openai.py::TFOpenAIGPTModelTest::test_lm_head_model_random_beam_search_generate
6.90s call     tests/test_modeling_tf_roberta.py::TFRobertaModelTest::test_train_pipeline_custom_model
6.88s call     tests/test_modeling_tf_bert.py::TFBertModelTest::test_hidden_states_output
6.82s call     tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_save_load
6.73s call     tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_lm_head_model_random_beam_search_generate
6.70s call     tests/test_modeling_flaubert.py::FlaubertModelTest::test_model_outputs_equivalence
6.60s call     tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_save_load
6.57s call     tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_keras_save_load
6.48s call     tests/test_modeling_tf_lxmert.py::TFLxmertModelTest::test_graph_mode
6.47s call     tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_attention_outputs
6.44s call     tests/test_modeling_encoder_decoder.py::GPT2EncoderDecoderModelTest::test_encoder_decoder_model_generate
6.44s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_inputs_embeds
6.35s call     tests/test_modeling_tf_albert.py::TFAlbertModelTest::test_compile_tf_model
6.25s call     tests/test_modeling_tf_xlnet.py::TFXLNetModelTest::test_compile_tf_model
6.25s call     tests/test_modeling_tf_gpt2.py::TFGPT2ModelTest::test_lm_head_model_random_beam_search_generate
6.11s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_loss_computation
6.05s call     tests/test_modeling_layoutlm.py::LayoutLMModelTest::test_torchscript_output_attentions
5.98s call     tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_save_load
5.93s call     tests/test_modeling_tf_xlm.py::TFXLMModelTest::test_train_pipeline_custom_model
5.77s call     tests/test_modeling_tf_xlm.py::TFXLMModelTest::test_compile_tf_model
5.72s call     tests/test_modeling_funnel.py::FunnelBaseModelTest::test_torchscript_output_attentions
5.69s call     tests/test_modeling_squeezebert.py::SqueezeBertModelTest::test_torchscript_output_hidden_state
5.65s call     tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_fast_only_inputs
5.64s call     tests/test_modeling_squeezebert.py::SqueezeBertModelTest::test_multigpu_data_parallel_forward
5.60s call     tests/test_modeling_blenderbot.py::Blenderbot90MIntegrationTests::test_90_generation_from_short_input
5.58s call     tests/test_modeling_electra.py::ElectraModelTest::test_model_outputs_equivalence
5.57s call     tests/test_modeling_rag.py::RagDPRBartTest::test_model_with_encoder_outputs
5.56s call     tests/test_modeling_bart.py::BARTModelTest::test_torchscript_output_attentions
5.54s call     tests/test_modeling_tf_roberta.py::TFRobertaModelTest::test_attention_outputs
5.54s call     tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_padding
5.51s call     tests/test_modeling_squeezebert.py::SqueezeBertModelTest::test_torchscript_output_attentions
5.46s call     tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_alignement_methods
5.46s call     tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_offsets_mapping
5.44s call     tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_add_special_tokens
5.40s call     tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_add_tokens
5.33s call     tests/test_modeling_tf_lxmert.py::TFLxmertModelTest::test_keras_save_load
5.30s call     tests/test_modeling_funnel.py::FunnelBaseModelTest::test_torchscript_output_hidden_state
5.27s call     tests/test_modeling_tf_xlm.py::TFXLMModelTest::test_graph_mode
5.22s call     tests/test_modeling_tf_electra.py::TFElectraModelTest::test_hidden_states_output
5.20s call     tests/test_modeling_tf_albert.py::TFAlbertModelTest::test_graph_mode
5.12s call     tests/test_pipelines.py::NerPipelineTests::test_tf_only_ner
5.11s call     tests/test_modeling_openai.py::OpenAIGPTModelTest::test_torchscript_output_hidden_state
5.10s call     tests/test_modeling_gpt2.py::GPT2ModelTest::test_torchscript_output_attentions
5.09s call     tests/test_modeling_electra.py::ElectraModelTest::test_torchscript_output_hidden_state
5.07s call     tests/test_modeling_tf_xlnet.py::TFXLNetModelTest::test_graph_mode
5.05s call     tests/test_tokenization_deberta.py::DebertaTokenizationTest::test_maximum_encoding_length_pair_input
5.04s call     tests/test_modeling_tf_ctrl.py::TFCTRLModelTest::test_graph_mode
5.01s call     tests/test_modeling_distilbert.py::DistilBertModelTest::test_torchscript_output_hidden_state
4.99s call     tests/test_modeling_fsmt.py::FSMTHeadTests::test_generate_fp16
4.94s call     tests/test_modeling_distilbert.py::DistilBertModelTest::test_model_outputs_equivalence
4.93s call     tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_hidden_states_output
4.90s call     tests/test_modeling_layoutlm.py::LayoutLMModelTest::test_torchscript_output_hidden_state
4.85s call     tests/test_modeling_roberta.py::RobertaModelTest::test_torchscript
4.84s call     tests/test_modeling_tf_xlnet.py::TFXLNetModelTest::test_train_pipeline_custom_model
4.82s call     tests/test_pipelines.py::QAPipelineTests::test_tf_question_answering
4.81s call     tests/test_modeling_encoder_decoder.py::BertEncoderDecoderModelTest::test_save_and_load_from_encoder_decoder_pretrained
4.78s call     tests/test_modeling_roberta.py::RobertaModelTest::test_torchscript_output_hidden_state
4.76s call     tests/test_tokenization_deberta.py::DebertaTokenizationTest::test_maximum_encoding_length_single_input
4.73s call     tests/test_tokenization_deberta.py::DebertaTokenizationTest::test_pretokenized_inputs
4.72s call     tests/test_pipelines.py::QAPipelineTests::test_torch_question_answering
4.68s call     tests/test_modeling_tf_funnel.py::TFFunnelBaseModelTest::test_pt_tf_model_equivalence
4.63s call     tests/test_tokenization_deberta.py::DebertaTokenizationTest::test_add_special_tokens
4.60s call     tests/test_modeling_mobilebert.py::MobileBertModelTest::test_save_load
4.59s call     tests/test_modeling_tf_funnel.py::TFFunnelBaseModelTest::test_model_outputs_equivalence
4.59s call     tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_identifier_non_existent
4.57s call     tests/test_benchmark_tf.py::TFBenchmarkTest::test_inference_encoder_decoder_with_configs
4.56s call     tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_hidden_states_output
4.51s call     tests/test_pipelines.py::MonoColumnInputTestCase::test_torch_text2text
4.48s call     tests/test_modeling_tf_electra.py::TFElectraModelTest::test_keras_save_load
4.48s call     tests/test_tokenization_marian.py::MarianTokenizationTest::test_tokenizer_equivalence_en_de
4.47s call     tests/test_modeling_tf_roberta.py::TFRobertaModelTest::test_keras_save_load
4.44s call     tests/test_modeling_flaubert.py::FlaubertModelTest::test_attention_outputs
4.43s call     tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_hidden_states_output
4.41s call     tests/test_modeling_tf_bert.py::TFBertModelTest::test_keyword_and_dict_args
4.39s call     tests/test_modeling_marian.py::TestMarian_en_ROMANCE::test_pipeline
4.34s call     tests/test_modeling_tf_albert.py::TFAlbertModelTest::test_save_load
4.33s call     tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_hidden_states_output
4.28s call     tests/test_tokenization_fsmt.py::FSMTTokenizationTest::test_pickle_tokenizer
4.28s call     tests/test_pipelines.py::MonoColumnInputTestCase::test_torch_text_generation
4.27s call     tests/test_modeling_funnel.py::FunnelModelTest::test_torchscript_output_attentions
4.16s call     tests/test_modeling_tf_bert.py::TFBertModelTest::test_resize_token_embeddings
4.12s call     tests/test_modeling_tf_t5.py::TFT5ModelTest::test_compile_tf_model
4.10s call     tests/test_modeling_tf_bert.py::TFBertModelTest::test_keras_save_load
4.09s call     tests/test_modeling_tf_t5.py::TFT5ModelTest::test_train_pipeline_custom_model
4.09s call     tests/test_modeling_squeezebert.py::SqueezeBertModelTest::test_save_load
4.08s call     tests/test_modeling_openai.py::OpenAIGPTModelTest::test_head_pruning_integration
4.07s call     tests/test_modeling_tf_ctrl.py::TFCTRLModelTest::test_keras_save_load
4.06s call     tests/test_modeling_roberta.py::RobertaModelTest::test_torchscript_output_attentions
4.04s call     tests/test_pipelines.py::MonoColumnInputTestCase::test_tf_fill_mask
3.96s call     tests/test_modeling_tf_bert.py::TFBertModelTest::test_determinism
3.96s call     tests/test_modeling_tf_lxmert.py::TFLxmertModelTest::test_pt_tf_model_equivalence
3.95s call     tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_keras_save_load
3.95s setup    tests/test_modeling_marian.py::TestMarian_FR_EN::test_batch_generation_fr_en
3.93s call     tests/test_modeling_reformer.py::ReformerLocalAttnModelTest::test_model_outputs_equivalence
3.88s call     tests/test_pipelines.py::NerPipelineTests::test_ner_grouped
3.87s call     tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_determinism
3.86s call     tests/test_pipelines.py::NerPipelineTests::test_torch_ner
3.86s call     tests/test_pipelines.py::MonoColumnInputTestCase::test_tf_feature_extraction
3.85s call     tests/test_pipelines.py::MonoColumnInputTestCase::test_tf_sentiment_analysis
3.82s call     tests/test_modeling_funnel.py::FunnelBaseModelTest::test_torchscript
3.75s call     tests/test_modeling_tf_bert.py::TFBertModelTest::test_loss_computation
3.75s call     tests/test_modeling_t5.py::T5ModelTest::test_export_to_onnx
3.73s call     tests/test_modeling_tf_t5.py::TFT5ModelTest::test_lm_head_model_random_beam_search_generate
3.72s call     tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_keras_save_load
3.70s call     tests/test_pipelines.py::MonoColumnInputTestCase::test_torch_fill_mask_with_targets
3.69s call     tests/test_modeling_tf_roberta.py::TFRobertaModelTest::test_hidden_states_output
3.64s call     tests/test_modeling_bart.py::BARTModelTest::test_torchscript_output_hidden_state
3.62s call     tests/test_pipelines.py::MonoColumnInputTestCase::test_tf_fill_mask_with_targets
3.60s call     tests/test_pipelines.py::MonoColumnInputTestCase::test_torch_feature_extraction
3.60s call     tests/test_pipelines.py::NerPipelineTests::test_tf_ner
3.60s call     tests/test_pipelines.py::ZeroShotClassificationPipelineTests::test_torch_zero_shot_classification
3.59s call     tests/test_modeling_bert.py::BertModelTest::test_torchscript
3.58s call     tests/test_pipelines.py::MonoColumnInputTestCase::test_torch_sentiment_analysis
3.57s setup    tests/test_modeling_marian.py::TestMarian_EN_DE_More::test_auto_config
3.53s call     tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_resize_token_embeddings
3.50s call     tests/test_pipelines.py::MonoColumnInputTestCase::test_torch_fill_mask
3.50s call     tests/test_modeling_tf_openai.py::TFOpenAIGPTModelTest::test_keras_save_load
3.50s call     tests/test_modeling_dpr.py::DPRModelTest::test_torchscript
3.46s call     tests/test_modeling_bart.py::BARTModelTest::test_tiny_model
3.46s call     tests/test_modeling_tf_bert.py::TFBertModelTest::test_inputs_embeds
3.44s call     tests/test_modeling_albert.py::AlbertModelTest::test_torchscript_output_hidden_state
3.42s call     tests/test_modeling_xlnet.py::XLNetModelTest::test_model_outputs_equivalence
3.42s call     tests/test_pipelines.py::ZeroShotClassificationPipelineTests::test_tf_zero_shot_classification
3.42s call     tests/test_modeling_tf_gpt2.py::TFGPT2ModelTest::test_pt_tf_model_equivalence
3.40s setup    tests/test_modeling_marian.py::TestMarian_en_ROMANCE::test_batch_generation_en_ROMANCE_multi
3.40s call     tests/test_tokenization_albert.py::AlbertTokenizationTest::test_maximum_encoding_length_pair_input
3.40s call     tests/test_modeling_tf_lxmert.py::TFLxmertModelTest::test_model_outputs_equivalence
3.39s call     tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_keyword_and_dict_args
3.36s call     tests/test_modeling_dpr.py::DPRModelTest::test_model_outputs_equivalence
3.35s setup    tests/test_modeling_marian.py::TestMarian_EN_DE_More::test_batch_generation_en_de
3.33s call     tests/test_modeling_bart.py::BARTModelTest::test_model_outputs_equivalence
3.32s call     tests/test_modeling_tf_electra.py::TFElectraModelTest::test_determinism
3.32s call     tests/test_pipelines.py::NerPipelineTests::test_tf_ner_grouped
3.31s setup    tests/test_modeling_marian.py::TestMarian_en_ROMANCE::test_tokenizer_handles_empty
3.30s call     tests/test_modeling_tf_electra.py::TFElectraModelTest::test_resize_token_embeddings
3.29s call     tests/test_modeling_tf_openai.py::TFOpenAIGPTModelTest::test_pt_tf_model_equivalence
3.28s call     tests/test_modeling_bert.py::BertModelTest::test_head_pruning_integration
3.28s call     tests/test_modeling_tf_gpt2.py::TFGPT2ModelTest::test_model_outputs_equivalence
3.27s call     tests/test_modeling_dpr.py::DPRModelTest::test_torchscript_output_attentions
3.26s call     tests/test_modeling_openai.py::OpenAIGPTModelTest::test_model_outputs_equivalence
3.23s setup    tests/test_modeling_marian.py::TestMarian_en_zh::test_batch_generation_eng_zho
3.23s setup    tests/test_modeling_marian.py::TestMarian_EN_FR::test_batch_generation_en_fr
3.22s call     tests/test_modeling_openai.py::OpenAIGPTModelTest::test_torchscript
3.20s call     tests/test_modeling_tf_funnel.py::TFFunnelBaseModelTest::test_attention_outputs
3.20s setup    tests/test_modeling_marian.py::TestMarian_RU_FR::test_batch_generation_ru_fr
3.19s call     tests/test_modeling_tf_ctrl.py::TFCTRLModelTest::test_lm_head_model_random_no_beam_search_generate
3.18s call     tests/test_modeling_tf_gpt2.py::TFGPT2ModelTest::test_keras_save_load
3.17s call     tests/test_modeling_tf_xlm.py::TFXLMModelTest::test_pt_tf_model_equivalence
3.15s call     tests/test_modeling_tf_t5.py::TFT5ModelTest::test_graph_mode
3.13s call     tests/test_modeling_tf_roberta.py::TFRobertaModelTest::test_pt_tf_model_equivalence
3.13s setup    tests/test_modeling_marian.py::TestMarian_MT_EN::test_batch_generation_mt_en
3.12s call     tests/test_modeling_tf_transfo_xl.py::TFTransfoXLModelTest::test_compile_tf_model
3.12s setup    tests/test_modeling_marian.py::TestMarian_EN_DE_More::test_forward
3.11s call     tests/test_modeling_tf_openai.py::TFOpenAIGPTModelTest::test_model_outputs_equivalence
3.10s call     tests/test_modeling_tf_electra.py::TFElectraModelTest::test_keyword_and_dict_args

I made a 3-sec cut-off for this listing.

@sshleifer, @sgugger, @LysandreJik, @thomwolf

About this issue

State: closed
Created 4 years ago
Comments: 21 (20 by maintainers)

Most upvoted comments

In “All tests that need to do a training”, I would change “a training” to “a real training”. I spent a lot of time making a mock training fast for the tests of the Trainer, and I don’t want that marked as slow 😉

sgugger on Oct 19, 2020

Wrote a one-liner to calculate the sub-totals for any pattern in the output of pytest --durations=0 (saved here as stats.txt), which has lines such as:

22.15s call     tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_from_pretrained
4.42s call     tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_identifier_non_existent
2.75s call     tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_from_model_type
2.57s call     tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_from_tokenizer_class

Total runtime:

$ cat stats.txt | perl -ne 's|^(.*?)s.|$x+=$1|e; END {print int $x}'
3308

Total tf runtime:

$ grep _tf_ stats.txt | perl -ne 's|^(.*?)s.|$x+=$1|e; END {print int $x}'
1609
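
For anyone who prefers not to use perl, the same sub-totals can be sketched in Python. This is only a rough equivalent under a few assumptions: the function name subtotal and the regular expression are made up for illustration, and the input is assumed to be report lines in the format shown above. Unlike the grep version, the pattern is matched against the test id only, not the whole line.

```python
import re

# Matches report lines such as:
#   "22.15s call     tests/test_x.py::SomeTest::test_name"
DURATION_LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)s\s+(call|setup|teardown)\s+(\S+)")


def subtotal(lines, pattern=""):
    """Sum the durations (seconds) of report lines whose test id contains `pattern`.

    Hypothetical helper; lines that do not look like duration entries are ignored.
    """
    total = 0.0
    for line in lines:
        match = DURATION_LINE.match(line)
        if match and pattern in match.group(3):
            total += float(match.group(1))
    return total


# A few lines copied from the report; normally these would be read from stats.txt.
sample = [
    "22.15s call     tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_from_pretrained",
    "4.42s call     tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_identifier_non_existent",
    "21.61s call     tests/test_modeling_tf_bert.py::TFBertModelTest::test_graph_mode",
]

print(int(subtotal(sample)))          # total over all sample lines
print(int(subtotal(sample, "_tf_")))  # TensorFlow tests only
```

With a real report, something like subtotal(open("stats.txt"), "test_tokenization_fast") should give the share of a single test file.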

stas00 on Oct 19, 2020

I’m not sure that setting a 5/15 threshold, or any other specific time requirement, to classify tests as slow would be best. Some tests, like test_model_outputs_equivalence, are important, and so is running them on contributors’ PRs when their changes affect the modeling internals.

I think the following proposition would be more suited:

If the test is focused on one of the library’s internal components (e.g., modeling files, tokenization files, pipelines), then we should run that test in the non-slow test suite. If it’s focused on another aspect of the library, such as the documentation or the examples, then we should run it in the slow test suite. To refine this approach, we should then have some exceptions:

All tests that need a specific set of weights (e.g., model or tokenizer integration tests, pipeline integration tests) should be set to slow.
All tests that need to do a training (e.g., trainer integration tests) should be set to slow.
We can introduce exceptions if some of these should-be-non-slow tests are excruciatingly long, and set them to slow. Examples are some of the auto modeling tests, which save and load large files to disk and are therefore set to slow.
Others?

To that end, we should aim for the non-slow tests to cover all the different internals, while making sure they keep a fast execution time. Having some very small models in the tests (e.g., 2 layers, 10 vocab size, etc.) helps in that regard, as does having dummy sets of weights like the sshleifer/tiny-xxx-random weights. On that front, there seems to be something fishy going on with the MobileBERT model: it’s supposed to be an efficient model, yet its tests take a while to run. There’s probably something to fix for this model.

Willing to iterate on this wording, or specify/change some aspects if you think of something better.

Following this approach:

For the tokenization_auto tests, we can definitely uncomment the @slow.

For the MonoColumnInputTestCase, we can also set it as a slow test.
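
To make the slow/non-slow split concrete, here is a minimal, self-contained sketch of how such a decorator can work. It is an illustrative stand-in, not the project's own implementation: the RUN_SLOW environment variable name and the accepted values are assumptions here, and the test class is a trimmed-down, hypothetical one.

```python
import os
import unittest


def slow(test_case):
    """Skip the decorated test unless RUN_SLOW is set to a truthy value.

    Illustrative stand-in: the variable name and the accepted values are assumptions.
    """
    run_slow = os.environ.get("RUN_SLOW", "").lower() in {"1", "true", "yes"}
    return unittest.skipUnless(run_slow, "test is slow")(test_case)


class AutoTokenizerTest(unittest.TestCase):  # hypothetical, trimmed-down test class
    @slow
    def test_tokenizer_from_pretrained(self):
        # A real test would fetch and load pretrained tokenizer files here.
        self.assertTrue(True)

    def test_fast_check(self):
        # Cheap checks stay unmarked, so they run on every PR.
        self.assertEqual("a b".split(), ["a", "b"])
```

Under this sketch, a plain test run reports the marked test as skipped, while setting RUN_SLOW=1 in the environment before the run includes it.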

LysandreJik on Oct 19, 2020

https://github.com/huggingface/transformers/pull/7659 may have fixed the slow tokenization tests. Checking the most recent run, it’s back to ~2min for the torch-only job: https://app.circleci.com/pipelines/github/huggingface/transformers/13951/workflows/244749ce-d1ee-488f-a59d-d891fbc38ed6/jobs/100800 I will check a few more runs and close this if that was the culprit.

stas00 on Oct 18, 2020

“Do you want to try to fix test_tokenization_fast?”

I will give it a go.

edit: Except it was just removed by the merge that just happened. So I have to start from scratch.

stas00 on Oct 18, 2020