oneTBB: The test_scheduler_mix hangs on !x86_64!

Commit: 58653a3729f343c48fecb4809a894cd4ba0b8574
gcc version 11.3.0
OS: openSUSE 15.5
CPU: Intel® Core™ i7-2600 CPU @ 3.40GHz
Without virtualization.

Build commands:

CC=gcc-11 CXX=g++-11 cmake -DCMAKE_VERBOSE_MAKEFILE=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_STANDARD=20 -DCMAKE_INTERPROCEDURAL_OPTIMIZATION=ON ../..
cmake --build . --verbose --config Release

Ctest:

ctest --timeout 180 --build-config Release -R test_scheduler_mix --repeat-until-fail 3600

In this environment, 3600 repetitions are enough to trigger the hang.

A passing run takes about 2 seconds on average; the timeout is 180 seconds.

Ctest output:

    Test #67: test_scheduler_mix ...............   Passed    1.50 sec
    Start 67: test_scheduler_mix
    Test #67: test_scheduler_mix ...............   Passed    2.30 sec
    Start 67: test_scheduler_mix
    Test #67: test_scheduler_mix ...............   Passed    1.84 sec
    Start 67: test_scheduler_mix
    Test #67: test_scheduler_mix ...............   Passed    1.86 sec
    Start 67: test_scheduler_mix
    Test #67: test_scheduler_mix ...............   Passed    1.78 sec
    Start 67: test_scheduler_mix
    Test #67: test_scheduler_mix ...............   Passed    1.57 sec
    Start 67: test_scheduler_mix
    Test #67: test_scheduler_mix ...............   Passed    1.99 sec
    Start 67: test_scheduler_mix
    Test #67: test_scheduler_mix ...............***Timeout 180.02 sec

    0% tests passed, 1 tests failed out of 1

    Total Test time (real) = 3755.73 sec

    The following tests FAILED:
             67 - test_scheduler_mix (Timeout)
    Errors while running CTest
    Output from these tests are in: /home/phprus/devel/tmp/oneTBB/test_scheduler_mix/oneTBB-58653a3729f343c48fecb4809a894cd4ba0b8574/build/gcc-11-cxx20-r-lto/Testing/Temporary/LastTest.log
    Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
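
As a rough cross-check of the numbers above, the following sketch estimates how many runs passed before the hang by dividing the time not spent in the timed-out run by the mean duration of a passing run. The helper name `estimate_passed_runs` and the regular expressions are hypothetical and only match the output format shown in this report. Only the last seven passing runs are visible here, so the sample mean is an assumption about the whole session, and any overhead CTest adds between runs is ignored.

```python
import re

# Hypothetical helpers for this report's log format; not part of CTest or oneTBB.
PASSED_RE = re.compile(r"Passed\s+([0-9.]+) sec")
TIMEOUT_RE = re.compile(r"\*\*\*Timeout\s+([0-9.]+) sec")
TOTAL_RE = re.compile(r"Total Test time \(real\) =\s+([0-9.]+) sec")

# Excerpt of the output quoted in the report (durations only).
SAMPLE_LOG = """
    Test #67: test_scheduler_mix ...............   Passed    1.50 sec
    Test #67: test_scheduler_mix ...............   Passed    2.30 sec
    Test #67: test_scheduler_mix ...............   Passed    1.84 sec
    Test #67: test_scheduler_mix ...............   Passed    1.86 sec
    Test #67: test_scheduler_mix ...............   Passed    1.78 sec
    Test #67: test_scheduler_mix ...............   Passed    1.57 sec
    Test #67: test_scheduler_mix ...............   Passed    1.99 sec
    Test #67: test_scheduler_mix ...............***Timeout 180.02 sec
Total Test time (real) = 3755.73 sec
"""


def estimate_passed_runs(log_text, mean_pass=None):
    """Return (mean passing time, estimated number of passing runs).

    If mean_pass is None, the mean of the passing runs visible in the log
    is used, which is only a sample of the full session.
    """
    passed = [float(value) for value in PASSED_RE.findall(log_text)]
    timeout = TIMEOUT_RE.search(log_text)
    total = TOTAL_RE.search(log_text)
    if timeout is None or total is None:
        raise ValueError("log lacks a timeout line or a total time line")
    if mean_pass is None:
        if not passed:
            raise ValueError("log contains no passing runs to average")
        mean_pass = sum(passed) / len(passed)
    # Time left for passing runs once the single timed-out run is removed.
    time_in_passing_runs = float(total.group(1)) - float(timeout.group(1))
    return mean_pass, time_in_passing_runs / mean_pass


if __name__ == "__main__":
    mean, runs = estimate_passed_runs(SAMPLE_LOG)
    print(f"sample mean: {mean:.2f} sec, estimated passing runs: {runs:.0f}")
    _, runs_2s = estimate_passed_runs(SAMPLE_LOG, mean_pass=2.0)
    print(f"with a 2 sec mean: {runs_2s:.0f} estimated passing runs")
```

Both estimates land well below the 3600 repetitions requested, which is consistent with the session stopping at the first timeout.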

About this issue

  • State: open
  • Created 7 months ago
  • Comments: 17 (17 by maintainers)

Most upvoted comments

Thank you, @dnmokhov, for the hang report after fixing the test.

It looks like a different issue; this time the problem is in the scheduler. The hang occurs when a master thread occupies an arena slot intended for workers and, at the same time, is unable to steal tasks because of the stack size limit. If the tasks that thread is waiting on were stolen and partially executed by other threads, the wait becomes infinite, because a worker thread cannot join the arena (the worker slot is occupied by the master).
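
The condition described in this comment can be restated as a small toy model. This is only a sketch of the reasoning, not oneTBB code: the `ArenaState` fields and the `wait_can_complete` predicate are hypothetical names, and the model assumes that no worker is currently inside the arena, so the remaining work can be picked up only by the waiting master or by a worker that manages to join.

```python
from dataclasses import dataclass


@dataclass
class ArenaState:
    """Toy snapshot of an arena; all field names are hypothetical."""
    worker_slots: int             # slots intended for worker threads
    masters_in_worker_slots: int  # master threads occupying those slots
    master_can_steal: bool        # False once the stack size limit is hit
    remaining_tasks: int          # unfinished tasks the master waits on


def free_worker_slots(state):
    """Slots a worker thread could still take to join the arena."""
    return max(0, state.worker_slots - state.masters_in_worker_slots)


def wait_can_complete(state):
    """Return True if the master's wait can finish in this toy model."""
    if state.remaining_tasks == 0:
        return True   # nothing left to wait for
    if state.master_can_steal:
        return True   # the master can run the remaining tasks itself
    # Otherwise progress needs a worker, which needs a free worker slot.
    return free_worker_slots(state) > 0


if __name__ == "__main__":
    # Scenario from the comment: the master holds the only worker slot,
    # cannot steal, and stolen work is still unfinished.
    hang = ArenaState(worker_slots=1, masters_in_worker_slots=1,
                      master_can_steal=False, remaining_tasks=1)
    print("wait can complete:", wait_can_complete(hang))
```

In this model the wait completes as soon as any one of the three conditions changes: the work finishes, the master is allowed to steal again, or a worker slot becomes free.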

Thank you for submitting this. We have reproduced the hang and are looking into it.