oneTBB: Some tests failed on MSVC 2019 (x64 build)

Release build:

F:\tmp\tbb\oneTBB-2021.3.0\build\2019.64>cmake -DCMAKE_VERBOSE_MAKEFILE=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_STANDARD=17 -G "NMake Makefiles" ..\..
-- The CXX compiler identification is MSVC 19.20.27519.0
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.20.27508/bin/HostX64/x64/cl.exe
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.20.27508/bin/HostX64/x64/cl.exe -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - not found
-- Found Threads: TRUE
-- HWLOC target HWLOC::hwloc_1_11 doesn't exist. The tbbbind target cannot be created
-- HWLOC target HWLOC::hwloc_2 doesn't exist. The tbbbind_2_0 target cannot be created
-- HWLOC target HWLOC::hwloc_2_4 doesn't exist. The tbbbind_2_4 target cannot be created
-- The C compiler identification is MSVC 19.20.27519.0
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.20.27508/bin/HostX64/x64/cl.exe
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.20.27508/bin/HostX64/x64/cl.exe -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to: F:/tmp/tbb/oneTBB-2021.3.0/build/2019.64

Tests:

The following tests FAILED:
         11 - test_partitioner (Failed)
         12 - test_parallel_for (Failed)
         14 - test_parallel_reduce (Failed)
         21 - test_concurrent_vector (Failed)
         63 - test_task (Failed)
         80 - conformance_parallel_for (Failed)
         82 - conformance_parallel_reduce (Failed)

RelWithDebInfo build:

F:\tmp\tbb\oneTBB-2021.3.0\build\2019.64rd>cmake -DCMAKE_VERBOSE_MAKEFILE=ON -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_CXX_STANDARD=17 -G "NMake Makefiles" ..\..
-- The CXX compiler identification is MSVC 19.20.27519.0
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.20.27508/bin/HostX64/x64/cl.exe
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.20.27508/bin/HostX64/x64/cl.exe -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - not found
-- Found Threads: TRUE
-- HWLOC target HWLOC::hwloc_1_11 doesn't exist. The tbbbind target cannot be created
-- HWLOC target HWLOC::hwloc_2 doesn't exist. The tbbbind_2_0 target cannot be created
-- HWLOC target HWLOC::hwloc_2_4 doesn't exist. The tbbbind_2_4 target cannot be created
-- The C compiler identification is MSVC 19.20.27519.0
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.20.27508/bin/HostX64/x64/cl.exe
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.20.27508/bin/HostX64/x64/cl.exe -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to: F:/tmp/tbb/oneTBB-2021.3.0/build/2019.64rd

F:\tmp\tbb\oneTBB-2021.3.0\build\2019.64rd>

Tests:

The following tests FAILED:
         11 - test_partitioner (Failed)
         12 - test_parallel_for (Failed)
         14 - test_parallel_reduce (Failed)
         21 - test_concurrent_vector (Failed)
         36 - test_eh_flow_graph (Failed)
         63 - test_task (Failed)
         80 - conformance_parallel_for (Failed)
         82 - conformance_parallel_reduce (Failed)

LOG:

test 11
        Start  11: test_partitioner

11: Test command: "test_partitioner" "--force-colors=1"
11: Test timeout computed to be: 10000000
11: Access violation
 11/132 Test  #11: test_partitioner .........................***Failed    0.04 sec
test 12
        Start  12: test_parallel_for

12: Test command: "test_parallel_for" "--force-colors=1"
12: Test timeout computed to be: 10000000
12: Access violation
 12/132 Test  #12: test_parallel_for ........................***Failed    3.68 sec

test 14
        Start  14: test_parallel_reduce

14: Test command: "test_parallel_reduce" "--force-colors=1"
14: Test timeout computed to be: 10000000
14: Access violation
 14/132 Test  #14: test_parallel_reduce .....................***Failed    0.20 sec


test 21
        Start  21: test_concurrent_vector

21: Test command: "test_concurrent_vector" "--force-colors=1"
21: Test timeout computed to be: 10000000
21: Access violation
 21/132 Test  #21: test_concurrent_vector ...................***Failed    0.88 sec


test 36
        Start  36: test_eh_flow_graph

36: Test command: "test_eh_flow_graph" "--force-colors=1"
36: Test timeout computed to be: 10000000
36: [doctest] doctest version is "2.3.5"
36: [doctest] run with "--help" for options
36: ===============================================================================
36: F:\tmp\tbb\oneTBB-2021.3.0\test\tbb\test_eh_flow_graph.cpp(2029):
36: TEST CASE:  Testing several threads
36:
36: F:\tmp\tbb\oneTBB-2021.3.0\test\common/exception_handling.h(241): WARNING: WARN( n < c_Timeout ) is NOT correct!
36:   values: WARN( 1000000 <  1000000 )
36:   logged: input_node(1): Missed wakeup or machine is overloaded?
36:
.......
36: F:\tmp\tbb\oneTBB-2021.3.0\test\common/exception_handling.h(241): WARNING: WARN( n < c_Timeout ) is NOT correct!
36:   values: WARN( 1000000 <  1000000 )
36:   logged: input_node(1): Missed wakeup or machine is overloaded?
36:
36: F:\tmp\tbb\oneTBB-2021.3.0\test\common/exception_handling.h(241): WARNING: WARN( n < c_Timeout )
 is
 36/132 Test  #36: test_eh_flow_graph .......................***Failed   96.01 sec


test 63
        Start  63: test_task

63: Test command: "test_task" "--force-colors=1"
63: Test timeout computed to be: 10000000
63: [doctest] doctest version is "2.3.5"
63: [doctest] run with "--help" for options
63: Access violation
 63/132 Test  #63: test_task ................................***Failed    0.24 sec


test 80
        Start  80: conformance_parallel_for

80: Test command: "conformance_parallel_for" "--force-colors=1"
80: Test timeout computed to be: 10000000
80: Access violation
 80/132 Test  #80: conformance_parallel_for .................***Failed    0.06 sec


test 82
        Start  82: conformance_parallel_reduce

82: Test command: "conformance_parallel_reduce" "--force-colors=1"
82: Test timeout computed to be: 10000000
82: Access violation
 82/132 Test  #82: conformance_parallel_reduce ..............***Failed    0.07 sec

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 49 (47 by maintainers)

Most upvoted comments

The failure occurs during construction of the proxy task (task_dispatcher.cpp:L57). I looked at the surrounding assembly, and it looks strange:

000007FEEEA77CD8  call        tbb::detail::r1::allocate (07FEEEA41145h)
000007FEEEA77CDD  mov         r10,rax                     // rax contains allocated pointer and it is copied to r10
000007FEEEA77CE0  xorps       xmm0,xmm0                   // nullify xmm0
000007FEEEA77CE3  lea         r9,[r10+48h]  
000007FEEEA77CE7  xorps       xmm1,xmm1                   // nullify xmm1
000007FEEEA77CEA  movaps      xmmword ptr [rax+40h],xmm0  // why does it zero the beginning of the proxy task?
                                                          // not an issue in itself, but the compiler assumes
                                                          // rax+40h is 16-byte aligned
000007FEEEA77CEE  movaps      xmmword ptr [rax+50h],xmm0  // nullify the next 16 bytes
000007FEEEA77CF2  xor         eax,eax  
000007FEEEA77CF4  movaps      xmmword ptr [r10+68h],xmm1  // tries to zero 16 bytes at an 18h offset from
                                                          // the previous access, so either rax+50h or
                                                          // r10+68h is not aligned... the code is broken...
000007FEEEA77CF9  mov         qword ptr [r10+78h],rax  
000007FEEEA77CFD  lea         rax,[tbb::detail::r1::task_proxy::`vftable' (07FEEEA899F0h)]  
000007FEEEA77D04  mov         qword ptr [r10+8],rbx  
000007FEEEA77D08  movups      xmmword ptr [r10+10h],xmm0  // but here the compiler does not assume
                                                          // the pointer is aligned...
000007FEEEA77D0D  movups      xmmword ptr [r10+20h],xmm0  
000007FEEEA77D12  movups      xmmword ptr [r10+30h],xmm0  
000007FEEEA77D17  mov         qword ptr [r10],rax 

I looked at the code generated with msvc 19.28 - the compiler does not assume any alignment:

00007FFE79C2BF71  call        tbb::detail::r1::allocate (07FFE79BF2DD8h)  
00007FFE79C2BF76  mov         r14,rax  
00007FFE79C2BF79  lea         rcx,[rsp+68h]  
00007FFE79C2BF7E  xorps       xmm0,xmm0  
00007FFE79C2BF81  lea         rsi,[r14+48h]  
00007FFE79C2BF85  xorps       xmm1,xmm1  
00007FFE79C2BF88  or          rbp,3  
00007FFE79C2BF8C  movups      xmmword ptr [rax+50h],xmm0  // unaligned access is used 
00007FFE79C2BF90  xor         eax,eax  
00007FFE79C2BF92  mov         qword ptr [rsp+68h],rbp  
00007FFE79C2BF97  movups      xmmword ptr [r14+68h],xmm1  // unaligned access is used 
00007FFE79C2BF9C  mov         qword ptr [r14+78h],rax  
00007FFE79C2BFA0  lea         rax,[tbb::detail::r1::task_proxy::`vftable' (07FFE79C3CEF8h)]  
00007FFE79C2BFA7  mov         qword ptr [r14+8],rbx  
00007FFE79C2BFAB  movups      xmmword ptr [r14+10h],xmm0  
00007FFE79C2BFB0  movups      xmmword ptr [r14+20h],xmm0  
00007FFE79C2BFB5  movups      xmmword ptr [r14+30h],xmm0  
00007FFE79C2BFBA  mov         qword ptr [r14],rax

This looks like an msvc 19.20 bug: it emits aligned accesses where it cannot guarantee the required alignment. Looking at the source code, I could not find any UB that could cause the compiler to misbehave.

@phprus, do you agree with the analysis that msvc 19.20 generates broken code and that it does not make sense to fix anything in that regard? (I am speaking only about the tests related to this issue; the hangs seem to be another story.)

I have a similar issue, but on a very different system: my test_eh_algorithms ‘hangs’ (well, it still shows high CPU usage, but doesn’t seem to progress) and also prints the message "logged: Missed wakeup or machine is overloaded?".

I’ve built tbb 2021.2.0 with GCC 10.3.0 on a CentOS 8.4 machine with two AMD EPYC 7H12 64-core processors. The funny thing is, the same cluster has a few RHEL 8.2 nodes as well, with two AMD EPYC 7F32 8-core processors. Since the build was done on a shared filesystem, I figured I’d also run the make test on one of the RHEL nodes (i.e. the configure and make were run on a CentOS machine, the make test on a RHEL machine). Surprisingly enough, this make test on the RHEL node just worked without problems.

Now, I see that #553 is very specific to Windows, but seeing as the symptoms I’m facing are very similar: do you think the root cause could somehow be similar? And do you have any suggestion for how this could be fixed?

But why did this change solve the problem? Is there an undocumented API bug on Windows 7 related to thread stack size?

See #553. The root cause is that the stack size is calculated incorrectly. In your environment this causes a stack anchor overflow, which leads to hangs.