alpaka: math test failed with Clang as CUDA compiler

cmake -DBoost_USE_STATIC_LIBS=ON -DBoost_USE_MULTITHREADED=ON -DBoost_USE_STATIC_RUNTIME=OFF -DCMAKE_BUILD_TYPE=Release -DALPAKA_ACC_CPU_B_SEQ_T_SEQ_ENABLE=ON -DALPAKA_ACC_CPU_B_SEQ_T_THREADS_ENABLE=OFF -DALPAKA_ACC_CPU_B_SEQ_T_FIBERS_ENABLE=OFF -DALPAKA_ACC_CPU_B_TBB_T_SEQ_ENABLE=OFF -DALPAKA_ACC_CPU_B_OMP2_T_SEQ_ENABLE=OFF -DALPAKA_ACC_CPU_B_SEQ_T_OMP2_ENABLE=OFF -DALPAKA_ACC_CPU_BT_OMP4_ENABLE=OFF -DALPAKA_ACC_GPU_CUDA_ENABLE=ON -DALPAKA_ACC_GPU_HIP_ENABLE=OFF -DALPAKA_DEBUG=0 -DALPAKA_CUDA_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -Dalpaka_BUILD_EXAMPLES=ON -DBUILD_TESTING=ON ..
build/test/unit/math/math
using seed: 1337

testing:
 3 - accelerators !
17 - unary math operators
6 - binary math operators
testing with two data types
total 2 * accelerators * (unary + binary) * capacity


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
math is a Catch v2.11.0 host application.
Run with -? for options

-------------------------------------------------------------------------------
mathOps
-------------------------------------------------------------------------------
/alpaka/test/unit/math/src/math.cpp:173
...............................................................................

/alpaka/test/unit/math/src/math.cpp:144: FAILED:
  REQUIRE( results(i) == Approx(std_result) )
with expansion:
  -inf == Approx( inf )
with messages:
  Operator: OpExp
  Type: d
  The args buffer: 
  capacity: 1000
  0: [ 0, ]
  1: [ 1.797693134862316e+308, ]
  2: [ -1.797693134862316e+308, ]
  3: [ -866.0227473557505, ]
  4: [ 215.2263814000266, ]
  5: [ -748.4321206839105, ]
  6: [ 642.1264531942572, ]
  7: [ -684.4018728416871, ]
  8: [ 459.689454949301, ]
  9: [ -222.9117748544045, ]
  10: [ 209.1584889667682, ]
  11: [ -292.4749641267939, ]
  12: [ 368.071562313879, ]
  13: [ -215.1164257009424, ]
  14: [ 993.5934513103955, ]
  15: [ -503.7756326294896, ]
  16: [ 289.652593683621, ]
  17: [ -491.8247722903218, ]
  18: [ 58.77905907878117, ]
  19: [ -460.0548273044462, ]
  20: [ 849.1603358834052, ]
  21: [ -320.8197171238272, ]
  22: [ 457.8027599177277, ]
  23: [ -669.7326682649812, ]
  24: [ 236.7513732381438, ]
  25: [ -181.0255020122907, ]
  26: [ 853.430905473642, ]
  27: [ -482.0493555522171, ]
  28: [ 727.3698115061305, ]
# ...

Tested on fwk394 with CUDA 10.1, Clang 10.0, CMake 3.16.5 and Boost 1.73.0 via Spack. It also fails with the Alpaka-CI Docker image (the image is not publicly available at the moment; I am working on it).

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 26 (26 by maintainers)

Most upvoted comments

When the tests mentioned in my message above (clang-CUDA, no fast-math) fail, it is because two very small values are compared, or one very small value and 0. I suspect the way we compare the numbers is fragile and will often fail in such a situation.

For example, the double-precision pow(343.1018, -16.14939) sometimes gives 1.13351e-41 in alpaka and 1.13365e-41 in the standard library, so the test fails; I explain why below. I am not sure why the failure is not fully consistent. However, each time it fails (there is also a similarly inconsistent case of exp(large_number, negative_large_number)) the result is nearly correct rather than obviously garbage data, so I am inclined to think it is not a data race. It is simply what alpaka returns, which is whatever the backend implementation returns.

We use the pattern alpaka_result == Approx(std_library_result). Approx comes from Catch2; I have looked at its code, and the check passes if either of the following is true:

  • The absolute difference between the two numbers is within the margin given to Approx. We never set a margin, so it is 0 by default and this check boils down to alpaka_result == std_library_result in our case. In our current usage this check therefore adds nothing, as it is a subset of the next check.
  • The relative difference between the two numbers is within the given epsilon (there is another parameter that makes this check more general, but the way we use it, it is just the relative difference). We never set an epsilon, so the default value is used, which seems reasonable to me. For small enough numbers this check also fails unless they are exactly equal, as the failing example shows.
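
The two checks above can be sketched in plain C++ as follows. This is only an illustration of the logic as described in this comment, not Catch2's actual code: the names withinMargin and approxEqual are hypothetical, and the defaults (margin 0, scale 0, epsilon equal to 100 times the float machine epsilon) are my assumption about Catch2 v2.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: true if lhs and rhs differ by at most `margin`.
// Written with additions rather than a subtraction, so comparing an
// infinity with itself does not go through inf - inf = NaN.
inline bool withinMargin(double lhs, double rhs, double margin)
{
    return (lhs + margin >= rhs) && (rhs + margin >= lhs);
}

// Hypothetical stand-in for `result == Approx(reference)`.
// The default values are assumptions about Catch2 v2, not taken from it.
inline bool approxEqual(
    double result,
    double reference,
    double margin = 0.0,
    double epsilon = 100.0 * std::numeric_limits<float>::epsilon(),
    double scale = 0.0)
{
    // Check 1: fixed absolute margin (exact equality when margin == 0).
    bool const absoluteOk = withinMargin(result, reference, margin);
    // Check 2: margin proportional to the magnitude of the reference value.
    bool const relativeOk
        = withinMargin(result, reference, epsilon * (scale + std::fabs(reference)));
    return absoluteOk || relativeOk;
}
```

Under these assumptions, approxEqual(1.13351e-41, 1.13365e-41) is false: the two values differ by a relative amount of roughly 1.2e-4, while the assumed default epsilon is about 1.19e-5. Likewise approxEqual(0.0, 1e-300) is false, because a margin proportional to a tiny reference value is itself tiny.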

We could try adding a small non-zero margin so that the first check passes for two small numbers that are nearly, but not exactly, equal. It is difficult to choose a reasonable value though, because the test then becomes too imprecise for some cases.
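
To put rough numbers on that trade-off, take a as the alpaka result and b as the standard library result from the pow example. The margin of 10^-40 below is a hypothetical value chosen only for illustration, not a proposal:

```latex
\begin{aligned}
|a - b| &= \left|1.13351\times10^{-41} - 1.13365\times10^{-41}\right| \approx 1.4\times10^{-45}
  && \text{(smallest margin that lets this pair pass)} \\
|0 - b| &= 1.13365\times10^{-41} \le 10^{-40}
  && \text{(a margin of } 10^{-40} \text{ would also accept a result of } 0\text{)}
\end{aligned}
```

So an absolute margin large enough to be convenient for one operator can hide a completely wrong result for another operator whose correct values are below that margin.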

I used the fix above and tested with clang++ 12 and CUDA 10.1: all math tests passed, and I repeated the test more than 20 times.

By the way, fast-math is disabled by default in 0.7.0+.

Nope, the test is still failing. I updated the container (registry.gitlab.com/hzdr/crp/alpaka-group-container/alpaka-ci-cuda101-clang:1.3) and the cmake configure command:

cmake -DBOOST_ROOT=/opt/boost/1.75.0/ -DBOOST_LIBRARYDIR="/opt/boost/1.75.0/lib" -DBoost_USE_STATIC_LIBS=ON -DBoost_USE_MULTITHREADED=ON -DBoost_USE_STATIC_RUNTIME=OFF -DCMAKE_BUILD_TYPE=Release -DALPAKA_ACC_CPU_B_SEQ_T_SEQ_ENABLE=ON -DALPAKA_ACC_GPU_CUDA_ENABLE=ON -DALPAKA_DEBUG=0 -DALPAKA_CUDA_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++-11 -Dalpaka_BUILD_EXAMPLES=ON -DBUILD_TESTING=ON ..

The container is online. You can simply reproduce the problem with the following script:

docker run --runtime=nvidia -it registry.gitlab.com/hzdr/crp/alpaka-group-container/alpaka-ci:cuda10.1Clang
export CUDA_VISIBLE_DEVICES="1"
git clone https://github.com/alpaka-group/alpaka.git
mkdir alpaka/build && cd alpaka/build
cmake -DBOOST_ROOT=/opt/boost/1.73.0/ -DBOOST_LIBRARYDIR="/opt/boost/1.73.0/lib" -DBoost_USE_STATIC_LIBS=ON -DBoost_USE_MULTITHREADED=ON -DBoost_USE_STATIC_RUNTIME=OFF -DCMAKE_BUILD_TYPE=Release -DALPAKA_ACC_CPU_B_SEQ_T_SEQ_ENABLE=ON -DALPAKA_ACC_CPU_B_SEQ_T_THREADS_ENABLE=OFF -DALPAKA_ACC_CPU_B_SEQ_T_FIBERS_ENABLE=OFF -DALPAKA_ACC_CPU_B_TBB_T_SEQ_ENABLE=OFF -DALPAKA_ACC_CPU_B_OMP2_T_SEQ_ENABLE=OFF -DALPAKA_ACC_CPU_B_SEQ_T_OMP2_ENABLE=OFF -DALPAKA_ACC_CPU_BT_OMP4_ENABLE=OFF -DALPAKA_ACC_GPU_CUDA_ENABLE=ON -DALPAKA_ACC_GPU_HIP_ENABLE=OFF -DALPAKA_DEBUG=0 -DALPAKA_CUDA_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++-10 -Dalpaka_BUILD_EXAMPLES=ON -DBUILD_TESTING=ON ..
make -j14
test/unit/math/math