highway: x86: HwyMathTestGroup/HwyMathTest.TestAllAtanh/EMU128 failure

I see a test failure on Debian/x86 (32-bit) in the floating-point comparison:

Truncated:

[...]
814/894 Test #814: HwyMathTestGroup/HwyMathTest.TestAllAtanh/EMU128  # GetParam() = 2305843009213693952 ...................................Subprocess aborted***Exception:   0.03 sec
Running main() from ./googletest/src/gtest_main.cc
Note: Google Test filter = HwyMathTestGroup/HwyMathTest.TestAllAtanh/EMU128
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from HwyMathTestGroup/HwyMathTest
[ RUN      ] HwyMathTestGroup/HwyMathTest.TestAllAtanh/EMU128
f32x4: Atanh max_ulp 2
f32x2: Atanh(0.000000) expected 0.000000 actual 0.000000 ulp 5.20158e+08 max ulp 4
f32x2: Atanh(0.000000) expected 0.000000 actual 0.000000 ulp 5.20424e+08 max ulp 4
f32x2: Atanh(0.000000) expected 0.000000 actual 0.000000 ulp 5.20691e+08 max ulp 4
f32x2: Atanh(0.000000) expected 0.000000 actual 0.000000 ulp 5.20957e+08 max ulp 4
f32x2: Atanh(0.000000) expected 0.000000 actual 0.000000 ulp 5.21223e+08 max ulp 4
[...]
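The log above prints `expected` and `actual` with six decimal places, so both show as 0.000000 even though the reported distance is around 5.2e+08 ULP. As a minimal sketch of how that can happen (this is not Highway's implementation; `UlpDistance` and `FormatFixed` are hypothetical helper names introduced here for illustration, and it does not diagnose the failure itself), the following computes the ULP distance between two finite floats and shows that a very small value formats as 0.000000:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper, not part of Highway: maps a finite float to a signed
// integer key so that adjacent floats have adjacent keys (+0 and -0 both map to 0).
static int64_t OrderedKey(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));  // well-defined type punning
  const int64_t magnitude = static_cast<int64_t>(bits & 0x7FFFFFFFu);
  return (bits & 0x80000000u) ? -magnitude : magnitude;
}

// Hypothetical helper: number of representable floats between a and b.
static int64_t UlpDistance(float a, float b) {
  assert(std::isfinite(a) && std::isfinite(b));
  const int64_t diff = OrderedKey(a) - OrderedKey(b);
  return diff < 0 ? -diff : diff;
}

// Hypothetical helper: formats a value with "%f", i.e. six decimal places,
// the same precision the values in the log appear to be printed with.
static std::string FormatFixed(float f) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%f", static_cast<double>(f));
  return std::string(buf);
}
```

For example, `FormatFixed(1e-7f)` and `FormatFixed(0.0f)` produce the same text, while `UlpDistance(0.0f, 1e-7f)` is in the hundreds of millions. Printing the values with `%g` or `%a` instead would show what actually differs.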

ref:

About this issue

  • State: closed
  • Created a year ago
  • Comments: 19 (19 by maintainers)

Most upvoted comments

I was able to repro this on GCC 12.2 with HWY_SCALAR. HWY_EMU128 only comes up if we set -DHWY_BROKEN_EMU128=0, right? GCC <12.3 did have several bugs which is why we introduced this flag.

You should be able to reproduce it using GCC 11/12/13 (the issue arises at different vector sizes, though). Either force EMU128 or simply use the default SCALAR.

Clearly a bug in GCC, but I have not found time to provide a minimal test case. As explained above, as soon as you extract the math logic, the compiler is able to optimize the code correctly. So the issue is somewhere in the template hierarchy and/or the for loop, which confuses the optimizer. I should be able to find some time at the end of this week (hopefully).