botan: Sporadic test failure with 3.1.0 (related to BOTAN_CLEAR_CPUID -- 0cd6692)
Hi folks,
Thanks for Botan. I package it in Gentoo and hit the following test failure when first bumping our package to 3.1.0:
SP800-108-Feedback(HMAC(SHA-1)) ran 120 tests in 2.72 msec all ok
SP800-108-Feedback(HMAC(SHA-256)) ran 120 tests in 2.73 msec all ok
SP800-108-Feedback(HMAC(SHA-384)) ran 120 tests in 3.02 msec all ok
SP800-108-Feedback(HMAC(SHA-512)) ran 120 tests in 2.61 msec all ok
SP800-108-Pipeline(CMAC(AES-128)) ran 120 tests in 3.87 msec 1 FAILED
Failure 1: SP800-108-Pipeline(CMAC(AES-128)) unexpected result for derived key
Produced: 7020B91FCED6BBC5A9A2F196
Expected: A2762C4FF7BC4D21E5C25245
XOR Diff: D2569550396AF6E44C60A3D3 (at src/tests/test_kdf.cpp:54)
Note 1: SP800-108-Pipeline(CMAC(AES-128)) Test # 18 SP800-108-Pipeline(CMAC(AES-128)) failed Output=A2762C4FF7BC4D21E5C25245 Secret=F9089D56D9A6C6F6BCB9992D1896510C
SP800-108-Pipeline(CMAC(AES-192)) ran 120 tests in 4.28 msec all ok
SP800-108-Pipeline(CMAC(AES-256)) ran 120 tests in 3.91 msec all ok
SP800-108-Pipeline(CMAC(TripleDES)) ran 120 tests in 1.86 msec all ok
SP800-108-Pipeline(HMAC(SHA-1)) ran 120 tests in 3.97 msec all ok
SP800-108-Pipeline(HMAC(SHA-256)) ran 120 tests in 3.90 msec all ok
SP800-108-Pipeline(HMAC(SHA-384)) ran 120 tests in 3.89 msec all ok
SP800-108-Pipeline(HMAC(SHA-512)) ran 120 tests in 3.88 msec all ok
SP800-56A(HMAC(SHA-1)) ran 200 tests in 2.92 msec all ok
SP800-56A(HMAC(SHA-224)) ran 196 tests in 3.55 msec all ok
SP800-56A(HMAC(SHA-256)) ran 196 tests in 3.55 msec all ok
SP800-56A(HMAC(SHA-384)) ran 196 tests in 3.54 msec all ok
SP800-56A(HMAC(SHA-512)) ran 192 tests in 3.96 msec all ok
[...]
zfec:
ZFEC encoding/decoding ran 10449 tests in 8.10 msec all ok
Tests complete ran 2927570 tests in 22.14 sec 1 tests failed (in kdf_kat)
* ERROR: dev-libs/botan-3.1.0::gentoo failed (test phase):
* Validation tests failed
I can’t reproduce this on subsequent runs yet…
This is with GCC 13.1.1 20230708 on amd64. The machine has ECC RAM and I don’t see any EDAC events, though…
About this issue
- State: closed
- Created a year ago
- Comments: 24 (8 by maintainers)
Commits related to this issue
- Fix sporadic test failure (GH #3623) The ZFEC tests were not serialized, but should have been. This would cause rare sporadic test failures, since these tests tamper with the CPUID bits. Add a check... — committed to randombit/botan by randombit a year ago
- dev-libs/botan: handle -fsanitize={address,undefined} See https://github.com/randombit/botan/issues/3623#issuecomment-1632453228. We need to handle -fsanitize=address and -fsanitize=undefined and pas... — committed to gentoo/gentoo by thesamesam a year ago
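The commit message above attributes the failures to tests that change the CPUID bits while other tests are running. A minimal, single-threaded sketch of that hazard follows. It is not Botan's code: `g_features`, `ScopedFeatureClear`, and `run_tampering_test` are hypothetical names, and the overlapping test is modeled as a callback rather than a real thread.

```cpp
#include <cstdint>

// Hypothetical process-wide feature mask, standing in for a library's
// cached CPUID state. A real implementation would need to be thread-safe.
constexpr std::uint32_t kSsse3 = 1u << 0;
constexpr std::uint32_t kAesNi = 1u << 1;
static std::uint32_t g_features = kSsse3 | kAesNi;

bool has_feature(std::uint32_t f) { return (g_features & f) == f; }

// RAII guard: clears a feature for the lifetime of the object and
// restores the previous mask on destruction.
class ScopedFeatureClear {
public:
  explicit ScopedFeatureClear(std::uint32_t f) : m_saved(g_features) {
    g_features &= ~f;
  }
  ~ScopedFeatureClear() { g_features = m_saved; }
  ScopedFeatureClear(const ScopedFeatureClear&) = delete;
  ScopedFeatureClear& operator=(const ScopedFeatureClear&) = delete;

private:
  std::uint32_t m_saved;
};

// Models a test that tampers with the global mask. The callback stands in
// for an unrelated test that happens to run while the guard is alive,
// which is what can occur when the tampering test is not serialized.
template <typename OtherTest>
bool run_tampering_test(OtherTest&& overlapping) {
  ScopedFeatureClear guard(kSsse3);
  return overlapping();
}
```

In this model, serializing the tampering test corresponds to never running another test while the guard is alive; once the guard is destroyed, the original mask is visible again.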
Nice! Trying the diff from the PR now.
(We have a 96-core arm64 box (and a 266-core sparc one, but I assume your interest in that is less, although it’s great for alignment checking) if either of you two ever want access to it. It’s purely for development and testing stuff like this, so it’s no bother.)
Edit: I’m going to declare victory for now after 35 good iterations. Thank you!
#3625
I am unable to repro this locally, probably because my desktop doesn’t have enough cores 😭 so I’ll need @reneme or @thesamesam to confirm this fixes it
@thesamesam Fixes are included in the newly released 3.1.1. Thanks so much for reporting this and all your help in repro and testing.
Thanks, lemme adjust the ebuild to handle that.
@thesamesam That’s a test that is intentionally UB because that’s the only way we can check an error case. I imagine this happened because you passed the UbSan flags directly via CXXFLAGS - we expect UbSan to be set with --enable-sanitizers=undefined, in which case we set a macro and skip tests of this sort.
@reneme I think it is not because of BOTAN_CLEAR_CPUID at all but because of the checks like [...] so when ZFEC cleared SSSE3 in order to run its test, it implicitly also “cleared” CLMUL. So it increased the blast radius of what was affected. Before that point, when ZFEC cleared SSSE3 it would not have affected CLMUL or AESNI, etc.
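To illustrate the “blast radius” described here, the sketch below contrasts two ways of answering “is CLMUL usable?”. The snippet the comment refers to is not preserved in this thread, so this is only an assumption about its general shape, not Botan's implementation; `Isa`, `FeatureSet`, and the method names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical feature bits; the values are arbitrary.
enum class Isa : std::uint32_t {
  Ssse3 = 1u << 0,
  Clmul = 1u << 1,
  AesNi = 1u << 2
};

constexpr std::uint32_t mask(Isa f) { return static_cast<std::uint32_t>(f); }

struct FeatureSet {
  std::uint32_t bits;

  bool raw(Isa f) const { return (bits & mask(f)) != 0; }
  void clear(Isa f) { bits &= ~mask(f); }

  // Independent check: only the CLMUL bit itself is consulted.
  bool clmul_independent() const { return raw(Isa::Clmul); }

  // Dependent check: CLMUL counts as usable only if SSSE3 is also
  // enabled, so clearing SSSE3 implicitly disables CLMUL as well.
  bool clmul_dependent() const { return raw(Isa::Clmul) && raw(Isa::Ssse3); }
};
```

With the dependent form, a test that clears only SSSE3 also changes the answer for CLMUL, so any code consulting that check at the same time may take a different path.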
The intent of that change was that using say BOTAN_CLEAR_CPUID=ssse3 actually disables use of all code using SSSE3; it would be awkward if you used that flag, and then we immediately use SSSE3 in CLMUL code or etc. But this may be too confusing in other parts of the code; I think we should revert that part of the change, at least for now.
It was looking good at first, but after 4 iterations:
You’ve been looping the entire test corpus, right?
Just looping the failing kyber_keygen test didn’t give me any errors in about 4000 iterations, unfortunately.
Edit: same with kdf_kat in 20k iterations. 😦
Edit 2: the full corpus failed rather quickly. 😮
I seem to be able to reproduce it.
Edit 3: 50 iterations of the test corpus on Botan 3.0.0 came back clean.