oneMKL: cuBLAS tests failed after aligning with SYCL 2020 specification
Summary
After the cuBLAS backend was aligned with the SYCL 2020 specification, a large number of cuBLAS tests failed.
Environment
- OS: Linux
- Hardware: NVIDIA TITAN RTX
- Backend library version: CUDA 10.2
- Compiler version: sycl-nightly 20220210
Steps to reproduce
Clone the branch: https://github.com/dnhsieh-intel/oneMKL/tree/SYCL_2020_cuBLAS
$ mkdir build && cd build
$ cmake .. -DENABLE_CUBLAS_BACKEND=True -DENABLE_MKLCPU_BACKEND=False -DENABLE_MKLGPU_BACKEND=False -DREF_BLAS_ROOT=<reference_blas_install_prefix> -DTARGET_DOMAINS=blas
$ cmake --build .
$ ctest --output-on-failure
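To narrow down a failure without re-running all 1668 tests, a single test can be selected by name. A minimal sketch, assuming it is run from the same build directory as the steps above; `rerun_test` is a hypothetical helper name, while `-R` and `--output-on-failure` are standard ctest options:

```shell
# Hypothetical helper (not part of oneMKL): re-run only the ctest entries
# whose name matches the given regular expression.
rerun_test() {
    ctest -R "$1" --output-on-failure
}
```

For example, `rerun_test 'CopyUsmTests.RealSinglePrecision/Column_Major'` would select the CopyUsm test listed under "Examples of failed tests". `ctest --rerun-failed --output-on-failure` is another option that repeats only the tests that failed in the previous run.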
Observed behavior
Some cuBLAS tests that use dynamic libraries reported segmentation faults in column-major cases. Output of the cuBLAS unit tests: cublas_unit_tests.txt (81% of tests passed; 321 of 1668 failed)
Examples of failed tests:
Start 1: BLAS/RT/Nrm2TestSuite/Nrm2Tests.RealSinglePrecision/Column_Major_TITAN_RTX
1/1668 Test #1: BLAS/RT/Nrm2TestSuite/Nrm2Tests.RealSinglePrecision/Column_Major_TITAN_RTX ..................................***Exception: SegFault 1.42 sec
Start 155: BLAS/RT/CopyUsmTestSuite/CopyUsmTests.RealSinglePrecision/Column_Major_TITAN_RTX
155/1668 Test #155: BLAS/RT/CopyUsmTestSuite/CopyUsmTests.RealSinglePrecision/Column_Major_TITAN_RTX ............................***Exception: SegFault 0.58 sec
Run this program with --terse_output to change the way it prints its output.
Note: Google Test filter = CopyUsmTestSuite/CopyUsmTests.RealSinglePrecision/Column_Major_TITAN_RTX
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from CopyUsmTestSuite/CopyUsmTests
[ RUN ] CopyUsmTestSuite/CopyUsmTests.RealSinglePrecision/Column_Major_TITAN_RTX
relative error = 1.59858 absolute error = 0.543816 limit = 0.000161767
Difference in entry 0: DPC++ -0.203628 vs. Reference 0.340188
relative error = 2.13229 absolute error = 0.225206 limit = 0.000161767
Difference in entry 1: DPC++ -0.330823 vs. Reference -0.105617
relative error = 2.63611 absolute error = 0.746282 limit = 0.000161767
...
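The error figures in this log can be cross-checked by hand. The sketch below assumes that the absolute error is |DPC++ value - reference value| and that the relative error is the absolute error divided by |reference value|; this is inferred from the printed numbers, not taken from the test source, and `errors` is a hypothetical helper name:

```shell
# Sketch: recompute the error figures for one "Difference in entry" line.
# Assumption (inferred from the log, not from the oneMKL test source):
#   absolute error = |computed - reference|
#   relative error = absolute error / |reference|
# The reference value must be non-zero.
errors() {
    awk -v c="$1" -v r="$2" 'BEGIN {
        d = c - r; if (d < 0) d = -d   # absolute difference
        if (r < 0) r = -r              # magnitude of the reference
        printf "relative=%.4f absolute=%.6f\n", d / r, d
    }'
}

errors -0.203628 0.340188    # entry 0: DPC++ vs. reference
errors -0.330823 -0.105617   # entry 1: DPC++ vs. reference
```

Under that assumption the results agree with the logged values for entries 0 and 1 up to the rounding of the printed inputs. In both entries the difference exceeds the limit of 0.000161767 by several orders of magnitude, so these are not borderline tolerance failures.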
Expected behavior
All cuBLAS tests pass.
About this issue
- State: closed
- Created 2 years ago
- Comments: 23 (21 by maintainers)
Closing this issue because it can no longer be reproduced. It appears to have been related to the machine environment and/or the compiler packages.