CLBlast: DGEMM failures on Turing GPUs

I am getting DGEMM failures on Turing GPUs. I have tested on the NVIDIA T4 and the Quadro T2000. Here are the logs from the T2000 system:

* Running on OpenCL device 'Quadro T2000'.
* Starting tests for the 'DGEMM' routine. Legend:
   : -> Test produced correct results
   . -> Test returned the correct error code
   X -> Test produced incorrect results
   / -> Test returned an incorrect error code
   \ -> Test not executed: OpenCL-kernel compilation error
   o -> Test not executed: Unsupported precision
   - -> Test not completed: Reference CBLAS doesn't output error codes
* Testing with error margins of 0.5% (relative) and 0.001 (absolute)
* Testing 'regular behaviour' for '101 (row-major) 111 (regular) 111 (regular)':
   XXXXXXXX----XXXX---X---X-------XXXXXXXXX----XXXX---X---X-------X
   Error rate 24.43%: m=7 n=7 k=7 lda=7 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 23.86%: m=7 n=7 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 23.86%: m=7 n=7 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=64 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 23.86%: m=7 n=7 k=64 lda=64 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 23.86%: m=7 n=7 k=64 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 33.39%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 61.22%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 33.22%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.87%: m=64 n=7 k=7 lda=7 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.35%: m=64 n=7 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.52%: m=64 n=7 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.70%: m=64 n=7 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.70%: m=64 n=7 k=7 lda=64 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.70%: m=64 n=7 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.52%: m=64 n=7 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.70%: m=64 n=7 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 74.26%: m=64 n=7 k=64 lda=64 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 72.87%: m=64 n=7 k=64 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 72.00%: m=64 n=7 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 73.22%: m=64 n=7 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 78.74%: m=64 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 79.87%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 72.70%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Pass rate   0.0%: 0 passed / 34 skipped / 30 failed
* Testing 'regular behaviour' for '101 (row-major) 111 (regular) 112 (transposed)':
   XXXXXXXX------XX-X-X-X-X-------XXXXXXXXX------XX-X-X-X-X-------X
   Error rate 24.43%: m=7 n=7 k=7 lda=7 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=64 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 20.45%: m=7 n=7 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 66.96%: m=7 n=64 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 50.09%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 53.04%: m=7 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 44.52%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 33.22%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.87%: m=64 n=7 k=7 lda=7 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.87%: m=64 n=7 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.87%: m=64 n=7 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.87%: m=64 n=7 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.70%: m=64 n=7 k=7 lda=64 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.87%: m=64 n=7 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.87%: m=64 n=7 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.52%: m=64 n=7 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 73.22%: m=64 n=7 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 73.22%: m=64 n=7 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 75.68%: m=64 n=64 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 77.22%: m=64 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 77.24%: m=64 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 75.73%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 82.19%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Pass rate   0.0%: 0 passed / 34 skipped / 30 failed
* Testing 'regular behaviour' for '101 (row-major) 112 (transposed) 111 (regular)':
   XXXXXXXXXXXXXXXX---X---X---X---X----XXXX----XXXX-------X-------X
   Error rate 24.43%: m=7 n=7 k=7 lda=7 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=64 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=64 lda=7 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=64 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=64 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=64 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=64 lda=64 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=64 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 20.45%: m=7 n=7 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 20.45%: m=7 n=7 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 50.26%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 47.30%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 53.04%: m=7 n=64 k=64 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 33.57%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.87%: m=64 n=7 k=7 lda=64 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.52%: m=64 n=7 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.70%: m=64 n=7 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.87%: m=64 n=7 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 73.04%: m=64 n=7 k=64 lda=64 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 74.43%: m=64 n=7 k=64 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 73.22%: m=64 n=7 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 73.22%: m=64 n=7 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 74.95%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 83.31%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Pass rate   0.0%: 0 passed / 34 skipped / 30 failed
* Testing 'regular behaviour' for '101 (row-major) 112 (transposed) 112 (transposed)':
   XXXXXXXX--XX--XX-X-X-X-X---X---X----XXXX------XX-----X-X-------X
   Error rate 24.43%: m=7 n=7 k=7 lda=7 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=64 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=64 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=64 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 20.45%: m=7 n=7 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 23.86%: m=7 n=7 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 66.61%: m=7 n=64 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 33.57%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 55.83%: m=7 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 52.87%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 39.13%: m=7 n=64 k=64 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 44.70%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.35%: m=64 n=7 k=7 lda=64 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.70%: m=64 n=7 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.87%: m=64 n=7 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.87%: m=64 n=7 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 73.22%: m=64 n=7 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 75.48%: m=64 n=7 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 74.50%: m=64 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 77.15%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 82.50%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Pass rate   0.0%: 0 passed / 37 skipped / 27 failed
* Testing 'regular behaviour' for '102 (col-major) 111 (regular) 111 (regular)':
   XXXXXXXX--XX--XXXXXXXXXX--XX--XX-----X-X-------X-----X-X-------X
   Error rate 24.43%: m=7 n=7 k=7 lda=7 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=64 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 20.45%: m=7 n=7 k=64 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=64 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 53.04%: m=7 n=64 k=7 lda=7 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 33.57%: m=7 n=64 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 33.57%: m=7 n=64 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 33.57%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 50.26%: m=7 n=64 k=7 lda=64 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 61.39%: m=7 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 53.04%: m=7 n=64 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 49.91%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 44.35%: m=7 n=64 k=64 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 33.39%: m=7 n=64 k=64 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 44.52%: m=7 n=64 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 39.13%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.52%: m=64 n=7 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.87%: m=64 n=7 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 73.22%: m=64 n=7 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 75.63%: m=64 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 79.14%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 79.56%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Pass rate   0.0%: 0 passed / 34 skipped / 30 failed
* Testing 'regular behaviour' for '102 (col-major) 111 (regular) 112 (transposed)':
   XXXXXXXXXXXXXXXX--XX--XX--XX--XX-----X-X-----X-X-------X-------X
   Error rate 24.43%: m=7 n=7 k=7 lda=7 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=64 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 20.45%: m=7 n=7 k=64 lda=7 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=64 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=64 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=64 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=64 lda=64 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=64 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 20.45%: m=7 n=7 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 23.86%: m=7 n=7 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 61.04%: m=7 n=64 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 33.39%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 47.30%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 44.70%: m=7 n=64 k=64 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 50.26%: m=7 n=64 k=64 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 36.35%: m=7 n=64 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 33.39%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.70%: m=64 n=7 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.87%: m=64 n=7 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 73.22%: m=64 n=7 k=64 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 73.04%: m=64 n=7 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 71.56%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 79.54%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Pass rate   0.0%: 0 passed / 34 skipped / 30 failed
* Testing 'regular behaviour' for '102 (col-major) 112 (transposed) 111 (regular)':
   XXXXXXXX------XXXXXXXXXX------XX-X-X-X-X-------X-X-X-X-X-------X
   Error rate 24.43%: m=7 n=7 k=7 lda=7 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=64 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 66.96%: m=7 n=64 k=7 lda=7 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 33.22%: m=7 n=64 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 33.57%: m=7 n=64 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 44.70%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 55.83%: m=7 n=64 k=7 lda=64 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 64.17%: m=7 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 33.57%: m=7 n=64 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 53.04%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 33.57%: m=7 n=64 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 50.09%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.70%: m=64 n=7 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.70%: m=64 n=7 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.87%: m=64 n=7 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.87%: m=64 n=7 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 74.43%: m=64 n=7 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 75.66%: m=64 n=64 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 85.51%: m=64 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 78.76%: m=64 n=64 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 79.92%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 85.58%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Pass rate   0.0%: 0 passed / 34 skipped / 30 failed
* Testing 'regular behaviour' for '102 (col-major) 112 (transposed) 112 (transposed)':
   XXXXXXXX----XXXX--XX--XX------XX-X-X-X-X-----X-X---X---X-------X
   Error rate 24.43%: m=7 n=7 k=7 lda=7 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 23.86%: m=7 n=7 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 23.86%: m=7 n=7 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=64 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=64 lda=64 ldb=7 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=64 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 20.45%: m=7 n=7 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 24.43%: m=7 n=7 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 66.09%: m=7 n=64 k=7 lda=7 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 50.09%: m=7 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 41.91%: m=7 n=64 k=7 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 58.61%: m=7 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 53.04%: m=7 n=64 k=64 lda=64 ldb=64 ldc=7 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 58.26%: m=7 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.87%: m=64 n=7 k=7 lda=7 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.70%: m=64 n=7 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.52%: m=64 n=7 k=7 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 76.70%: m=64 n=7 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 72.00%: m=64 n=7 k=64 lda=64 ldb=7 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 72.00%: m=64 n=7 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 74.59%: m=64 n=64 k=7 lda=7 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 79.16%: m=64 n=64 k=7 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Error rate 79.56%: m=64 n=64 k=64 lda=64 ldb=64 ldc=64 offa=0 offb=0 offc=0 alpha=3.14 beta=3.14 
   Pass rate   0.0%: 0 passed / 37 skipped / 27 failed
* Completed all test-cases for this routine. Results:
   0 test(s) passed
   278 test(s) skipped
   234 test(s) failed
 --- OpenCL device naming:
* Device type                   GPU
* Device name                   Quadro T2000
* Platform vendor               NVIDIA Corporation
* Platform version              OpenCL 3.0 CUDA 11.4.94

 --- CLBlast device naming:
* Device type                   GPU
* Device name                   Quadro T2000
* Device vendor                 NVIDIA
* Device architecture           SM7.5

 --- OpenCL device properties:
* Max work group size           1024
* Max work item dimensions      3
* - Max work item size #0       1024
* - Max work item size #1       1024
* - Max work item size #2       64
* Local memory size             49152KB
* Extensions:
cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_khr_gl_event cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_nv_kernel_attribute cl_khr_device_uuid cl_khr_pci_bus_info

 --- Some OpenCL library benchmarks (functions from clpp11.h):
* queue.GetContext()            0.0001 ms
* queue.GetDevice()             0.0000 ms
* device.Name()                 0.0746 ms
* device.Vendor()               0.0001 ms
* device.Version()              0.0005 ms
* device.Platform()             0.0000 ms
* Buffer<float>(context, 1024)  0.0005 ms
umar@gus /d/d/C/build (master)> uname -a
Linux gus 5.13.7-arch1-1 #1 SMP PREEMPT Sat, 31 Jul 2021 13:18:52 +0000 x86_64 GNU/Linux
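
For readers interpreting the log: the test run reports error margins of 0.5% (relative) and 0.001 (absolute), and each failing case prints an error rate. The sketch below shows one way such a check could work. It is not CLBlast's actual comparison code. The rule that a value passes when it is within either margin, the error rate as a share of compared elements, and the names `is_close` and `error_rate` are all assumptions made for illustration.

```python
# Margins as reported in the test log.
REL_MARGIN = 0.005  # 0.5% relative
ABS_MARGIN = 0.001  # absolute


def is_close(result, reference, rel=REL_MARGIN, abs_margin=ABS_MARGIN):
    """Assumed rule: a value passes if it is within either margin."""
    diff = abs(result - reference)
    if diff <= abs_margin:
        # Small absolute differences pass regardless of the relative error.
        return True
    return diff <= rel * abs(reference)


def error_rate(results, references):
    """Percentage of compared elements that fall outside the margins."""
    if len(results) != len(references) or not results:
        raise ValueError("inputs must be non-empty and of equal length")
    failures = sum(
        not is_close(res, ref) for res, ref in zip(results, references)
    )
    return 100.0 * failures / len(results)


if __name__ == "__main__":
    computed = [1.0, 2.0, 3.0, 4.0]
    expected = [1.0, 2.0, 3.0, 5.0]
    print(f"Error rate {error_rate(computed, expected):.2f}%")
```

Under this reading, an error rate of 20% or more, as seen in every failing case above, means that a large share of the output differs from the reference, which points to a wrong kernel configuration rather than rounding noise.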

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 19 (19 by maintainers)

Most upvoted comments

Thanks a lot and good that you found the issue. The assumption is that tuning parameters for a specific SM version work for other GPUs of that same family, but apparently not. I’ll add the new tuning data to the database this evening and will update CLBlast.
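
The assumption described in this comment can be pictured as a lookup with fallbacks: use an entry for the exact device if one exists, otherwise reuse the entry for the same architecture, otherwise use defaults. The sketch below is a hypothetical model of that idea, not CLBlast's database code. The dictionary layout, the function name `lookup_parameters`, the parameter names, and all values are placeholders.

```python
# Hypothetical tuning database. Names and values are placeholders,
# not real tuning data.
DATABASE = {
    "devices": {
        "Example GPU A": {"TILE_M": 64, "TILE_N": 64},
    },
    "architectures": {
        "SM7.5": {"TILE_M": 32, "TILE_N": 32},
    },
    "default": {"TILE_M": 16, "TILE_N": 16},
}


def lookup_parameters(device_name, architecture, database=DATABASE):
    """Return (parameters, source), preferring the most specific entry."""
    if device_name in database["devices"]:
        return database["devices"][device_name], "device"
    if architecture in database["architectures"]:
        # Assumes parameters tuned on one GPU are valid for the whole family.
        return database["architectures"][architecture], "architecture"
    return database["default"], "default"


if __name__ == "__main__":
    # No device-specific entry yet: the architecture-wide entry is reused.
    print(lookup_parameters("Quadro T2000", "SM7.5"))

    # Adding an entry for the exact device makes it take precedence.
    DATABASE["devices"]["Quadro T2000"] = {"TILE_M": 8, "TILE_N": 8}
    print(lookup_parameters("Quadro T2000", "SM7.5"))
```

In this model, adding tuning data for the specific device changes the result of the lookup without touching the architecture-wide entry, which is the kind of fix the comment proposes.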

Yeah, looks like that fixed it. The A100 is failing, but I think we are also getting intermittent failures in other PRs. I can probably investigate that at the end of next week. I will close this issue. Thank you for your help!

Hmm. I ran all tests before I posted the tuning parameters. I am not sure why they are failing now. I can look into this later today.

Okay. I reran all the tuners and made sure that the CPU and GPU clocks were stable while they were running. I was able to get valid parameters with which the tests produce correct results. I will upload them to the #1 issue.