CLBlast: All sgemm tests fail on Intel HD530 gpu

All [edit: sgemm] tests fail on Intel HD530 gpu.

Output of running ./clblast_test_xgemm -device 1 > clblast_test_xgemm_out.txt 2>&1: https://gist.github.com/hughperkins/711741a770949889fbe8122a0c0fd288

git status and log:

$ git status
On branch cedric-master
Your branch is up-to-date with 'cedric/master'.
Untracked files:
  (use "git add <file>..." to include in what will be committed)

	./

nothing added to commit but untracked files present (use "git add" to track)
carrot:mac hugh2$ git log -n 3
commit e52f9a9ff23f73325e5a8bc58765554d5f5eedc4
Merge: 115af8c 2cf7d84
Author: Cedric Nugteren <web@cedricnugteren.nl>
Date:   Sun Nov 27 15:59:21 2016 +0100

    Merge pull request #127 from CNugteren/development
    
    Update to version 0.10.0

commit 2cf7d8429a58ac09842815d9859c90acbc38e8b7
Author: Cedric Nugteren <web@cedricnugteren.nl>
Date:   Sun Nov 27 13:34:18 2016 +0100

    Updated to version 0.10.0

commit 39c49bf4f977427de42fdfe27e8a2ed41ae4923e
Author: Cedric Nugteren <web@cedricnugteren.nl>
Date:   Sun Nov 27 11:00:29 2016 +0100

    Made it possible to use the command-line environmental variables for each executable and without re-running CMake

Environment:

macOS Sierra
HD530 is -device 1. I haven't figured out how to install clinfo yet, but EasyCL's gpuinfo gives:

$ ./gpuinfo 
num platforms: 1

platform index: 0:
platform id: 0x7fff0000
platform vendor: Apple
platform name: Apple
platform num devices: 3

   device index: 0
   device id: 0xffffffff
   device type: 2
   global memory size: 16384MB
   local memory size: 32KB
   global cache size: 0KB
   global cacheline size: 6291456
   max memory alloc size: 4096MB
   max compute units: 8
   max workgroup size: 1024
   max workitem dimensions: 3
   max workitem sizes: 1024 1 1
   device name: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
   opencl c version: OpenCL C 1.2 
   opencl device version: OpenCL 1.2 
   frequency MHz: 2600

   device index: 1
   device id: 0x1024500
   device type: 4
   global memory size: 1536MB
   local memory size: 64KB
   global cache size: 0KB
   global cacheline size: 0
   max memory alloc size: 384MB
   max compute units: 24
   max workgroup size: 256
   max workitem dimensions: 3
   max workitem sizes: 256 256 256
   device name: Intel(R) HD Graphics 530
   opencl c version: OpenCL C 1.2 
   opencl device version: OpenCL 1.2 
   frequency MHz: 1050

   device index: 2
   device id: 0x1021c00
   device type: 4
   global memory size: 2048MB
   local memory size: 32KB
   global cache size: 0KB
   global cacheline size: 0
   max memory alloc size: 512MB
   max compute units: 10
   max workgroup size: 256
   max workitem dimensions: 3
   max workitem sizes: 256 256 256
   device name: AMD Radeon Pro 450 Compute Engine
   opencl c version: OpenCL C 1.2 
   opencl device version: OpenCL 1.2 
   frequency MHz: 800
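
For readers unfamiliar with the numeric "device type" field above: assuming gpuinfo prints the raw OpenCL `cl_device_type` bitfield (an assumption, not confirmed in this thread), 2 corresponds to a CPU device and 4 to a GPU device. A minimal sketch of the decoding, where `describe_device_type` is a hypothetical helper name and the constants are those from the OpenCL 1.2 headers:

```python
# OpenCL cl_device_type bit values as defined in CL/cl.h (OpenCL 1.2).
CL_DEVICE_TYPE_FLAGS = {
    1: "DEFAULT",
    2: "CPU",
    4: "GPU",
    8: "ACCELERATOR",
    16: "CUSTOM",
}


def describe_device_type(value):
    """Return the names of all device-type bits set in `value`.

    Hypothetical helper for illustration; assumes `value` is the raw
    cl_device_type bitfield as printed by gpuinfo.
    """
    names = [name for bit, name in CL_DEVICE_TYPE_FLAGS.items() if value & bit]
    return "|".join(names) if names else "UNKNOWN"


# Device indices and raw type values copied from the gpuinfo output above.
for index, raw in [(0, 2), (1, 4), (2, 4)]:
    print(f"device {index}: type {raw} -> {describe_device_type(raw)}")
```

Under that assumption, device 0 is the CPU, while devices 1 (HD 530) and 2 (Radeon Pro 450) are both GPUs, which is consistent with the device names listed.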

Edit: note, for the record, all the EasyCL unit tests pass OK:

gist of the output: https://gist.github.com/hughperkins/741e95ec3975a05d9efc46c289f33781
example of a test that uses workgroup size > 1, and local memory:
- cpp hostside code: https://github.com/hughperkins/EasyCL/blob/master/test/testlocal.cpp
- corresponding opencl code: https://github.com/hughperkins/EasyCL/blob/master/test/testlocal.cl

About this issue

State: closed
Created 7 years ago
Comments: 47 (16 by maintainers)

Most upvoted comments

Maybe not visible to the user, but updating the OS will surely also fix issues in the GPU drivers, right? And I guess similarly you’ll also get updated OpenCL libraries transparently whenever you update?

Yes, in theory. In practice, system updates a posteriori seem rare. In any case, there have been no system updates since this issue was opened.

hughperkins on Jul 22, 2017

I haven't done anything to configure clBLAS, and it shouldn't be installed as far as I know. The reason I ran the tests in the first place is that gemm fails on this GPU when run from cuda-on-cl: https://github.com/hughperkins/cuda-on-cl/issues/17

On Wed, May 3, 2017 at 10:22 AM, Cedric Nugteren wrote:

You can of course just first run the tuner for e.g. single-precision GEMV (./clblast_tuner_xgemv) and then see if that fixes the problems. Or perhaps first run all the tests (make test) and see whether one of the simpler routines also breaks?

Have you verified whether it is CLBlast and not clBLAS causing the issue (as I mentioned in the previous post)? Just to be 100% sure about the origin of the issue!

hughperkins on May 3, 2017