CLBlast: All sgemm tests fail on Intel HD530 gpu
All [edit: sgemm] tests fail on Intel HD530 gpu.
Output of running ./clblast_test_xgemm -device 1 > clblast_test_xgemm_out.txt 2>&1: https://gist.github.com/hughperkins/711741a770949889fbe8122a0c0fd288
git log:
$ git status
On branch cedric-master
Your branch is up-to-date with 'cedric/master'.
Untracked files:
(use "git add <file>..." to include in what will be committed)
./
nothing added to commit but untracked files present (use "git add" to track)
carrot:mac hugh2$ git log -n 3
commit e52f9a9ff23f73325e5a8bc58765554d5f5eedc4
Merge: 115af8c 2cf7d84
Author: Cedric Nugteren <web@cedricnugteren.nl>
Date: Sun Nov 27 15:59:21 2016 +0100
Merge pull request #127 from CNugteren/development
Update to version 0.10.0
commit 2cf7d8429a58ac09842815d9859c90acbc38e8b7
Author: Cedric Nugteren <web@cedricnugteren.nl>
Date: Sun Nov 27 13:34:18 2016 +0100
Updated to version 0.10.0
commit 39c49bf4f977427de42fdfe27e8a2ed41ae4923e
Author: Cedric Nugteren <web@cedricnugteren.nl>
Date: Sun Nov 27 11:00:29 2016 +0100
Made it possible to use the command-line environmental variables for each executable and without re-running CMake
Environment:
- Mac Sierra
- HD530 is -device 1. I haven't figured out how to install clinfo yet, but EasyCL's gpuinfo gives:
$ ./gpuinfo
num platforms: 1
platform index: 0:
platform id: 0x7fff0000
platform vendor: Apple
platform name: Apple
platform num devices: 3
device index: 0
device id: 0xffffffff
device type: 2
global memory size: 16384MB
local memory size: 32KB
global cache size: 0KB
global cacheline size: 6291456
max memory alloc size: 4096MB
max compute units: 8
max workgroup size: 1024
max workitem dimensions: 3
max workitem sizes: 1024 1 1
device name: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
opencl c version: OpenCL C 1.2
opencl device version: OpenCL 1.2
frequency MHz: 2600
device index: 1
device id: 0x1024500
device type: 4
global memory size: 1536MB
local memory size: 64KB
global cache size: 0KB
global cacheline size: 0
max memory alloc size: 384MB
max compute units: 24
max workgroup size: 256
max workitem dimensions: 3
max workitem sizes: 256 256 256
device name: Intel(R) HD Graphics 530
opencl c version: OpenCL C 1.2
opencl device version: OpenCL 1.2
frequency MHz: 1050
device index: 2
device id: 0x1021c00
device type: 4
global memory size: 2048MB
local memory size: 32KB
global cache size: 0KB
global cacheline size: 0
max memory alloc size: 512MB
max compute units: 10
max workgroup size: 256
max workitem dimensions: 3
max workitem sizes: 256 256 256
device name: AMD Radeon Pro 450 Compute Engine
opencl c version: OpenCL C 1.2
opencl device version: OpenCL 1.2
frequency MHz: 800
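A note on reading the `device type` values in the listing above: assuming gpuinfo prints the raw OpenCL `cl_device_type` bitfield (the output does not say so explicitly), 2 corresponds to `CL_DEVICE_TYPE_CPU` and 4 to `CL_DEVICE_TYPE_GPU`, which matches the device names. A minimal sketch of that decoding follows; the helper name `decode_device_type` is hypothetical and not part of EasyCL or CLBlast.

```python
# Bit values of cl_device_type as defined in the OpenCL headers (CL/cl.h).
CL_DEVICE_TYPE_FLAGS = {
    1 << 0: "DEFAULT",
    1 << 1: "CPU",
    1 << 2: "GPU",
    1 << 3: "ACCELERATOR",
}


def decode_device_type(value):
    """Return the names of all cl_device_type bits set in value.

    Hypothetical helper for illustration; assumes `value` is the raw
    bitfield as printed by gpuinfo.
    """
    return [name for bit, name in CL_DEVICE_TYPE_FLAGS.items() if value & bit]


# (device index, device type, device name) copied from the gpuinfo output.
devices = [
    (0, 2, "Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz"),
    (1, 4, "Intel(R) HD Graphics 530"),
    (2, 4, "AMD Radeon Pro 450 Compute Engine"),
]

for index, dev_type, name in devices:
    print(index, "|".join(decode_device_type(dev_type)), name)
```

Under that assumption, device index 1 (the one passed as `-device 1`) is a GPU-type device, consistent with it being the HD 530.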
Edit: for the record, all the EasyCL unit tests pass:
- gist of the output: https://gist.github.com/hughperkins/741e95ec3975a05d9efc46c289f33781
- example of a test that uses workgroup size > 1, and local memory:
- cpp hostside code: https://github.com/hughperkins/EasyCL/blob/master/test/testlocal.cpp
- corresponding opencl code: https://github.com/hughperkins/EasyCL/blob/master/test/testlocal.cl
About this issue
- State: closed
- Created 7 years ago
- Comments: 47 (16 by maintainers)
Yes, in theory. In practice, system updates a posteriori seem rare. In any case, there have been no system updates since opening this issue.
I haven't done anything to configure clBLAS, and it shouldn't be installed as far as I know. The reason I ran the tests in the first place is that gemm fails on this GPU when run from cuda-on-cl: https://github.com/hughperkins/cuda-on-cl/issues/17
On Wed, May 3, 2017 at 10:22 AM, Cedric Nugteren notifications@github.com wrote: