CLBlast: Float16 GEMM on Adreno 330

I followed the samples/haxpy.c example to create float16 matrices and tried CLBlastSgemm on an Adreno 330. I am getting error number -1011 (CL_INVALID_D3D9_RESOURCE_NV or CL_INVALID_DX9_RESOURCE_INTEL).

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 35 (31 by maintainers)

Most upvoted comments

Apologies for the long delay; something else came up and I haven’t had a chance to look at this. I’ve done some sanity checks, and the problem I was having was a bug in my code that set alpha incorrectly.

OK, maybe I should buy such a dev-kit as well if they are not too expensive. But that will take a bit longer. Which board do you have or recommend?

There is an interface in CLBlast to set the tuning parameters manually. You could do that just before launching your kernel, possibly in an if-statement, e.g. if m == 16 then set_parameters(A) else set_parameters(B). But right now this would require re-compilation every time. I’m working on the preparation_for_size_specific_parameters branch to make it possible to save multiple compiled kernels in the cache, which would speed things up significantly.
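The size-dependent switch described above could be sketched roughly as follows. This is only an illustration: the GemmParams struct, the select_gemm_params helper and the numeric values are all hypothetical and not part of CLBlast. The field names are modeled on the MWG/NWG/KWG tile-size parameters of the GEMM kernel, and the step that actually hands the chosen set to CLBlast's override interface is left out.

```c
#include <stddef.h>

/* Hypothetical container for a few GEMM tuning parameters.
 * The values below are illustrative only, not tuned results. */
typedef struct {
    size_t mwg; /* work-group tile size in the M dimension */
    size_t nwg; /* work-group tile size in the N dimension */
    size_t kwg; /* tile size in the K dimension */
} GemmParams;

/* Set "A": assumed to suit very small matrices. */
static const GemmParams kParamsSmall = {16, 16, 16};

/* Set "B": assumed default for everything else. */
static const GemmParams kParamsDefault = {64, 64, 32};

/* Picks a parameter set from the matrix dimension m, mirroring
 * "if m == 16 then set_parameters(A) else set_parameters(B)". */
const GemmParams *select_gemm_params(size_t m) {
    return (m == 16) ? &kParamsSmall : &kParamsDefault;
}
```

The caller would run select_gemm_params(m) right before the GEMM call and pass the result on to the library. As noted above, each switch currently triggers a re-compilation of the kernel.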

Still, I think there might be an issue related to the kernel not being optimal for the hardware in some sense. I’ll have to get access to such a device first in order to investigate further.

I have not tested that yet unfortunately. I’ll see if I can make time for it soon.

I checked the supported extensions on an NVIDIA 1080 Ti (Pascal architecture); it looks like it doesn’t support cl_khr_fp16:

=== 1 OpenCL platform(s) found: ===
  PROFILE = FULL_PROFILE
  VERSION = OpenCL 1.2 CUDA 8.0.0
  NAME = NVIDIA CUDA
  VENDOR = NVIDIA Corporation
  EXTENSIONS = cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer
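The same check can be done programmatically instead of reading the list by eye. A minimal sketch, assuming the extension string has already been fetched (for instance via clGetDeviceInfo with CL_DEVICE_EXTENSIONS); the helper name has_extension is made up for this example. It matches whole space-separated tokens, so a longer name that merely starts with cl_khr_fp16 does not count.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if `name` occurs as a complete space-separated token
 * in `extensions` (e.g. the CL_DEVICE_EXTENSIONS string). */
bool has_extension(const char *extensions, const char *name) {
    const size_t name_len = strlen(name);
    if (name_len == 0) {
        return false;
    }
    const char *p = extensions;
    while ((p = strstr(p, name)) != NULL) {
        /* The match must begin at the start of the string or after a space. */
        const bool starts_token = (p == extensions) || (p[-1] == ' ');
        /* The match must be followed by a space or the end of the string. */
        const char next = p[name_len];
        const bool ends_token = (next == ' ') || (next == '\0');
        if (starts_token && ends_token) {
            return true;
        }
        p += name_len; /* keep searching after this partial match */
    }
    return false;
}
```

For the extension list printed above, has_extension(list, "cl_khr_fp64") is true and has_extension(list, "cl_khr_fp16") is false, so half-precision routines should not be attempted on that device.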