CLBlast: Adreno 330 GPU - GEMM tuner gets killed with custom M, N, K sizes

The GEMM tuner gets killed for a few custom M, N, K settings:

./clblast_tuner_xgemm -m 256 -n 784 -k 1152 -precision 32 -alpha 1.00 -beta 1.00

[       OK ] Completed Xgemm (454.1 ms) - 18 out of 360
[ RUN      ] Running Xgemm
[       OK ] Completed Xgemm (274.6 ms) - 19 out of 360
[ RUN      ] Running Xgemm
W/Adreno-GSL (26790): <gsl_ldd_control:476>: ioctl fd 3 code 0x400c0907 (IOCTL_KGSL_DEVICE_WAITTIMESTAMP_CTXTID) failed: errno 35 Resource deadlock avoided
W/Adreno-GSL (26790): <log_gpu_snapshot:385>: panel.gpuSnapshotPath is not set.not generating user snapshot
W/Adreno-GSL (26790): <gsl_umd_context_waittimestamp:271>: error:-12 ctx 00000001 ts 62
W/Adreno-GSL (26790): <gsl_ldd_control:476>: ioctl fd 3 code 0xc02c093d (IOCTL_KGSL_SUBMIT_COMMANDS) failed: errno 35 Resource deadlock avoided
W/Adreno-GSL (26790): <log_gpu_snapshot:385>: panel.gpuSnapshotPath is not set.not generating user snapshot
E/Adreno-GSL (26790): <os_exit:1759>: Exiting the process clblast_tuner_xgemm from function cl_oxili_cmdbuffer_issue and line 366
Killed

TunerError.txt

I've attached the full error output as a file. So far I've tuned GEMM for a few of my custom M, N, K sizes and have the best MWG, NWG, KWG, … parameters for each of them. Until the next release, which includes the following feature:

- Kernels are now cached based on their tuning parameters: fits the use-case of 'OverrideParameters'

is there a quick hack to select these MWG, NWG, KWG, … values based on the input M, N, K sizes? Please let me know if you have any suggestions.
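One possible workaround, sketched here under assumptions: keep a table that maps each tuned (M, N, K) shape to the parameter set the tuner reported, look up the current shape before each GEMM call, and, on a hit, hand that map to `clblast::OverrideParameters`. In recent CLBlast versions that function is declared in `clblast.h` as taking a `cl_device_id`, a kernel name such as `"Xgemm"`, a `Precision`, and a `std::unordered_map<std::string, size_t>`; verify this against your installed header. The class `GemmParamTable` and its methods are hypothetical, and the CLBlast call itself is left out so the sketch compiles without OpenCL.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <tuple>
#include <unordered_map>

// Key: the (M, N, K) shape of a GEMM call.
using GemmShape = std::tuple<std::size_t, std::size_t, std::size_t>;

// Same map type that clblast::OverrideParameters is assumed to accept.
using TuningParams = std::unordered_map<std::string, std::size_t>;

// Hypothetical per-shape table of tuner results, filled in by the user.
class GemmParamTable {
 public:
  // Record the best parameters the tuner reported for one (M, N, K).
  void Add(std::size_t m, std::size_t n, std::size_t k, const TuningParams& params) {
    // Only sanity-checks the work-group tile sizes; store the full tuner output.
    assert(params.count("MWG") && params.count("NWG") && params.count("KWG"));
    table_[GemmShape{m, n, k}] = params;
  }

  // Exact-match lookup; std::nullopt means "keep the default parameters".
  std::optional<TuningParams> Find(std::size_t m, std::size_t n, std::size_t k) const {
    auto it = table_.find(GemmShape{m, n, k});
    if (it == table_.end()) return std::nullopt;
    return it->second;
  }

 private:
  std::map<GemmShape, TuningParams> table_;
};
```

Before each call, something like `if (auto p = table.Find(m, n, k)) clblast::OverrideParameters(device, "Xgemm", clblast::Precision::kSingle, *p);` would apply the stored set. Store every parameter the tuner reports for a shape, not only MWG, NWG and KWG, since an override may need the complete set. The exact-match key is a deliberate simplification; falling back to the nearest tuned shape is left to the caller.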

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 20 (20 by maintainers)

Most upvoted comments

I double-checked the status of this issue, and it seems there are quite a lot of things going on, but I propose to close this issue now, because:

  1. The original issue discussed here is a bug in the Qualcomm drivers and should be reported there. The tuners cannot catch such errors: they can only check for a valid returned status code and catch C++ exceptions, not arbitrary crashes.

  2. There is now a new tuner in the master branch which always outputs the parameter configuration before compiling, so it is now easier to diagnose which parameter combination(s) caused the compiler to crash.

  3. Issue #195 is now closed. Buffer creation time can still be costly, but there is now a new tuner to decide which kernel to run (with or without temporary buffer creation).

  4. We are aware of the poor performance on Qualcomm GPUs, but the plan is to implement a pre-processor first. You have also reported this in other issues that are still open, so I propose continuing the discussion there if needed.
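Point 2 relies on a simple ordering rule: write out the configuration, and flush it, before compiling, so that the last line in the log names the culprit even when the driver kills the process mid-compilation, as in the log above. A minimal sketch of that idea, assuming a hypothetical `Config` map and hypothetical `FormatConfig`/`LogBeforeCompile` helpers (this is not the tuner's actual code or log format):

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical parameter set for one tuner candidate (sorted for stable output).
using Config = std::map<std::string, std::size_t>;

// Render a configuration as "KWG=16 MWG=64 NWG=32" (keys in sorted order).
std::string FormatConfig(const Config& config) {
  std::ostringstream out;
  bool first = true;
  for (const auto& [name, value] : config) {
    if (!first) out << ' ';
    out << name << '=' << value;
    first = false;
  }
  return out.str();
}

// Log the candidate *before* compiling it. std::endl flushes the stream, so the
// line is already written even if the process is killed during compilation.
void LogBeforeCompile(std::ostream& log, const Config& config) {
  assert(!config.empty());  // an empty line would not identify anything
  log << "compiling: " << FormatConfig(config) << std::endl;
}
```

Without the flush, a buffered line could be lost together with the killed process, which would point the diagnosis at the wrong configuration.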

If there is anything specific, please open a new issue. This issue contains a bit of everything 😉