CLBlast: TRSM crashes after a dozen seconds

Hi Cedric,

I am not able to run even the simplest TRSM example with CLBlast on an AMD R9 290X (I haven't tried it on other hardware).

I can't post C/C++ code, since I am calling CLBlast through JOCLBlast (Java) and Neanderthal (Clojure). It seems to me that either I am making a dumb mistake in calling it (the calls are fairly automated, and I integrated most of the rest of JOCLBlast the same way, which works well), or perhaps TRSM does not work at all.

Here’s how to reproduce it: create any simple triangular system. I used:

A = [1 2 3 | 0 -1 -2 | 0 0 1] (lower triangular, column-major)
B = [-1 -1 0.5]
Result = [-1 -1 1.5]
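Before debugging the GPU path, it can help to confirm the expected result on the host. Below is a minimal Java sketch, plain forward substitution and not CLBlast or JOCLBlast code (the class and method names are illustrative), that solves the system above from the same column-major array.

```java
import java.util.Arrays;

// Reference check only (not CLBlast): solve A x = b for an n-by-n
// lower-triangular A stored in column-major order.
final class LowerTriangularSolve {

    /** Returns x with A x = b; a is column-major with leading dimension n. */
    static double[] forwardSubstitute(double[] a, double[] b, int n) {
        double[] x = b.clone(); // leave the caller's b untouched
        for (int j = 0; j < n; j++) {
            // Element (j, j) lives at a[j + j * n] in column-major storage.
            x[j] /= a[j + j * n];
            // Eliminate x[j] from the rows below, walking down column j.
            for (int i = j + 1; i < n; i++) {
                x[i] -= a[i + j * n] * x[j];
            }
        }
        return x;
    }

    public static void main(String[] args) {
        // Columns [1 2 3], [0 -1 -2], [0 0 1], as in the report.
        double[] a = {1, 2, 3, 0, -1, -2, 0, 0, 1};
        double[] b = {-1, -1, 0.5};
        System.out.println(Arrays.toString(forwardSubstitute(a, b, 3)));
        // prints "[-1.0, -1.0, 1.5]"
    }
}
```

Reading the same nine values as row-major would instead give the upper-triangular transpose of A, so a mismatch between the layout flag and the actual storage is one easy way to get a different answer.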

I cannot get this result with JOCLBlast/CLBlast. There is activity for some time, but the call never finishes and eventually crashes the JVM.

Now, why do I think that this should work?

  1. The rest of CLBlast works like a charm with my infrastructure.
  2. I tried the same example with my JCublas engine (Clojure -> Java -> cuBLAS), and it runs swiftly and gives the same result as the CPU engine based on MKL.

Did you test TRSM, and, if so, can you point me to a (C/C++) example of how to use it in CLBlast? Maybe there is some gotcha that I am not aware of…

Thank you once more for CLBlast, an amazing free library!

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 19 (9 by maintainers)

Most upvoted comments

I can confirm that TRSM now works and no longer hangs. There may still be an issue with wrong results for some exotic combinations of layout and uplo, though. I have worked around it by always using the ColMajor layout and calculating the other parameters accordingly, in the same way I do with cuBLAS (which does not support a layout parameter in this routine).
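One common way to express a row-major TRSM in column-major terms is the transpose identity: a row-major buffer read as column-major holds the transpose of its matrix, and transposing op(A) X = alpha B gives X^T op(A^T) = alpha B^T. The sketch below assumes this is roughly what "calculating the other parameters" means; the `TrsmLayout`, `Side`, and `Uplo` names are hypothetical and not part of CLBlast or JOCLBlast.

```java
// Hypothetical helper: TrsmLayout, Side, and Uplo are illustrative names,
// not CLBlast or JOCLBlast API. Rewrites a row-major TRSM call as an
// equivalent column-major call on the SAME buffers.
final class TrsmLayout {

    enum Side { LEFT, RIGHT }

    enum Uplo { UPPER, LOWER }

    /** Column-major TRSM parameters; leading dimensions and diagonal are unchanged. */
    static final class ColMajorCall {
        final Side side;
        final Uplo uplo;
        final boolean transA;
        final int m;
        final int n;

        ColMajorCall(Side side, Uplo uplo, boolean transA, int m, int n) {
            this.side = side;
            this.uplo = uplo;
            this.transA = transA;
            this.m = m;
            this.n = n;
        }

        @Override
        public String toString() {
            return side + " " + uplo + (transA ? " Trans" : " NoTrans") + " m=" + m + " n=" + n;
        }
    }

    /**
     * Transposing op(A) X = alpha B gives X^T op(A^T) = alpha B^T, so the side
     * flips, the stored triangle flips (the transpose of a lower matrix is upper),
     * the transpose flag stays the same, and m and n swap.
     */
    static ColMajorCall fromRowMajor(Side side, Uplo uplo, boolean transA, int m, int n) {
        Side colSide = (side == Side.LEFT) ? Side.RIGHT : Side.LEFT;
        Uplo colUplo = (uplo == Uplo.LOWER) ? Uplo.UPPER : Uplo.LOWER;
        return new ColMajorCall(colSide, colUplo, transA, n, m);
    }

    public static void main(String[] args) {
        // Row-major, left side, lower triangle, no transpose, B is 3 x 1.
        System.out.println(fromRowMajor(Side.LEFT, Uplo.LOWER, false, 3, 1));
        // prints "RIGHT UPPER NoTrans m=1 n=3"
    }
}
```

For example, if the lower-triangular A from the original report were stored row-major, a Left/Lower call with m = 3, n = 1 would become a column-major Right/Upper call with m = 1, n = 3 on the same buffers.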

Not only is this solved, but the whole setup of CLBlast is now much faster. I guess this has to do with the recent architecture changes. Congratulations!

@CNugteren @gpu

TRSM actually works on the Intel CPU through AMD’s platform! It only crashes when run on the AMD GPU. I’m using the closed-source Catalyst drivers, since AMD's latest amdgpu-pro and ROCm do not (yet) support OpenCL 2.0 in the host code.

I’ve tried it both through Java and as part of the CLBlast tests. The result is the same.

When called from the JVM, the CPU returns the correct result, while the GPU works for several seconds, then crashes the JVM and prints a core dump.

When run as clblast_test_xtrsm, the CPU prints the report (in which the tests are reported as skipped) and finishes normally. The GPU, on the other hand, segfaults.

Now, something curious is happening. When I tried the CLBlast tests, I first ran a test on the CPU (with -device 2). Then I ran it on the GPU, and it segfaulted. I then ran it on the other GPU (device 1), and it froze my machine, so I had to reboot. After the reboot, all CLBlast tests segfault, but when called from the JVM, CLBlast works as normal (the GPU xtrsm still crashes, though).

@CNugteren This reminds me of the problems reported in #48