OpenBLAS: DPOTRF deadlocks on ARM Cortex-A15

While running the benchmarks, I noticed that the potrf kernel deadlocks on my Cortex-A15 ARM board (ODROID):

$ OPENBLAS_NUM_THREADS=2 ./dpotrf.goto
From :   1  To : 200 Step =   1 Uplo = U
       1 :       0.01 MFlops :      0.000 Sec : Test=F
       2 :       0.07 MFlops :      0.000 Sec : Test=F
       3 :       0.25 MFlops :      0.000 Sec : Test=F
       4 :       0.59 MFlops :      0.000 Sec : Test=F
       5 :       0.90 MFlops :      0.000 Sec : Test=F
       6 :       1.57 MFlops :      0.000 Sec : Test=F
       7 :       2.59 MFlops :      0.000 Sec : Test=F
       8 :       3.92 MFlops :      0.000 Sec : Test=F

and then it stalls. By contrast, running it with OPENBLAS_NUM_THREADS=1 works fine.
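
To make the comparison between thread counts repeatable without waiting on a hung process, the benchmark can be wrapped in a time limit. The following is a minimal sketch, not part of the original report: `run_with_threads` is a hypothetical helper, the 60-second limit is an arbitrary assumption, and it relies on the `timeout` utility from GNU coreutils being installed.

```shell
#!/bin/sh
# Hypothetical helper: run a command with a given OpenBLAS thread count and a
# time limit in seconds. Prints "ok", "stalled" (limit hit) or "failed".
run_with_threads() {
    threads=$1
    limit=$2
    shift 2
    status=0
    # The variable assignment applies to timeout, which passes it on to the command.
    OPENBLAS_NUM_THREADS="$threads" timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
    case $status in
        0)   echo "ok" ;;
        124) echo "stalled" ;;   # GNU timeout exits with 124 when the limit is hit
        *)   echo "failed" ;;
    esac
}

# Only run the comparison if the benchmark binary is present in this directory.
if [ -x ./dpotrf.goto ]; then
    for n in 1 2; do
        echo "threads=$n: $(run_with_threads "$n" 60 ./dpotrf.goto)"
    done
fi
```

On a machine showing the behaviour described here, one would expect "ok" for one thread and "stalled" for two, provided the limit is long enough for a healthy run to finish.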

I used the latest version of the code forked from here, built with gcc 5.2.1. The board runs Ubuntu 15.10; here is some more information:

$ uname -a
Linux odroid 3.10.96+ #1 SMP PREEMPT Wed Mar 30 11:47:52 UTC 2016 armv7l armv7l armv7l GNU/Linux

Please tell me if there is any other information I can add to make this reproducible. I don’t know which difference in the setup causes the deadlock, but on my laptop (Intel, 64-bit, Debian) the multithreaded dpotrf runs just fine.

I observed the same behavior with the dcholesky benchmark, but I guess it just uses the same routine.

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 16 (6 by maintainers)

Most upvoted comments

I do not think any of us wants to trick you into debugging the clang build. 😃 You could also try brada4’s other suggestion and retry the gcc build with the COMMON_OPT line in Makefile.rule uncommented and set to "-O0", in the hope that avoiding compiler optimizations leads to “better” traces.
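
One way to collect such traces from the stalled benchmark is to attach gdb to the running process and dump a backtrace for every thread. This is only a sketch under assumptions: `dump_threads` is a hypothetical helper, the 30-second wait is a guess at how long the run needs to reach the stall, and attaching may require root depending on the kernel's ptrace restrictions.

```shell
#!/bin/sh
# Hypothetical helper: print a backtrace of every thread of a running process.
# Usage: dump_threads PID
dump_threads() {
    pid=$1
    # kill -0 sends no signal; it only checks that the PID can be signalled.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "no such process: $pid" >&2
        return 1
    fi
    # -batch makes gdb exit after running the command given with -ex.
    gdb -p "$pid" -batch -ex "thread apply all bt"
}

# Only run if the benchmark binary is present in this directory.
if [ -x ./dpotrf.goto ]; then
    OPENBLAS_NUM_THREADS=2 ./dpotrf.goto >/dev/null &
    bench_pid=$!
    sleep 30                      # assumed long enough to reach the stall
    dump_threads "$bench_pid"
    kill "$bench_pid" 2>/dev/null
fi
```

If the benchmark has already exited by the time gdb runs, the attach simply fails and no trace is produced. With a "-O0" build that keeps debug symbols, the frames should map more directly onto source lines.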