OpenBLAS: DPOTRF deadlocks on ARM Cortex-A15
While running the benchmarks, I noticed that the potrf kernel deadlocks on my Cortex-A15 ARM board (an ODROID):
$ OPENBLAS_NUM_THREADS=2 ./dpotrf.goto
From : 1 To : 200 Step = 1 Uplo = U
1 : 0.01 MFlops : 0.000 Sec : Test=F
2 : 0.07 MFlops : 0.000 Sec : Test=F
3 : 0.25 MFlops : 0.000 Sec : Test=F
4 : 0.59 MFlops : 0.000 Sec : Test=F
5 : 0.90 MFlops : 0.000 Sec : Test=F
6 : 1.57 MFlops : 0.000 Sec : Test=F
7 : 2.59 MFlops : 0.000 Sec : Test=F
8 : 3.92 MFlops : 0.000 Sec : Test=F
and then it stalls. By contrast, running it with OPENBLAS_NUM_THREADS=1 works fine.
I used the latest version of the code forked from here, built with gcc 5.2.1. The board runs Ubuntu 15.10; here is some more information:
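To make the stall easier to tell apart from a slow run, the comparison between thread counts could be wrapped in a time limit. This is only a rough sketch, assuming GNU coreutils `timeout` is available on the board; `check_hang` is a made-up helper name and the 60-second limit is an arbitrary choice, neither comes from OpenBLAS.

```shell
#!/bin/sh
# check_hang is a hypothetical helper, not part of OpenBLAS.
# Usage: check_hang THREADS SECONDS COMMAND [ARGS...]
# Runs COMMAND with OPENBLAS_NUM_THREADS=THREADS under a time limit and
# prints "ok" (exit 0), "hang" (time limit hit), or "fail" (any other exit).
check_hang() {
    threads=$1
    limit=$2
    shift 2
    # coreutils timeout exits with status 124 when the limit expires.
    OPENBLAS_NUM_THREADS=$threads timeout "$limit" "$@" >/dev/null 2>&1
    status=$?
    case "$status" in
        0)   echo "ok" ;;
        124) echo "hang" ;;
        *)   echo "fail" ;;
    esac
}

# Example calls (the 60 s limit is an assumption, adjust for the board):
#   check_hang 1 60 ./dpotrf.goto
#   check_hang 2 60 ./dpotrf.goto
```

If the single-threaded call reports "ok" and the two-threaded call reports "hang", that matches the behaviour described above.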
$ uname -a
Linux odroid 3.10.96+ #1 SMP PREEMPT Wed Mar 30 11:47:52 UTC 2016 armv7l armv7l armv7l GNU/Linux
Please tell me if I can add any other information that would make this reproducible. I don’t know which difference in the setup causes the deadlock, but on my laptop (Intel, 64-bit, Debian) the multithreaded dpotrf runs just fine.
I observed the same stall with the dcholesky benchmark, but I guess it just uses the same routine.
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Comments: 16 (6 by maintainers)
I do not think any of us wants to trick you into debugging the clang build. 😃 You could also try brada4’s other suggestion: retry the gcc build with the COMMON_OPT line in Makefile.rule uncommented and set to “-O0”, in the hope that disabling compiler optimizations leads to “better” traces.
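The Makefile.rule change suggested above could be scripted roughly as follows. This is a sketch under assumptions: it expects the file to contain a (possibly commented-out) line of the form `# COMMON_OPT = -O2`, it relies on GNU sed for `-i`, and `set_common_opt` is a made-up helper name.

```shell
#!/bin/sh
# set_common_opt is a hypothetical helper, not part of OpenBLAS.
# Usage: set_common_opt PATH_TO_MAKEFILE_RULE FLAGS
# Uncomments the COMMON_OPT line (if commented) and sets it to FLAGS.
# Assumes the line looks like "# COMMON_OPT = -O2"; other lines are untouched.
set_common_opt() {
    rule_file=$1
    flags=$2
    # Strip any leading "#" and whitespace, then rewrite the assignment in place.
    sed -i -E "s|^[#[:space:]]*COMMON_OPT[[:space:]]*=.*|COMMON_OPT = $flags|" "$rule_file"
}

# Example call from the top of the source tree:
#   set_common_opt Makefile.rule "-O0"
```

After a clean rebuild, attaching gdb to the stalled benchmark process and running `thread apply all bt` prints a backtrace for every thread, which is where the unoptimized build should help.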