OpenBLAS: Segfault with large NUM_THREADS

In Fedora, we set NUM_THREADS=128 for the OpenMP and threaded builds (see the spec file for reference; cc @susilehtola). We recently switched to openblas-openmp as the system-wide default BLAS/LAPACK implementation, and then found that a test in the octave-statistics package (canoncorr.m) segfaults (octave was previously using openblas-serial). So far we have narrowed the issue down to the following. Here’s a reproducible example with the current master branch:

$ docker run --rm -it fedora:rawhide
$ dnf install -y octave-statistics make git perl-devel
$ CMD='octave -H -q --no-window-system --no-site-file --eval pkg("load","statistics");test("/usr/share/octave/packages/statistics-1.4.1/canoncorr.m");'
$ git clone https://github.com/xianyi/OpenBLAS && cd OpenBLAS
$ make USE_THREAD=1 USE_OPENMP=1 NUM_THREADS=128
$ LD_PRELOAD=$PWD/libopenblas.so.0 $CMD
Segmentation fault (core dumped)

but

$ make clean
$ make USE_THREAD=1 USE_OPENMP=1 NUM_THREADS=64
$ LD_PRELOAD=$PWD/libopenblas.so.0 $CMD
PASSES 7 out of 7 tests

Any idea what could be happening here?

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 54 (3 by maintainers)

Most upvoted comments

bisecting to see if/when/why this ever used to work

Actually decreasing the threshold (e.g. to 60) seems to make the crash go away, so it could be that it was the job array itself that is/was trashing the stack. Curiouser and curiouser…

I found that there’s a BLAS3_MEM_ALLOC_THRESHOLD of 160. USE_ALLOC_HEAP is then used if MAX_CPU_NUMBER > BLAS3_MEM_ALLOC_THRESHOLD. ~This happens (because MAX_CPU_NUMBER = NUM_THREADS * 2) for NUM_THREADS = 128 and also your case, NUM_THREADS = 88, but not for NUM_THREADS = 64 (which doesn’t cause a segfault).~ Maybe this rings a bell?
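To make the threshold logic in the comment above concrete, here is a minimal sketch of how a compile-time check like this can switch a job array between the stack and the heap. It is not the OpenBLAS source: the `job_t` layout, the inner dimension of 16, the default `MAX_CPU_NUMBER` of 128 and the function `run_with_job_array` are all assumptions made up for illustration. Only the names `BLAS3_MEM_ALLOC_THRESHOLD`, `USE_ALLOC_HEAP` and `MAX_CPU_NUMBER` and the value 160 come from the discussion.

```c
#include <stdlib.h>

/* Assumed default for illustration; in a real build this would come
   from the build configuration (e.g. derived from NUM_THREADS). */
#ifndef MAX_CPU_NUMBER
#define MAX_CPU_NUMBER 128
#endif

/* Value quoted in the comment above. */
#define BLAS3_MEM_ALLOC_THRESHOLD 160

#if MAX_CPU_NUMBER > BLAS3_MEM_ALLOC_THRESHOLD
#define USE_ALLOC_HEAP
#endif

/* Hypothetical per-thread job record. Each record holds one row per
   possible CPU, so an array of MAX_CPU_NUMBER records grows
   quadratically with MAX_CPU_NUMBER. */
typedef struct {
    volatile long working[MAX_CPU_NUMBER][16];
} job_t;

/* Hypothetical driver: sets up the job array and reports where it
   lives. Returns the array size in bytes, or 0 if malloc fails. */
size_t run_with_job_array(int *used_heap)
{
#ifdef USE_ALLOC_HEAP
    /* Above the threshold: allocate the array on the heap. */
    job_t *job = malloc(MAX_CPU_NUMBER * sizeof(job_t));
    if (job == NULL) {
        return 0;
    }
    *used_heap = 1;
#else
    /* At or below the threshold: the array sits on the calling
       thread's stack. */
    job_t job[MAX_CPU_NUMBER];
    *used_heap = 0;
#endif

    job[0].working[0][0] = 0; /* touch the array so it is really used */

    size_t bytes = (size_t)MAX_CPU_NUMBER * sizeof(job_t);

#ifdef USE_ALLOC_HEAP
    free(job);
#endif
    return bytes;
}
```

Under these assumed dimensions, a value just below the threshold still places a sizeable array on the stack, which is consistent with the suspicion elsewhere in this thread that the job array itself could be trashing the stack. Whether that is what actually happens in OpenBLAS would need to be checked against the real struct definition.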

It would be easier to understand if that many threads were actually running (I was thinking of a low-probability race condition), but since the number of threads is capped at the hardware capability, NUM_THREADS should only size the GEMM buffer here.

Reproduced with the docker setup and built OpenBLAS with DEBUG=1, but I cannot get gdb to print a meaningful backtrace (?? for anything except libjvm.so).