OpenBLAS: Segfault when NUM_THREADS is smaller than host vCPUs

Most context can be found here: https://github.com/xianyi/OpenBLAS/pull/2982

But in summary:

A segfault occurs when OpenBLAS is built with OpenMP (USE_OPENMP=1) and the NUM_THREADS value used at build time is lower than the host’s vCPU count.

[Switching to Thread 0x7ffebba6e640 (LWP 104473)]
0x00007fffe95cbe70 in dgemm_itcopy_ZEN () from /nix/store/3v2i5ga27y9al3z0d8ccwaa389qz3ma6-lapack-3/lib/liblapack.so.3
(gdb) backtrace
#0  0x00007fffe95cbe70 in dgemm_itcopy_ZEN () from /nix/store/3v2i5ga27y9al3z0d8ccwaa389qz3ma6-lapack-3/lib/liblapack.so.3
#1  0x00007fffe882416a in inner_thread () from /nix/store/3v2i5ga27y9al3z0d8ccwaa389qz3ma6-lapack-3/lib/liblapack.so.3
#2  0x00007fffe8956bf6 in exec_blas._omp_fn () from /nix/store/3v2i5ga27y9al3z0d8ccwaa389qz3ma6-lapack-3/lib/liblapack.so.3
#3  0x00007fffe347f916 in ?? () from /nix/store/f23sq7lk6xfrvz467ffkpzackyk5q8dm-gfortran-9.3.0-lib/lib/libgomp.so.1
#4  0x00007ffff7c20e9e in start_thread () from /nix/store/5didcr1sjp2rlx8abbzv92rgahsarqd9-glibc-2.32/lib/libpthread.so.0
#5  0x00007ffff793866f in clone () from /nix/store/5didcr1sjp2rlx8abbzv92rgahsarqd9-glibc-2.32/lib/libc.so.6

The current “workaround” is to set the environment variable OMP_NUM_THREADS to a value equal to or lower than the NUM_THREADS used during the build. However, the desired behavior is that it doesn’t crash at all.
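
For anyone applying the workaround from Python, here is a minimal sketch. It assumes the build-time NUM_THREADS is known (64 in the makeFlags further down) and that the variable is set before numpy, and with it the OpenMP runtime, is loaded. The helper name capped_omp_threads is made up for illustration and is not part of OpenBLAS or numpy.

```python
import os

# Assumption: the value OpenBLAS was built with (NUM_THREADS=64 in this report).
BUILD_NUM_THREADS = 64


def capped_omp_threads(build_num_threads, host_cpus):
    """Hypothetical helper: largest thread count that stays within the build limit."""
    return max(1, min(build_num_threads, host_cpus))


# os.cpu_count() may return None, so fall back to 1 in that case.
host_cpus = os.cpu_count() or 1

# Set this before importing numpy so the OpenMP runtime sees it at startup.
os.environ["OMP_NUM_THREADS"] = str(capped_omp_threads(BUILD_NUM_THREADS, host_cpus))
```

On the 128-thread machine from this report the variable ends up as 64, which keeps the OpenMP thread count within what the library was compiled for.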

EDIT: Also of note, this fails when running the numpy test suite. OpenBLAS is able to run its own test suite without crashing. blas, openblas, and lapack are all aliased to openblas.

$ cat numpy/site.cfg
[blas]
include_dirs=/nix/store/bm0mjpbnaxixma2wvmj99zq2kfq1017h-blas-3-dev/include
library_dirs=/nix/store/35lvfykv13mnva6sipnrqks05mvrldi0-blas-3/lib
runtime_library_dirs=/nix/store/35lvfykv13mnva6sipnrqks05mvrldi0-blas-3/lib

[lapack]
include_dirs=/nix/store/s9rb0jhplgsfcnay1mp7kn1zhxyppq0n-lapack-3-dev/include
library_dirs=/nix/store/w6xjrzfnxsi7jwrlvfy10qcljm6gyhss-lapack-3/lib
runtime_library_dirs=/nix/store/w6xjrzfnxsi7jwrlvfy10qcljm6gyhss-lapack-3/lib

[openblas]
include_dirs=/nix/store/bm0mjpbnaxixma2wvmj99zq2kfq1017h-blas-3-dev/include:/nix/store/s9rb0jhplgsfcnay1mp7kn1zhxyppq0n-lapack-3-dev/include
libraries=lapack,lapacke,blas,cblas
library_dirs=/nix/store/35lvfykv13mnva6sipnrqks05mvrldi0-blas-3/lib:/nix/store/w6xjrzfnxsi7jwrlvfy10qcljm6gyhss-lapack-3/lib
runtime_library_dirs=/nix/store/35lvfykv13mnva6sipnrqks05mvrldi0-blas-3/lib:/nix/store/w6xjrzfnxsi7jwrlvfy10qcljm6gyhss-lapack-3/lib

System / Build Info

$ nix eval -f default.nix openblas.makeFlags
[ "BINARY=64" "CC=cc" "CROSS=0" "DYNAMIC_ARCH=1" "FC=gfortran" "HOSTCC=cc" "INTERFACE64=1" "NO_AVX512=1" "NO_BINARY_MODE=" "NO_SHARED=0" "NO_STATIC=1" "NUM_THREADS=64" "TARGET=ATHLON" "USE_OPENMP=1" ]

CC = gcc-9.3.0
arch = x86_64
platform = linux
CPU = 3990X (64 cores / 128 threads)

About this issue

State: closed
Created 4 years ago
Comments: 24 (14 by maintainers)

Commits related to this issue

Don't overwrite blas_thread_buffer if already set After a fork it is possible that blas_thread_buffer has already allocated memory buffers: goto_set_num_threads does allocate those already and it may... — committed to Flamefire/OpenBLAS by Flamefire 4 years ago
Add reproducer test for crash after fork See #2993 for an analysis — committed to Flamefire/OpenBLAS by Flamefire 4 years ago

Most upvoted comments

Ok found the problem. What happens is:

1. The fork handler resets the blas_thread_buffer array and the blas_server_avail flag.
2. A BLAS function (in this case cblas_ssyrk) calls num_cpu_avail.
3. This detects a mismatch: blas_cpu_number=64 (capped in previous settings) and openmp_nthreads=256.
4. This then calls goto_set_num_threads(openmp_nthreads).
5. That function adjusts the blas_thread_buffer, allocating memory where required.
6. exec_blas is called a bit later (by gemm), finds blas_server_avail=0 and calls blas_thread_init, which reinitializes the blas_thread_buffer without checking it first.
7. This then runs out of buffers to allocate: NUM_BUFFERS=64*2, 32 are allocated by goto_set_num_threads, 1 by gemm directly, then 32 further ones are requested by blas_thread_init, totalling 65 -> last entry is zero.
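
The sequence above can be modelled with a toy slot pool. This is only a sketch of the bookkeeping, not OpenBLAS code: BufferPool, run_after_fork and the check_before_init flag are made-up names, and the sizes (32 buffers per step, 64 slots in total) are assumptions picked so that the 32 + 1 + 32 = 65 requests from the last step overflow the pool.

```python
class BufferPool:
    """Toy stand-in for the fixed-size buffer table (hypothetical, not OpenBLAS code)."""

    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.used = 0

    def alloc(self):
        # Return a 1-based slot id, or None once every slot is taken.
        if self.used >= self.num_slots:
            return None
        self.used += 1
        return self.used


def run_after_fork(num_threads, num_slots, check_before_init):
    pool = BufferPool(num_slots)
    thread_buffer = [None] * num_threads  # state right after the fork handler reset

    # goto_set_num_threads: allocate a buffer for every thread that lacks one.
    for i in range(num_threads):
        if thread_buffer[i] is None:
            thread_buffer[i] = pool.alloc()

    # gemm takes one buffer for itself (kept only for the slot count).
    gemm_buffer = pool.alloc()

    # blas_thread_init: the buggy variant overwrites entries unconditionally,
    # the fixed variant skips entries that are already set.
    for i in range(num_threads):
        if check_before_init and thread_buffer[i] is not None:
            continue
        thread_buffer[i] = pool.alloc()

    return thread_buffer, pool.used


buggy, used_buggy = run_after_fork(32, 64, check_before_init=False)
fixed, used_fixed = run_after_fork(32, 64, check_before_init=True)
print(buggy[-1], used_buggy)  # None 64
print(fixed[-1], used_fixed)  # 32 33
```

With the unconditional reinit the last thread ends up without a buffer, which is the "last entry is zero" case. Skipping entries that are already set, as the commit title "Don't overwrite blas_thread_buffer if already set" suggests, keeps the total at 33 slots in this model.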

This also doesn’t come up for small thread numbers because the minimum for NUM_BUFFERS is 50, so I guess below 25 cores there won’t be an observable problem. And finally: OpenBLAS does show an error/warning that the allocation fails, but the numpy test suite swallows it.
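
The 25-core threshold can be checked with a few lines of arithmetic. The sketch assumes NUM_BUFFERS is the larger of 50 and twice the thread count, and that the failing sequence requests two buffers per thread plus one for gemm; both formulas are my reading of the comment above, not something taken from the OpenBLAS headers.

```python
def num_buffers(num_threads):
    # Assumption: twice the thread count, but never fewer than 50 slots.
    return max(50, 2 * num_threads)


def overflows(num_threads):
    # Two allocation passes over all threads plus one buffer taken by gemm.
    requested = 2 * num_threads + 1
    return requested > num_buffers(num_threads)


first_overflow = next(n for n in range(1, 257) if overflows(n))
print(first_overflow)  # 25
```

Under these assumptions every thread count from 25 upward requests one buffer more than the pool holds, while 24 threads and fewer stay inside the 50-slot minimum.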

PR coming up.

Flamefire on Nov 20, 2020

Sure. As it turned out, it was indeed my commit, the one you opened a PR to revert, that caused the issue, although the reason was much more complicated than expected.

Flamefire on Nov 20, 2020

@brada4 not sure what you are seeing, and no disassembly is needed I guess, as gemm_tcopy_4 is plain C. It is trying to copy data from the buffer normally pointed to by b, which simply is not there because no slot was available for it in the buffers array.

martin-frbg on Nov 19, 2020