OpenBLAS: Segfault when NUM_THREADS is smaller than host vCPUs
Most context can be found here: https://github.com/xianyi/OpenBLAS/pull/2982
But in summary:
A segfault occurs when OpenBLAS is built with OpenMP and the NUM_THREADS used for the build is lower than the host’s vCPU count.
[Switching to Thread 0x7ffebba6e640 (LWP 104473)]
0x00007fffe95cbe70 in dgemm_itcopy_ZEN () from /nix/store/3v2i5ga27y9al3z0d8ccwaa389qz3ma6-lapack-3/lib/liblapack.so.3
(gdb) backtrace
#0 0x00007fffe95cbe70 in dgemm_itcopy_ZEN () from /nix/store/3v2i5ga27y9al3z0d8ccwaa389qz3ma6-lapack-3/lib/liblapack.so.3
#1 0x00007fffe882416a in inner_thread () from /nix/store/3v2i5ga27y9al3z0d8ccwaa389qz3ma6-lapack-3/lib/liblapack.so.3
#2 0x00007fffe8956bf6 in exec_blas._omp_fn () from /nix/store/3v2i5ga27y9al3z0d8ccwaa389qz3ma6-lapack-3/lib/liblapack.so.3
#3 0x00007fffe347f916 in ?? () from /nix/store/f23sq7lk6xfrvz467ffkpzackyk5q8dm-gfortran-9.3.0-lib/lib/libgomp.so.1
#4 0x00007ffff7c20e9e in start_thread () from /nix/store/5didcr1sjp2rlx8abbzv92rgahsarqd9-glibc-2.32/lib/libpthread.so.0
#5 0x00007ffff793866f in clone () from /nix/store/5didcr1sjp2rlx8abbzv92rgahsarqd9-glibc-2.32/lib/libc.so.6
This currently has a “workaround”: set the environment variable OMP_NUM_THREADS to a value equal to or lower than the NUM_THREADS used during the build. However, the desired behavior is that it doesn’t crash.
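For Python users (e.g. when running the numpy test suite), the workaround can be applied from inside the process. The following is only a minimal sketch: the helper name cap_omp_threads is hypothetical, the value 64 is taken from the NUM_THREADS=64 build flag in this report, and it assumes the variable is set before the OpenMP runtime is loaded (i.e. before numpy is imported).

```python
import os


def cap_omp_threads(build_num_threads, env=None):
    """Clamp OMP_NUM_THREADS to the NUM_THREADS value OpenBLAS was built with.

    `cap_omp_threads` is a hypothetical helper, not part of OpenBLAS or numpy.
    Returns the value that was written to the environment.
    """
    if env is None:
        env = os.environ
    host_cpus = os.cpu_count() or 1
    requested = env.get("OMP_NUM_THREADS", "")
    # Fall back to the host CPU count if the variable is unset or not a plain integer.
    current = int(requested) if requested.isdigit() else host_cpus
    capped = max(1, min(current, build_num_threads))
    env["OMP_NUM_THREADS"] = str(capped)
    return capped


if __name__ == "__main__":
    # 64 matches NUM_THREADS=64 from the makeFlags in this report.
    print(cap_omp_threads(64))
    # import numpy  # import only after the variable has been set
```

Setting OMP_NUM_THREADS in the shell before starting the interpreter has the same effect and avoids any ordering concerns.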
EDIT: Also of note, this fails when running the numpy test suite. OpenBLAS is able to run its own test suite without crashing. blas, openblas, and lapack are all aliased to openblas.
$ cat numpy/site.cfg
[blas]
include_dirs=/nix/store/bm0mjpbnaxixma2wvmj99zq2kfq1017h-blas-3-dev/include
library_dirs=/nix/store/35lvfykv13mnva6sipnrqks05mvrldi0-blas-3/lib
runtime_library_dirs=/nix/store/35lvfykv13mnva6sipnrqks05mvrldi0-blas-3/lib
[lapack]
include_dirs=/nix/store/s9rb0jhplgsfcnay1mp7kn1zhxyppq0n-lapack-3-dev/include
library_dirs=/nix/store/w6xjrzfnxsi7jwrlvfy10qcljm6gyhss-lapack-3/lib
runtime_library_dirs=/nix/store/w6xjrzfnxsi7jwrlvfy10qcljm6gyhss-lapack-3/lib
[openblas]
include_dirs=/nix/store/bm0mjpbnaxixma2wvmj99zq2kfq1017h-blas-3-dev/include:/nix/store/s9rb0jhplgsfcnay1mp7kn1zhxyppq0n-lapack-3-dev/include
libraries=lapack,lapacke,blas,cblas
library_dirs=/nix/store/35lvfykv13mnva6sipnrqks05mvrldi0-blas-3/lib:/nix/store/w6xjrzfnxsi7jwrlvfy10qcljm6gyhss-lapack-3/lib
runtime_library_dirs=/nix/store/35lvfykv13mnva6sipnrqks05mvrldi0-blas-3/lib:/nix/store/w6xjrzfnxsi7jwrlvfy10qcljm6gyhss-lapack-3/lib
System / Build Info
$ nix eval -f default.nix openblas.makeFlags
[ "BINARY=64" "CC=cc" "CROSS=0" "DYNAMIC_ARCH=1" "FC=gfortran" "HOSTCC=cc" "INTERFACE64=1" "NO_AVX512=1" "NO_BINARY_MODE=" "NO_SHARED=0" "NO_STATIC=1" "NUM_THREADS=64" "TARGET=ATHLON" "USE_OPENMP=1" ]
CC = gcc-9.3.0, arch = x86_64, platform = linux, CPU = 3990X (64 cores / 128 threads)
About this issue
- State: closed
- Created 4 years ago
- Comments: 24 (14 by maintainers)
Commits related to this issue
- Don't overwrite blas_thread_buffer if already set After a fork it is possible that blas_thread_buffer has already allocated memory buffers: goto_set_num_threads does allocate those already and it may... — committed to Flamefire/OpenBLAS by Flamefire 4 years ago
- Add reproducer test for crash after fork See #2993 for an analysis — committed to Flamefire/OpenBLAS by Flamefire 4 years ago
Ok found the problem. What happens is:
- [...] blas_thread_buffer array and the blas_server_avail flag
- [...] cblas_ssyrk) calls num_cpu_avail
- [...] blas_cpu_number=64 (capped in previous settings) and openmp_nthreads=256
- [...] goto_set_num_threads(openmp_nthreads)
- [...] blas_thread_buffer allocating memory where required
- exec_blas is called a bit later (by gemm), which finds blas_server_avail=0 and calls blas_thread_init, which reinits the blas_thread_buffer without checking it first
- NUM_BUFFERS=64*2, 32 are allocated by goto_set_num_threads, 1 by gem directly, then 32 further ones are requested by blas_thread_init, totalling 65 -> last entry is zero

This also doesn’t come up for small thread numbers because the min. for NUM_BUFFERS is 50, so I guess below 25 cores there won’t be an observable problem.

And finally: OpenBLAS does show an error/warning that the allocation fails, but the numpy test suite swallows it.
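The slot accounting described above can be checked with a toy model. This is only a sketch under stated assumptions, not OpenBLAS code: the table size max(50, 2 * threads) is inferred from the "min. is 50" and "below 25 cores" remarks, the request pattern (threads + 1 + threads) follows the 32 + 1 + 32 count in the comment, and the reinit_checks_existing flag is a hypothetical stand-in for the "don't overwrite blas_thread_buffer if already set" fix.

```python
def slot_table_size(threads, minimum=50):
    # Assumed sizing rule for the buffer table (inferred, not from the source code).
    return max(minimum, 2 * threads)


def simulate(threads, reinit_checks_existing=False):
    """Return one bool per buffer request: True if a slot was available."""
    capacity = slot_table_size(threads)
    # goto_set_num_threads allocates one buffer per thread, gemm takes one more.
    requests = threads + 1
    if not reinit_checks_existing:
        # blas_thread_init re-requests a buffer per thread without
        # checking whether blas_thread_buffer is already populated.
        requests += threads
    used = 0
    results = []
    for _ in range(requests):
        if used < capacity:
            used += 1
            results.append(True)
        else:
            results.append(False)  # corresponds to the "last entry is zero" case
    return results


def first_failing_thread_count(limit=256):
    """Smallest thread count for which at least one request gets no slot."""
    for threads in range(1, limit + 1):
        if not all(simulate(threads)):
            return threads
    return None


if __name__ == "__main__":
    print(simulate(32).count(False))
    print(first_failing_thread_count())
```

In this model, 32 threads produce 65 requests against 64 slots, so exactly one request fails, and the first failing thread count is 25, matching the estimate above. With the guard enabled, no request fails for any thread count.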
PR coming up.
Sure. As it turned out, it was indeed my commit (the one you opened a PR to revert) that caused the issue, although the reason was much more complicated than first thought.
@brada4 not sure what you are seeing, and no disassembly needed I guess as the gemm_tcopy_4 is plain C - it is trying to copy data from the buffer normally pointed to by b that simply is not there as there was no slot available for it in the buffers array.