OpenBLAS: OpenBLAS 0.3.24 fails on FreeBSD port

Description

With OpenBLAS 0.3.24, the scipy build fails during configure, and some octave-forge packages fail at runtime on some CPUs. The failure occurs when OpenBLAS is built with Clang and OpenMP enabled, even though OpenBLAS’s own regression tests pass.

A FreeBSD committer commented on this issue:

A similar error has been reported for MacPorts: https://github.com/OpenMathLib/OpenBLAS/issues/4239. It is not exactly the same one, but there are some common points. In their case it seems to be caused by a combination of certain versions of clang and certain versions of the linker.

Environment

uname -a
FreeBSD blizzard 13.2-RELEASE-p4 FreeBSD 13.2-RELEASE-p4 GENERIC amd64

clang --version
FreeBSD clang version 14.0.5 (https://github.com/llvm/llvm-project.git llvmorg-14.0.5-0-gc12386ae247c)
Target: x86_64-unknown-freebsd13.2
Thread model: posix
InstalledDir: /usr/bin

gfortran12 --version
GNU Fortran (FreeBSD Ports Collection) 12.2.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Other information

About this issue

  • State: closed
  • Created 8 months ago
  • Comments: 48 (14 by maintainers)

Most upvoted comments

The question is whether all of OpenMP initialization fails or “only” the call to omp_get_max_threads (which I would need to emulate by reading OMP_NUM_THREADS directly instead), and whether this is specific to the libomp from the (older) LLVM version that FreeBSD carries. So far I still see nothing wrong with calling omp_get_max_threads where we do.
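To illustrate the workaround mentioned in the comment above, here is a minimal sketch of reading OMP_NUM_THREADS from the environment instead of calling omp_get_max_threads. The helper names `threads_from_string` and `threads_from_env` are hypothetical and are not OpenBLAS functions; the fallback value and the handling of list values such as `4,2` are assumptions, not what OpenBLAS actually does.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not part of OpenBLAS): parse the leading integer
 * of an OMP_NUM_THREADS-style string. A list such as "4,2" yields 4.
 * Returns `fallback` when the value is missing, malformed, or below 1. */
int threads_from_string(const char *value, int fallback)
{
    char *end = NULL;
    long n;

    if (value == NULL || *value == '\0')
        return fallback;

    errno = 0;
    n = strtol(value, &end, 10);
    if (end == value || errno == ERANGE || n < 1 || n > INT_MAX)
        return fallback;

    return (int)n;
}

/* Reads the environment directly, so the OpenMP runtime is never entered. */
int threads_from_env(int fallback)
{
    return threads_from_string(getenv("OMP_NUM_THREADS"), fallback);
}
```

Unlike omp_get_max_threads, this only sees the environment variable, so it would miss a thread count set at runtime through omp_set_num_threads; it is meant as a diagnostic substitute, not an equivalent.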

You need to attach a debugger to confirm that the hang is in OpenBLAS code.

Sorry, I forgot to report this. I did not find libgomp.so in the gdb output. make science/scipy seems to be stuck at ?? () in libomp.so.

nobody     93330  100.0  0.2    62760  51208  1  RJ   22:42      8:54.98 /usr/local/bin/python3.9 scipy/special/utils/makenpz.py --use-timestamp scipy/special/tests/data/boost
gdb output<div>
(gdb) attach 93330

Attaching to program: /poudriere/data/.m/132amd64-local-workstation/01/usr/local/bin/python3.9, process 93330
warning: .dynamic section for "/usr/local/lib/libopenblas.so.0" is not at the expected address (wrong library or version mismatch?)
warning: Could not load shared library symbols for [vdso].
Do you need "set solib-search-path" or "set sysroot"?
Reading symbols from /usr/local/lib/libpython3.9.so.1.0...
(No debugging symbols found in /usr/local/lib/libpython3.9.so.1.0)
warning: File "/usr/local/lib/libpython3.9.so.1.0-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
Reading symbols from /lib/libcrypt.so.5...
(No debugging symbols found in /lib/libcrypt.so.5)
Reading symbols from /usr/local/lib/libintl.so.8...
Reading symbols from /usr/lib/libdl.so.1...
(No debugging symbols found in /usr/lib/libdl.so.1)
Reading symbols from /lib/libutil.so.9...
(No debugging symbols found in /lib/libutil.so.9)
Reading symbols from /lib/libm.so.5...
(No debugging symbols found in /lib/libm.so.5)
Reading symbols from /lib/libthr.so.3...
(No debugging symbols found in /lib/libthr.so.3)
Reading symbols from /lib/libc.so.7...
(No debugging symbols found in /lib/libc.so.7)
Reading symbols from /usr/local/lib/python3.9/lib-dynload/_heapq.cpython-39.so...
(No debugging symbols found in /usr/local/lib/python3.9/lib-dynload/_heapq.cpython-39.so)
Reading symbols from /usr/local/lib/python3.9/lib-dynload/_json.cpython-39.so...
(No debugging symbols found in /usr/local/lib/python3.9/lib-dynload/_json.cpython-39.so)
Reading symbols from /usr/local/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39.so...
(No debugging symbols found in /usr/local/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39.so)
Reading symbols from /usr/local/lib/libopenblas.so.0...
Reading symbols from /usr/local/lib/gcc12/libgfortran.so.5...
(No debugging symbols found in /usr/local/lib/gcc12/libgfortran.so.5)
Reading symbols from /usr/lib/libc++.so.1...
(No debugging symbols found in /usr/lib/libc++.so.1)
Reading symbols from /lib/libcxxrt.so.1...
(No debugging symbols found in /lib/libcxxrt.so.1)
Reading symbols from /usr/lib/libomp.so...
(No debugging symbols found in /usr/lib/libomp.so)
Reading symbols from /usr/local/lib/gcc12/libquadmath.so.0...
(No debugging symbols found in /usr/local/lib/gcc12/libquadmath.so.0)
Reading symbols from /usr/local/lib/gcc12/libgcc_s.so.1...
Reading symbols from /libexec/ld-elf.so.1...
(No debugging symbols found in /libexec/ld-elf.so.1)
[Switching to LWP 101112 of process 93330]
0x0000000830ed2452 in ?? () from /usr/lib/libomp.so

(gdb) info threads

  Id   Target Id                   Frame 
* 1    LWP 101112 of process 93330 0x0000000830ed2452 in ?? () from /usr/lib/libomp.so
(gdb) t a a bt


Thread 1 (LWP 101112 of process 93330):
#0  0x0000000830ed2452 in ?? () from /usr/lib/libomp.so
#1  0x0000000830eeb7c8 in __kmp_acquire_ticket_lock () from /usr/lib/libomp.so
#2  0x0000000830ef1564 in ?? () from /usr/lib/libomp.so
#3  0x0000000830eb191e in kmpc_malloc () from /usr/lib/libomp.so
#4  0x0000000830f2c1b7 in ?? () from /usr/lib/libomp.so
#5  0x0000000830efb925 in ?? () from /usr/lib/libomp.so
#6  0x0000000830ef15c4 in ?? () from /usr/lib/libomp.so
#7  0x0000000830efbccc in ?? () from /usr/lib/libomp.so
#8  0x0000000830efbc94 in ?? () from /usr/lib/libomp.so
#9  0x0000000830edad08 in omp_get_max_threads () from /usr/lib/libomp.so
#10 0x00000008333774fb in ccopy_k_PRESCOTT () from /usr/local/lib/libopenblas.so.0
#11 0x0000000834a036f0 in gotoblas_KATMAI () from /usr/local/lib/libopenblas.so.0
#12 0x0000000000000010 in ?? ()
#13 0x0000000834a0097c in memory () from /usr/local/lib/libopenblas.so.0
#14 0x0000000834a00980 in memory () from /usr/local/lib/libopenblas.so.0
#15 0x000000082053f760 in ?? ()
#16 0x0000000833376ec7 in caxpyc_k_PRESCOTT () from /usr/local/lib/libopenblas.so.0
#17 0x0000000833376de0 in caxpyc_k_PRESCOTT () from /usr/local/lib/libopenblas.so.0
#18 0x0000000000000004 in ?? ()
#19 0x0000000300000006 in ?? ()
#20 0xc1e04d5df03d0a9d in ?? ()
#21 0x00000008349de0d8 in __CTOR_LIST__ () from /usr/local/lib/libopenblas.so.0
#22 0x000000082498d008 in ?? ()
#23 0x0000000831786148 in ?? ()
#24 0x0000000820540020 in ?? ()
#25 0x000000082053fbe0 in ?? ()
#26 0x000022ea9d4c90ad in ?? () from /libexec/ld-elf.so.1
#27 0x000000082053f858 in ?? ()
#28 0x00007ffffffff700 in ?? ()
#29 0x0000000000000001 in ?? ()
#30 0x000022ea9d4d0409 in ?? () from /libexec/ld-elf.so.1
#31 0x000000082053f878 in ?? ()
#32 0x00000008243b2c64 in ?? () from /lib/libc.so.7
#33 0x0000000000000197 in ?? ()
#34 0x00000008243a9c64 in ?? () from /lib/libc.so.7
#35 0x000000082053f7f8 in ?? ()
#36 0x000000082107b408 in ?? ()
#37 0x000000082053f878 in ?? ()
#38 0x00000008243a9c64 in ?? () from /lib/libc.so.7
#39 0x000000082053f860 in ?? ()
#40 0x000022ea9d4cb465 in ?? () from /libexec/ld-elf.so.1
#41 0x000000082053f740 in ?? ()
#42 0x0000000100000014 in ?? ()
#43 0x000000082107b408 in ?? ()
#44 0x00000008243948e8 in ?? () from /lib/libc.so.7
#45 0x0000000000000000 in ?? ()
(gdb) detach

Detaching from program: /poudriere/data/.m/132amd64-local-workstation/01/usr/local/bin/python3.9, process 93330
[Inferior 1 (process 93330) detached]
(gdb) quit

</div>

The thread of the makenpz process seems to be calling into the OpenBLAS library. Only a privileged user can attach to a process running in poudriere’s jail.
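One way to separate the two suspects in the backtrace above (libomp itself versus the way OpenBLAS calls it) might be a tiny probe that calls omp_get_max_threads without loading OpenBLAS at all. This is only a sketch: the function names `probe_max_threads` and `probe_and_report` are hypothetical, and a probe that returns normally would not prove that the same call is safe from inside OpenBLAS's initialization.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical probe: returns the OpenMP thread limit, or 1 when the
 * file was compiled without OpenMP support. */
int probe_max_threads(void)
{
#ifdef _OPENMP
    /* Same runtime entry point as frame #9 of the backtrace. */
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* Prints the result so a hang (no output) is easy to tell apart
 * from a normal return. */
int probe_and_report(void)
{
    int n = probe_max_threads();
    printf("omp_get_max_threads returned %d\n", n);
    fflush(stdout);
    return n;
}
```

Calling `probe_and_report` from a `main` function, built with the compiler's OpenMP flag (`-fopenmp` for Clang and GCC) inside the same jail, would show whether libomp alone hangs there.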

Can you make sure the 3 packages you are talking about were built on the same machine?

Yes. The 3 packages I am talking about were all built on the same machine, because poudriere builds ALL DEPENDENCIES FROM SOURCE and installs them into a chroot-based system (called a jail) before building a port. For example, before building scipy, poudriere builds 118 dependencies (including gcc, gfortran, OpenBLAS and so forth) using the base system compiler. Please refer to the brief documentation of the poudriere package build system: https://man.freebsd.org/cgi/man.cgi?poudriere

Is there anything I can do to support your investigation and help narrow down the cause? I’m really having trouble solving this problem…

In addition, I think that AVX instructions are irrelevant to this problem, because they are disabled in the batch build settings unless enabled explicitly.