OpenBLAS: GIMP hang / deadlock in get_memory_table / blas_thread_shutdown
Hi,
There is a bug open in Debian related to gimp 2.10.2 and openblas 0.3.2: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=903514.
Depending on the machine and environment used, gimp can hang at startup because of a deadlock inside glibc. I’m forwarding what I wrote for the Debian bug tracker:
Using gdb to find where it hung (gimp-gdb.txt) shows the openblas threads waiting on a lock while doing thread-local storage work, while the main thread is in the middle of _dl_close on openblas and is waiting for those threads to exit using pthread_join.
It seems that the lock used in tls_get_addr_tail [0] is the same as the one locked by _dl_close [1].
A recursive lock is used, but it does not help here because the thread calling tls_get_addr_tail and the thread calling _dl_close are not the same.
This deadlock may not happen every time; in my case, the openblas threads were still initializing when _dl_close was called.
Given this, I think the offending commit in openblas is bf40f806 [2], which adds TLS variables to avoid locking. Many changes have been made since then, though.
One related bug report is [3], which seems to indicate that lock handling inside glibc is not easy.
There was an attempt to fix deadlocks between tls_get_addr and a _dl_close of a module whose finalizer joins with that thread [4].
So I see these possible solutions:
- Add a Breaks relation between gimp and openblas
- Disable TLS in the openblas build (if possible, but this would cause a performance loss for users that use openblas without gimp)
- Patch glibc so that it does not deadlock (but this does not seem easy to do at all)
[0] https://github.com/bminor/glibc/blob/glibc-2.27/elf/dl-tls.c#L761
[1] https://github.com/bminor/glibc/blob/glibc-2.27/elf/dl-close.c#L812
[2] https://github.com/xianyi/OpenBLAS/commit/bf40f806efa55c7a7c7ec57535919598eaeb569d#diff-31f8d4e8863583d95bf2f9529f83844e
[4] https://sourceware.org/ml/libc-alpha/2015-06/msg00062.html
About this issue
- State: closed
- Created 6 years ago
- Comments: 31 (9 by maintainers)
@amurzeau can you try out #1726? You’ll just need to define USE_COMPILER_TLS to 0 in your build; that PR allows you to override the flag.