OpenBLAS: GIMP hang / deadlock in get_memory_table / blas_thread_shutdown

Hi,

There is a bug open in Debian related to gimp 2.10.2 and openblas 3.2: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=903514.

Depending on the machine and environment used, gimp can hang at startup because of a deadlock inside glibc. I’m forwarding what I wrote for the Debian bug tracker:

Using gdb to find where it hung (gimp-gdb.txt) shows threads waiting on a lock while doing thread-local storage work, while the main thread is in the process of dl_close-ing openblas and waiting for those threads to exit using pthread_join.

It seems that the lock used in tls_get_addr_tail [0] is the same as the one locked by _dl_close [1]. A recursive lock is used, but it does not help here because the threads calling tls_get_addr_tail and _dl_close are not the same.

This deadlock may not happen every time. In my case, the openblas threads are still initializing when _dl_close is called.
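
To make the recursive-lock point concrete, here is a minimal model of the situation, not glibc's actual code. It assumes a Python `threading.RLock` as a stand-in for the loader lock, and the names `loader_lock`, `worker` and `simulate_dlclose` are hypothetical. The worker uses a timeout so the demo terminates instead of hanging as the real case does.

```python
import threading

# Stand-in for the recursive loader lock (an assumption of this model).
loader_lock = threading.RLock()


def worker(result):
    # Models an openblas thread whose TLS access needs the loader lock.
    # The timeout only exists so this demo returns; the real thread blocks forever.
    got_it = loader_lock.acquire(timeout=0.5)
    result["worker_got_lock"] = got_it
    if got_it:
        loader_lock.release()


def simulate_dlclose():
    result = {}
    with loader_lock:  # main thread takes the lock, as _dl_close does
        # Recursion works for the owning thread: a second acquire succeeds.
        result["main_reacquired"] = loader_lock.acquire(blocking=False)
        if result["main_reacquired"]:
            loader_lock.release()

        t = threading.Thread(target=worker, args=(result,))
        t.start()
        # Models the finalizer joining the worker while the lock is still held.
        t.join()
    return result


if __name__ == "__main__":
    print(simulate_dlclose())
```

In this model the main thread can re-enter the lock, but the worker never obtains it while the main thread is waiting in `join`, which is the same shape as the hang described above.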

Given this, I think the offending commit in openblas is bf40f806 [2], which adds TLS variables to avoid locking. But many changes have been made since then.

One related bug report is [3], which seems to indicate that lock handling inside glibc is not easy.

There was an attempt to fix deadlocks between tls_get_addr and a _dl_close of a module whose finalizer joins with that thread [4].

So I see these possible solutions:

  • Add a Breaks relationship between the gimp and openblas packages
  • Disable TLS in the openblas build (if possible, but this would cause a performance loss for users who use openblas without gimp)
  • Patch glibc to not deadlock (but this does not seem easy to do at all)

[0] https://github.com/bminor/glibc/blob/glibc-2.27/elf/dl-tls.c#L761

[1] https://github.com/bminor/glibc/blob/glibc-2.27/elf/dl-close.c#L812

[2] https://github.com/xianyi/OpenBLAS/commit/bf40f806efa55c7a7c7ec57535919598eaeb569d#diff-31f8d4e8863583d95bf2f9529f83844e

[4] https://sourceware.org/ml/libc-alpha/2015-06/msg00062.html

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 31 (9 by maintainers)

Most upvoted comments

@amurzeau can you try out #1726? You’ll just need to define USE_COMPILER_TLS to 0 in your build; that PR allows you to override the flag.