python-mercuryapi: Program gets stuck in reader.stop_reading()

I’m testing the program by repeatedly calling reader.start_reading() and reader.stop_reading(). At some point (after a few minutes or a few hours), reader.stop_reading() does not return, and the program gets stuck. I was unable to reproduce the problem with a pure C implementation, so it seems to be related to the way Python threads interact with the C threads. Using gdb, I was able to confirm that one of the C threads gets stuck:

30 Jun 06:45:18 2020 - initializeReader
30 Jun 06:45:18 2020 - stopping current reader
stopping_read
read_callback_null
stats_callback_null
stopping_read_cs
read_callback
^C
Thread 1 "python3" received signal SIGINT, Interrupt.
__libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
46      ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S: No such file or directory.
(gdb) info thread
  Id   Target Id         Frame 
* 1    Thread 0x76fee210 (LWP 3019) "python3" __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  2    Thread 0x769b3460 (LWP 3025) "python3" __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  3    Thread 0x75fff460 (LWP 3026) "python3" __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  4    Thread 0x757fe460 (LWP 3027) "python3" __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
(gdb) bt 
#0  __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
#1  0x76ec3072 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x3f22b4) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#2  __pthread_cond_wait_common (abstime=0x0, mutex=0x3f21e0, cond=0x3f2288) at pthread_cond_wait.c:502
#3  __pthread_cond_wait (cond=0x3f2288, mutex=0x3f21e0) at pthread_cond_wait.c:655
#4  0x76b6757a in TMR_stopReading (reader=0x3f10e0) at tm_reader_async.c:387
#5  0x76b5fab0 in Reader_stop_reading (self=0x3f10d8) at mercury.c:976
#6  0x0009ffb2 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) thread 4
[Switching to thread 4 (Thread 0x757fe460 (LWP 3027))]
#0  __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
46      in ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S
(gdb) bt
#0  __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
#1  0x76ec5194 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=1, futex_word=0x3f21a8) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#2  do_futex_wait (sem=sem@entry=0x3f21a8, abstime=0x0) at sem_waitcommon.c:115
#3  0x76ec5274 in __new_sem_wait_slow (sem=0x3f21a8, abstime=0x0) at sem_waitcommon.c:282
#4  0x76b6839e in process_async_response (reader=0x3f10e0) at tm_reader_async.c:977
#5  0x76b689b2 in do_background_reads (arg=0x3f10e0) at tm_reader_async.c:1218

tm_reader_async.c:1218  -> process_async_response(reader);
tm_reader_async.c:977 -> sem_wait(&reader->queue_slots);
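
One possible reading of the two backtraces (an assumption, not something confirmed against the C source here): the background thread is blocked in sem_wait() on queue_slots, waiting for a free slot in a bounded queue, while the thread in TMR_stopReading waits on a condition variable for the background thread to finish. If nothing frees a slot any more, neither side makes progress. A minimal Python model of that pattern follows; all names and the queue depth are hypothetical, and the timeouts exist only so the demo terminates.

```python
import threading

QUEUE_DEPTH = 2  # hypothetical; stands in for the number of queue slots

# Free slots in the bounded queue, modelled after reader->queue_slots.
queue_slots = threading.Semaphore(QUEUE_DEPTH)
# Stands in for the condition that the stopping thread waits on.
stopped = threading.Event()
# Set when the background thread could not get a slot.
producer_blocked = threading.Event()


def background_reads():
    # Models the background read loop: take one slot per tag read.
    for _ in range(QUEUE_DEPTH + 1):
        # The C code calls sem_wait() without a timeout; the timeout here
        # only lets the demo finish instead of hanging.
        if not queue_slots.acquire(timeout=0.5):
            producer_blocked.set()
            return  # the real thread would stay blocked here
    stopped.set()


def stop_reading(timeout=1.0):
    # Models the stopping side: no consumer releases slots any more, and
    # we wait for the background thread to report that it has stopped.
    return stopped.wait(timeout)


worker = threading.Thread(target=background_reads)
worker.start()
stop_ok = stop_reading()
worker.join()
print("stop_reading returned cleanly:", stop_ok)
print("background thread blocked on a full queue:", producer_blocked.is_set())
```

In this model the stop never completes because the slot the background thread needs is never released; whether that is what actually happens inside the library is not established by the backtraces alone.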

About this issue

  • State: open
  • Created 4 years ago
  • Comments: 47 (1 by maintainers)

Most upvoted comments

  • Chance that this happens during development: ~1%
  • Chance that this happens during first client demo: 200%

All right, a fix for the local issue (but not the upstream issue) has been merged to master. If you still encounter this bug, I recommend trying the latest master rather than the latest tagged release.

On our side, it’s because we also need to write to the tag, so we have to stop reading before each write.

Interesting. If I run into a crash with the new code, I will let you know.
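
For anyone retesting, the start/stop loop from the original report can be wrapped in a watchdog so that a hang is reported instead of freezing the test run. This is only a sketch: call_with_timeout, stress_start_stop and FakeReader are hypothetical names, and FakeReader merely stands in for a real reader object so the example runs without hardware. With a real reader, pass the reader instance instead.

```python
import threading


def call_with_timeout(func, timeout):
    """Run func() in a daemon thread; return True if it returned in time."""
    done = threading.Event()

    def runner():
        func()
        done.set()

    threading.Thread(target=runner, daemon=True).start()
    return done.wait(timeout)


def stress_start_stop(reader, cycles, stop_timeout=5.0):
    """Repeat start/stop; return the cycle at which stop_reading hung, or None."""
    for i in range(cycles):
        reader.start_reading(lambda tag: None)
        if not call_with_timeout(reader.stop_reading, stop_timeout):
            return i
    return None


class FakeReader:
    """Hypothetical stand-in for a reader; hangs on one chosen cycle."""

    def __init__(self, hang_on_cycle):
        self.hang_on_cycle = hang_on_cycle
        self.cycle = -1
        self._never = threading.Event()  # never set, so wait() blocks

    def start_reading(self, callback):
        self.cycle += 1

    def stop_reading(self):
        if self.cycle == self.hang_on_cycle:
            self._never.wait()  # simulate stop_reading() not returning


hung_at = stress_start_stop(FakeReader(hang_on_cycle=3), cycles=10, stop_timeout=0.2)
print("stop_reading hung at cycle:", hung_at)
```

The watchdog only detects the hang; the stuck thread itself stays blocked, so the process should be restarted after a detection rather than reused.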