grpc: Ruby gRPC processes get stuck during termination in gpr_cv_wait()
We are running a Rails application with multiple unicorn processes and observed that after upgrading from gRPC 1.1.2 to 1.2.2, many processes were unable to terminate. It appears to be a deadlock condition in gpr_cv_wait():
Thread 1 (Thread 0x7f8ce75e5780 (LWP 20468)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00007f8cca856fa2 in gpr_cv_wait () from /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/grpc-1.2.2-x86_64-linux/src/ruby/lib/grpc/2.3/grpc_c.so
#2 0x00007f8cca825272 in ?? () from /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/grpc-1.2.2-x86_64-linux/src/ruby/lib/grpc/2.3/grpc_c.so
#3 0x00007f8cca8252ec in ?? () from /opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/grpc-1.2.2-x86_64-linux/src/ruby/lib/grpc/2.3/grpc_c.so
#4 0x00007f8ce6d3ea91 in run_final (objspace=0x7f8ce5a16000, zombie=140242726493960) at gc.c:2675
#5 finalize_list (objspace=objspace@entry=0x7f8ce5a16000, zombie=140242726493960) at gc.c:2691
#6 0x00007f8ce6d4a807 in rb_objspace_call_finalizer (objspace=0x7f8ce5a16000) at gc.c:2839
#7 rb_gc_call_finalizer_at_exit () at gc.c:2764
#8 0x00007f8ce6d29a8f in ruby_finalize_1 () at eval.c:131
#9 ruby_cleanup (ex=<optimized out>) at eval.c:222
#10 0x00007f8ce6d29d05 in ruby_run_node (n=0x7f8ce5069950) at eval.c:302
#11 0x000000000040086b in main (argc=10, argv=0x7ffde01903b8) at main.c:36
(gdb)
About this issue
- State: closed
- Created 7 years ago
- Comments: 17 (9 by maintainers)
@stanhu 1.2.2 introduced a background thread that’s involved with the life cycles of grpc channel objects (used by stubs); it’s currently started upon a require grpc. It looks like what’s breaking down right now is that this background thread starts before the fork, but is then not present in the child post-fork. So, for example, while the snippet below would previously terminate, it’s now hanging:
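This is not the snippet from the comment. It is a minimal sketch of the mechanism described above, using only the Ruby standard library (no grpc gem) and assuming MRI on a Unix-like system where fork is available. The method name background_thread_alive_in_child? and the sleeping worker thread are hypothetical stand-ins for the extension's background thread.

```ruby
# Sketch: a thread started before fork is not running in the child.
# `worker` is a hypothetical stand-in for the background thread that the
# grpc extension starts on require; this is not the gem's actual code.
def background_thread_alive_in_child?
  worker = Thread.new { sleep }   # parked forever, like an idle helper thread
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    # Only the thread that called fork continues in the child process.
    writer.write(worker.alive? ? "alive" : "dead")
    writer.close
    exit!(0)                      # leave the child without running finalizers
  end

  writer.close
  result = reader.read            # returns once the child closes its end
  reader.close
  Process.wait(pid)

  worker.kill                     # clean up the parent's copy of the thread
  worker.join
  result == "alive"
end

puts background_thread_alive_in_child?
```

Anything in the child that waits on a condition variable which only that thread would signal can then block indefinitely, which is consistent with the finalizer stuck in gpr_cv_wait() in the backtrace above.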
FWIW, we’re experiencing similar problems with 1.4.0, but with the PHP extension running under PHP-FPM (a forking process manager for PHP). It seems to get stuck in the exact same place as the original report.
The latest 1.3.4 release fixes the issue in which the finalizer of a “channel object” created after a fork hangs.
The broader forking problems mentioned here are most likely due to the general lack of forking support. Closing this now, as there’s more forking support discussion in https://github.com/grpc/grpc/issues/8798
@apolcyn Would it make sense, in addition to the delayed initialization of the background thread introduced in #10670, to also have an explicit way to perform this initialization?
It’s nice that the library will be doing the right thing automatically most of the time, but without an explicit ‘GRPC.after_fork!’ method call it is harder to defend against this mistake (failing to initialize a background thread in a fork) sneaking back into the code.
For example, in GitLab we currently have a way of doing things where we initialize certain global variables while loading the application, so before forking. This is because we use a dual concurrency model: pre-forking for web requests, and multi-threading for background jobs. Initializing globals while loading the app is a trick we use for the multi-threaded mode.
This means we also call GRPC::Core::Channel.new before forking. Does that ruin the fix in #10670? If we had an explicit GRPC.after_fork! (or something like it) I would not have to ask the question.
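To make the request concrete, here is a rough sketch of what an explicit hook of that kind could look like. Everything in it is hypothetical: ForkAwareWorker, ensure_started, after_fork! and running_here? are illustrative names, not part of the grpc gem, and the sleeping thread stands in for whatever the real background thread does. The idea is to record which process started the thread and restart it when the current pid differs.

```ruby
# Hypothetical module for illustration only; NOT an API of the grpc gem.
module ForkAwareWorker
  @owner_pid = nil
  @thread = nil
  @lock = Mutex.new

  class << self
    # Start the worker unless this process already owns a live one.
    def ensure_started
      @lock.synchronize do
        return @thread if running_here?
        @owner_pid = Process.pid
        @thread = Thread.new { sleep }   # placeholder for real background work
      end
    end

    # Explicit hook to call in a child right after fork.
    def after_fork!
      ensure_started
    end

    # True only if the worker was started by, and is alive in, this process.
    def running_here?
      @owner_pid == Process.pid && !@thread.nil? && @thread.alive?
    end
  end
end
```

With a pre-forking server such as unicorn, a call like this would typically go in the after_fork block of the server configuration, so that state initialized while loading the application does not leave a child process without its worker.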