grpc: 1.4: PHP gRPC extension causes fpm processes to hang, gets stuck in shutdown

What version of gRPC and what language are you using?

gRPC 1.4.0, PHP 7.0.18.

What operating system (Linux, Windows, …) and version?

Linux 4.9.0-0.bpo.2-amd64 #1 SMP Debian 4.9.18-1~bpo8+1 (2017-04-10) x86_64 GNU/Linux

What runtime / compiler are you using (e.g. python version or version of gcc)

gcc version 4.9.2 (Debian 4.9.2-10) PHP 7.0.18-1~dotdeb+8.1 (cli) ( NTS )

What did you do?

Installed the PHP gRPC extension 1.4.0 and ran our application as usual on our integration server, without using any gRPC PHP code.

What did you expect to see?

PHP-FPM workers should shut down normally.

What did you see instead?

FPM workers failed to shut down properly, which eventually caused the FPM pool to max out its children and crash:

(gdb) bt
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007fa863ea1b9a in gpr_cv_wait () from /usr/lib/php/20151012/grpc.so
#2  0x00007fa863ecf0d3 in stop_threads () from /usr/lib/php/20151012/grpc.so
#3  0x00007fa863ecf132 in grpc_timer_manager_shutdown () from /usr/lib/php/20151012/grpc.so
#4  0x00007fa863ec1e98 in grpc_iomgr_shutdown () from /usr/lib/php/20151012/grpc.so
#5  0x00007fa863ea340b in grpc_shutdown () from /usr/lib/php/20151012/grpc.so
#6  0x00007fa863e9a3a7 in zm_shutdown_grpc () from /usr/lib/php/20151012/grpc.so
#7  0x000056471ab96207 in module_destructor (module=module@entry=0x56471c4110e0) at /usr/src/builddir/Zend/zend_API.c:2503
#8  0x000056471ab8e2cc in module_destructor_zval (zv=<optimized out>) at /usr/src/builddir/Zend/zend.c:620
#9  0x000056471aba1659 in _zend_hash_del_el_ex (prev=<optimized out>, p=<optimized out>, idx=<optimized out>, ht=<optimized out>) at /usr/src/builddir/Zend/zend_hash.c:1026
#10 _zend_hash_del_el (p=0x56471c4af220, idx=28, ht=0x56471b039d60 <module_registry>) at /usr/src/builddir/Zend/zend_hash.c:1050
#11 zend_hash_graceful_reverse_destroy (ht=ht@entry=0x56471b039d60 <module_registry>) at /usr/src/builddir/Zend/zend_hash.c:1506
#12 0x000056471ab9462c in zend_destroy_modules () at /usr/src/builddir/Zend/zend_API.c:1982
#13 0x000056471ab8f365 in zend_shutdown () at /usr/src/builddir/Zend/zend.c:856
#14 0x000056471ab2e79b in php_module_shutdown () at /usr/src/builddir/main/main.c:2360
#15 0x000056471aa13c85 in main (argc=475908082, argv=0x56471c5dc799) at /usr/src/builddir/sapi/fpm/fpm/fpm_main.c:2021

Anything else we should know about your project / environment?

No actual PHP code is being used; this is just with the extension loaded. Opcache is enabled.

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 38 (22 by maintainers)

Most upvoted comments

Summary of findings:

  • I traced it to the fact that the grpc extension is loaded in the php-fpm parent process. In the PHP MINIT function, we call grpc_init. As of 1.4, the gRPC C core library starts a new thread to manage timers. That thread is not carried over into the php-fpm child processes when they are forked.
  • So when a child php-fpm process shuts down (i.e. when the PHP MSHUTDOWN function is called to shut down each extension), grpc_shutdown hangs because the underlying grpc_timer_manager_shutdown waits on the timer manager thread, which exists only in the parent process.
  • Unfortunately, grpc has never officially supported fork. Things worked in version 1.3.2 and earlier only by accident: if the fork happened after grpc_init but before any grpc channel was created, it happened to work. With the new timer manager in 1.4, it no longer works if the fork happens after grpc_init.
  • The short-term workaround is to use version 1.3.2. If there are any critical bug fixes, we can backport them there.
  • There is ongoing work in grpc core to support forking properly, but that will be 1 to 2 releases out.
  • Another, really ugly, workaround is to simply not call grpc_shutdown in PHP_MSHUTDOWN_FUNCTION. That may be acceptable if the child process is going away anyway, but I don’t know the memory leak implications of doing that. Plus, the timer manager thread will still exist only in the parent, so I don’t know whether deadline- and cancellation-related functionality will still work.
  • This is the same issue as #11506
  • Same issue for other languages in #7951

Questions for community:

  • Does anybody know if there’s a way for the PHP extension to call a function after php-fpm forks the child process? If that’s possible, I can change our gRPC PHP extension to call grpc_init only in that hook, rather than in PHP_MINIT_FUNCTION. I think RINIT is called per request, which is too expensive. I tried GINIT but it doesn’t seem to be called after forking.
  • Alternatively, in this particular setup with PHP-FPM, is there a way for us to not fork child processes after the extension is loaded? Is there a way to load an extension only after the child process is forked?