opentelemetry-php: PHP-FPM child segfault. Possibly on die or exit func calls

Environment

Our environment runs on Kubernetes in a FPM only pod. This pod’s base is created as such:

ARG PHP_VERSION=8.2
FROM php:${PHP_VERSION}-fpm-alpine as build

ADD https://github.com/mlocati/docker-php-extension-installer/releases/latest/download/install-php-extensions /usr/local/bin/

RUN chmod +x /usr/local/bin/install-php-extensions && \
    install-php-extensions open-telemetry/opentelemetry-php-instrumentation@1.0.0beta2
[...]

Some PHP version info

$ php -v
PHP 8.2.4 (cli) (built: Mar 29 2023 22:47:02) (NTS)
Copyright (c) The PHP Group
Zend Engine v4.2.4, Copyright (c) Zend Technologies
    with Zend OPcache v8.2.4, Copyright (c), by Zend Technologies

Some OTEL info

$ php -i | grep otel
/usr/local/etc/php/conf.d/docker-php-ext-otel_instrumentation.ini,
otel_instrumentation
otel_instrumentation support => enabled

$ php -i | grep proto 
/usr/local/etc/php/conf.d/docker-php-ext-protobuf.ini,
protobuf
protobuf.keep_descriptor_pool_after_request => 0 => 0

Steps to reproduce It’s a bit difficult I think in other cases and was not able to create a simple set-up to demonstrate it, yet. It appears FPM segfaults with the following logs in normal:

[30-Mar-2023 11:48:53] WARNING: [pool www] child 66 exited on signal 11 (SIGSEGV - core dumped) after 7644.186788 seconds from start
[30-Mar-2023 11:48:53] NOTICE: [pool www] child 1131 started

With debug logging, we’re seeing this:

[....]
":"http://redacted.example/bla/","method":"GET"} {"requestId":"36620a09-b398-4914-b86e-646dc6219d06"}
[30-Mar-2023 12:59:22.889846] DEBUG: pid 1, fpm_event_loop(), line 435: event module triggered 1 events
[2023-03-30T12:59:22.897970+00:00] app.DEBUG: [Redacted in app monolog log] [] {"requestId":"36620a09-b398-4914-b86e-646dc6219d06"}
[30-Mar-2023 12:59:22.898127] DEBUG: pid 1, fpm_event_loop(), line 435: event module triggered 1 events
[2023-03-30T12:59:22.899596+00:00] app.INFO: [Redacted in app monolog log] {"requestId":"36620a09-b398-4914-b86e-646dc6219d06"}
[30-Mar-2023 12:59:22.899777] DEBUG: pid 1, fpm_event_loop(), line 435: event module triggered 1 events
[2023-03-30T12:59:22.938389+00:00] app.INFO: [Redacted in app monolog log] {"tag":"JsonWspClient"} {"requestId":"36620a09-b398-4914-b86e-646dc6219d06"}
[30-Mar-2023 12:59:22.938656] DEBUG: pid 1, fpm_event_loop(), line 435: event module triggered 1 events
[2023-03-30T12:59:22.938698+00:00] app.DEBUG: [Redacted in app monolog log] [] {"requestId":"36620a09-b398-4914-b86e-646dc6219d06"}
[30-Mar-2023 12:59:22.938815] DEBUG: pid 1, fpm_event_loop(), line 435: event module triggered 1 events
[2023-03-30T12:59:22.939777+00:00] app.INFO: [Redacted in app monolog log] {"requestId":"36620a09-b398-4914-b86e-646dc6219d06"}
[30-Mar-2023 12:59:22.940063] DEBUG: pid 1, fpm_event_loop(), line 435: event module triggered 1 events
[2023-03-30T12:59:23.003724+00:00] app.INFO: [Redacted in app monolog log] {"requestId":"36620a09-b398-4914-b86e-646dc6219d06"}
[30-Mar-2023 12:59:23.003952] DEBUG: pid 1, fpm_event_loop(), line 435: event module triggered 1 events
[2023-03-30T12:59:23.005059+00:00] app.DEBUG: [Redacted in app monolog log] [] {"requestId":"36620a09-b398-4914-b86e-646dc6219d06"}
[30-Mar-2023 12:59:23.005143] DEBUG: pid 1, fpm_event_loop(), line 435: event module triggered 1 events
[30-Mar-2023 12:59:23.134352] DEBUG: pid 1, fpm_pctl_perform_idle_server_maintenance(), line 398: [pool www] currently 1 active children, 31 spare children, 32 running children. Spawning rate 1
[30-Mar-2023 12:59:23.326724] DEBUG: pid 1, fpm_event_loop(), line 435: event module triggered 1 events
[30-Mar-2023 12:59:23.326780] DEBUG: pid 1, fpm_got_signal(), line 82: received SIGCHLD
[30-Mar-2023 12:59:23.326789] DEBUG: pid 1, fpm_event_loop(), line 435: event module triggered 2 events
[30-Mar-2023 12:59:23.326850] WARNING: pid 1, fpm_children_bury(), line 258: [pool www] child 71 exited on signal 11 (SIGSEGV - core dumped) after 78.266474 seconds from start
[30-Mar-2023 12:59:23.326887] DEBUG: pid 1, fpm_children_make(), line 407: blocking signals before child birth
[30-Mar-2023 12:59:23.327734] DEBUG: pid 1, fpm_children_make(), line 431: unblocking signals, child born
[30-Mar-2023 12:59:23.327827] NOTICE: pid 1, fpm_children_make(), line 437: [pool www] child 87 started
10.42.4.24 - - [30/Mar/2023:12:59:23 +0000] "GET /bla/ HTTP/1.1" 502 166 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.3 Safari/605.1.15"
[....]

From as far as I was able to trace it this happens when we call a redirect method in our in-house framework. This method does some checks, sets the redirect header and then calls exit(). Commenting the exit “fixes” the segfault. Adding die() statements around the redirect method call also yields in a segfault. HOWEVER, adding a die() in our index after vendor autoload include but before returning symfony runtime config it does work.

What is the expected behavior? Cleanly stop code execution on die() or exit() call.

What is the actual behavior? Segfault.

Additional context While the logs from the FPM process state there was a core dump created, we could not find anything in the pod or on the worker node. We have some logs from the process while running strace, but I’m not that keen on sharing those publicly, unless it doesn’t contain app specific stuffs

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 35 (27 by maintainers)

Most upvoted comments

I believe it works fine now with beta 6, thanks. I’ll keep it enabled on our internal testing environment so we can see what happens in the coming days/weeks.

To be sure, @ahkui are you able to confirm whether beta 6 works for you too or not?

@xvilo - beta6 just released

@ahkui are you able to create a docker container which will cause segfault? That will help pdelewski to debug the issue - I can’t help, it’s beyond my skills 😦

@xvilo post function has incorrect parameters should be

    post: static function ($object, array $params, mixed $ret, ?\Throwable $exception ) {
        echo 'post' . PHP_EOL;
    }

Anyway seems that calling die causes segfault. I reproduced it on debug build

Program received signal SIGSEGV, Segmentation fault.
0x0000555555c68ef5 in _build_trace_args (arg=0x7ffff585e6a8, str=0x7fffffffa450) at /home/vagrant/php-src/Zend/zend_exceptions.c:515
515					zend_string *class_name = Z_OBJ_HANDLER_P(arg, get_class_name)(Z_OBJ_P(arg));
(gdb) bt
#0  0x0000555555c68ef5 in _build_trace_args (arg=0x7ffff585e6a8, str=0x7fffffffa450) at /home/vagrant/php-src/Zend/zend_exceptions.c:515
open-telemetry/opentelemetry-php#1  0x0000555555c69482 in _build_trace_string (str=0x7fffffffa450, ht=0x7ffff585a960, num=0) at /home/vagrant/php-src/Zend/zend_exceptions.c:574
open-telemetry/opentelemetry-php#2  0x0000555555c695ed in zend_trace_to_string (trace=0x7ffff585a9c0, include_main=true) at /home/vagrant/php-src/Zend/zend_exceptions.c:600
open-telemetry/opentelemetry-php#3  0x0000555555c697a4 in zim_Exception_getTraceAsString (execute_data=0x7ffff5814070, return_value=0x7fffffffa6f0) at /home/vagrant/php-src/Zend/zend_exceptions.c:629
open-telemetry/opentelemetry-php#4  0x0000555555b8e40c in zend_call_function (fci=0x7fffffffa720, fci_cache=0x7fffffffa610) at /home/vagrant/php-src/Zend/zend_execute_API.c:920
open-telemetry/opentelemetry-php#5  0x0000555555c69b04 in zim_Exception___toString (execute_data=0x7ffff5814020, return_value=0x7fffffffaae0) at /home/vagrant/php-src/Zend/zend_exceptions.c:674
open-telemetry/opentelemetry-php#6  0x0000555555b8e40c in zend_call_function (fci=0x7fffffffa980, fci_cache=0x7fffffffa960) at /home/vagrant/php-src/Zend/zend_execute_API.c:920
open-telemetry/opentelemetry-php#7  0x0000555555b8e79b in zend_call_known_function (fn=0x555556a148f0, object=0x7ffff58740c0, called_scope=0x555556a16700, retval_ptr=0x7fffffffaae0, param_count=0, params=0x0, named_params=0x0) at /home/vagrant/php-src/Zend/zend_execute_API.c:997
open-telemetry/opentelemetry-php#8  0x0000555555c661f5 in zend_call_known_instance_method (fn=0x555556a148f0, object=0x7ffff58740c0, retval_ptr=0x7fffffffaae0, param_count=0, params=0x0) at /home/vagrant/php-src/Zend/zend_API.h:664
open-telemetry/opentelemetry-php#9  0x0000555555c6622f in zend_call_known_instance_method_with_0_params (fn=0x555556a148f0, object=0x7ffff58740c0, retval_ptr=0x7fffffffaae0) at /home/vagrant/php-src/Zend/zend_API.h:670
open-telemetry/opentelemetry-php#10 0x0000555555c6acfc in zend_exception_error (ex=0x7ffff58740c0, severity=1) at /home/vagrant/php-src/Zend/zend_exceptions.c:907
open-telemetry/opentelemetry-php#11 0x0000555555ba7085 in zend_execute_scripts (type=8, retval=0x0, file_count=3) at /home/vagrant/php-src/Zend/zend.c:1797
open-telemetry/opentelemetry-php#12 0x0000555555b03de1 in php_execute_script (primary_file=0x7fffffffd0a0) at /home/vagrant/php-src/main/main.c:2539
open-telemetry/opentelemetry-php#13 0x0000555555d1bcff in do_cli (argc=2, argv=0x55555698ab10) at /home/vagrant/php-src/sapi/cli/php_cli.c:965
open-telemetry/opentelemetry-php#14 0x0000555555d1ce07 in main (argc=2, argv=0x55555698ab10) at /home/vagrant/php-src/sapi/cli/php_cli.c:1367