opentelemetry-php: PHP-FPM child segfault. Possibly on die or exit func calls
Environment
Our environment runs on Kubernetes in a FPM only pod. This pod’s base is created as such:
ARG PHP_VERSION=8.2
FROM php:${PHP_VERSION}-fpm-alpine as build
ADD https://github.com/mlocati/docker-php-extension-installer/releases/latest/download/install-php-extensions /usr/local/bin/
RUN chmod +x /usr/local/bin/install-php-extensions && \
install-php-extensions open-telemetry/opentelemetry-php-instrumentation@1.0.0beta2
[...]
Some PHP version info
$ php -v
PHP 8.2.4 (cli) (built: Mar 29 2023 22:47:02) (NTS)
Copyright (c) The PHP Group
Zend Engine v4.2.4, Copyright (c) Zend Technologies
with Zend OPcache v8.2.4, Copyright (c), by Zend Technologies
Some OTEL info
$ php -i | grep otel
/usr/local/etc/php/conf.d/docker-php-ext-otel_instrumentation.ini,
otel_instrumentation
otel_instrumentation support => enabled
$ php -i | grep proto
/usr/local/etc/php/conf.d/docker-php-ext-protobuf.ini,
protobuf
protobuf.keep_descriptor_pool_after_request => 0 => 0
Steps to reproduce It’s a bit difficult I think in other cases and was not able to create a simple set-up to demonstrate it, yet. It appears FPM segfaults with the following logs in normal:
[30-Mar-2023 11:48:53] WARNING: [pool www] child 66 exited on signal 11 (SIGSEGV - core dumped) after 7644.186788 seconds from start
[30-Mar-2023 11:48:53] NOTICE: [pool www] child 1131 started
With debug logging, we’re seeing this:
[....]
":"http://redacted.example/bla/","method":"GET"} {"requestId":"36620a09-b398-4914-b86e-646dc6219d06"}
[30-Mar-2023 12:59:22.889846] DEBUG: pid 1, fpm_event_loop(), line 435: event module triggered 1 events
[2023-03-30T12:59:22.897970+00:00] app.DEBUG: [Redacted in app monolog log] [] {"requestId":"36620a09-b398-4914-b86e-646dc6219d06"}
[30-Mar-2023 12:59:22.898127] DEBUG: pid 1, fpm_event_loop(), line 435: event module triggered 1 events
[2023-03-30T12:59:22.899596+00:00] app.INFO: [Redacted in app monolog log] {"requestId":"36620a09-b398-4914-b86e-646dc6219d06"}
[30-Mar-2023 12:59:22.899777] DEBUG: pid 1, fpm_event_loop(), line 435: event module triggered 1 events
[2023-03-30T12:59:22.938389+00:00] app.INFO: [Redacted in app monolog log] {"tag":"JsonWspClient"} {"requestId":"36620a09-b398-4914-b86e-646dc6219d06"}
[30-Mar-2023 12:59:22.938656] DEBUG: pid 1, fpm_event_loop(), line 435: event module triggered 1 events
[2023-03-30T12:59:22.938698+00:00] app.DEBUG: [Redacted in app monolog log] [] {"requestId":"36620a09-b398-4914-b86e-646dc6219d06"}
[30-Mar-2023 12:59:22.938815] DEBUG: pid 1, fpm_event_loop(), line 435: event module triggered 1 events
[2023-03-30T12:59:22.939777+00:00] app.INFO: [Redacted in app monolog log] {"requestId":"36620a09-b398-4914-b86e-646dc6219d06"}
[30-Mar-2023 12:59:22.940063] DEBUG: pid 1, fpm_event_loop(), line 435: event module triggered 1 events
[2023-03-30T12:59:23.003724+00:00] app.INFO: [Redacted in app monolog log] {"requestId":"36620a09-b398-4914-b86e-646dc6219d06"}
[30-Mar-2023 12:59:23.003952] DEBUG: pid 1, fpm_event_loop(), line 435: event module triggered 1 events
[2023-03-30T12:59:23.005059+00:00] app.DEBUG: [Redacted in app monolog log] [] {"requestId":"36620a09-b398-4914-b86e-646dc6219d06"}
[30-Mar-2023 12:59:23.005143] DEBUG: pid 1, fpm_event_loop(), line 435: event module triggered 1 events
[30-Mar-2023 12:59:23.134352] DEBUG: pid 1, fpm_pctl_perform_idle_server_maintenance(), line 398: [pool www] currently 1 active children, 31 spare children, 32 running children. Spawning rate 1
[30-Mar-2023 12:59:23.326724] DEBUG: pid 1, fpm_event_loop(), line 435: event module triggered 1 events
[30-Mar-2023 12:59:23.326780] DEBUG: pid 1, fpm_got_signal(), line 82: received SIGCHLD
[30-Mar-2023 12:59:23.326789] DEBUG: pid 1, fpm_event_loop(), line 435: event module triggered 2 events
[30-Mar-2023 12:59:23.326850] WARNING: pid 1, fpm_children_bury(), line 258: [pool www] child 71 exited on signal 11 (SIGSEGV - core dumped) after 78.266474 seconds from start
[30-Mar-2023 12:59:23.326887] DEBUG: pid 1, fpm_children_make(), line 407: blocking signals before child birth
[30-Mar-2023 12:59:23.327734] DEBUG: pid 1, fpm_children_make(), line 431: unblocking signals, child born
[30-Mar-2023 12:59:23.327827] NOTICE: pid 1, fpm_children_make(), line 437: [pool www] child 87 started
10.42.4.24 - - [30/Mar/2023:12:59:23 +0000] "GET /bla/ HTTP/1.1" 502 166 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.3 Safari/605.1.15"
[....]
From as far as I was able to trace it this happens when we call a redirect
method in our in-house framework. This method does some checks, sets the redirect header and then calls exit()
. Commenting the exit “fixes” the segfault. Adding die()
statements around the redirect
method call also yields in a segfault. HOWEVER, adding a die()
in our index after vendor autoload include but before returning symfony runtime config it does work.
What is the expected behavior?
Cleanly stop code execution on die()
or exit()
call.
What is the actual behavior? Segfault.
Additional context
While the logs from the FPM process state there was a core dump created, we could not find anything in the pod or on the worker node. We have some logs from the process while running strace
, but I’m not that keen on sharing those publicly, unless it doesn’t contain app specific stuffs
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 35 (27 by maintainers)
I believe it works fine now with beta 6, thanks. I’ll keep it enabled on our internal testing environment so we can see what happens in the coming days/weeks.
To be sure, @ahkui are you able to confirm whether beta 6 works for you too or not?
@xvilo - beta6 just released
@ahkui are you able to create a docker container which will cause segfault? That will help pdelewski to debug the issue - I can’t help, it’s beyond my skills 😦
@xvilo
post
function has incorrect parameters should beAnyway seems that calling die causes segfault. I reproduced it on debug build