apm-agent-php: php apm 1.6.1 segfaults on a zend error/php warning
Describe the bug
When inserting a continue into a switch inside a foreach loop, APM segfaults. PHP normally just shows a warning. When I ran gdb, it appeared that Zend raises an error. I really don’t think that APM should die on a warning without exposing any useful information.
To Reproduce
The code below triggers the segfault.
<?php
$alist = ['apple', 'banana', 'lemon', 'lime'];
print('Starting' . "\n");
foreach ($alist as $item) {
    switch ($item) {
        case 'apple':
            print($item . "\n");
            break;
        case 'banana':
            print("more\n");
            continue;
        default:
            print("default\n");
    }
}
print("Ended\n");
Without APM
PHP Warning: "continue" targeting switch is equivalent to "break". Did you mean to use "continue 2"? in /home/wayne/test.php on line 14
With APM
Segmentation fault
Expected behavior
The php warning should be relayed to the logs somewhere.
GDB initial snippet
Reading symbols from /usr/sbin/php-fpm7.4...
Reading symbols from /usr/lib/debug/.build-id/4c/7489f921fc631b72accb27206c39024829c946.debug...
[New LWP 2076609]
[New LWP 2076638]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `php-fpm: pool www '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f19d9243a1e in __vfscanf_internal (s=s@entry=0x7fffc73555d0, format=format@entry=0x7f19d93958f7 "%hu%n:%hu%n:%hu%n", argptr=argptr@entry=0x7fffc73555b8, mode_flags=mode_flags@entry=2)
at vfscanf-internal.c:278
278 vfscanf-internal.c: No such file or directory.
[Current thread is 1 (Thread 0x7f19d717ba00 (LWP 2076609))]
GDB Offending lines
#42545 0x00007f19d3fc9836 in callPhpFunction () from /opt/elastic/apm-agent-php/extensions/elastic_apm-20190902.so
#42546 0x00007f19d3fc9e26 in callPhpFunctionRetVoid () from /opt/elastic/apm-agent-php/extensions/elastic_apm-20190902.so
#42547 0x00007f19d3fc772a in onPhpErrorToTracerPhpPart () from /opt/elastic/apm-agent-php/extensions/elastic_apm-20190902.so
#42548 0x00007f19d3fb60e7 in elasticApmZendErrorCallbackImpl () from /opt/elastic/apm-agent-php/extensions/elastic_apm-20190902.so
#42549 0x00007f19d3fb62de in elasticApmZendErrorCallback () from /opt/elastic/apm-agent-php/extensions/elastic_apm-20190902.so
#42550 0x000055a377dcf7b3 in zend_error_va_list (type=type@entry=2, error_filename=0x7f19c3eb3108 "/var/local/sac/app/Models/MemberBasket.php", error_lineno=292, format=format@entry=0x55a378072440 "\"continue\" targeting switch is equivalent to \"break\". Did you mean to use \"continue %ld\"?", args=args@entry=0x7fffc7b4faf0) at ./Zend/zend.c:1319
#42551 0x000055a377dcfd97 in zend_error (type=type@entry=2, format=format@entry=0x55a378072440 "\"continue\" targeting switch is equivalent to \"break\". Did you mean to use \"continue %ld\"?") at ./Zend/zend.c:1480
PHP FPM logs
[01-Dec-2022 10:33:13] WARNING: [pool www] child 2094944 exited on signal 11 (SIGSEGV - core dumped) after 33252.924112 seconds from start
About this issue
- State: closed
- Created 2 years ago
- Comments: 15 (6 by maintainers)
@waynegemmell We plan to release 1.7.1 with the fix as soon as possible.
Hi @xyu thank you for the extra details and help in investigating this issue. We are currently investigating on our side and should have an update soon. Thanks for your patience!
Fixed by #834
@xyu Thank you very much for the analysis. You are absolutely correct: the root cause does seem to be timing, since the PHP error occurs at a point when the PHP part of the agent cannot be invoked. We encountered a similar situation in #737 and implemented a fix for it (#797) by reversing the control flow. Instead of having the C part of the agent push a notification about the error to the PHP part, we buffer the notification in the C part and have the PHP part poll for it at key points (begin/end of span/transaction, etc.). This fix was already part of the 1.7.0 release, but its implementation had some bugs, which we are trying to fix in https://github.com/elastic/apm-agent-php/pull/834. We have a fix candidate, and I tested it on the scenario in this issue as well; it seems to fix it.
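To illustrate the "buffer in C, poll from PHP" idea described above, here is a minimal sketch in plain C. It is not the agent's actual code: the names BufferedError, buffer_php_error and poll_php_error are hypothetical, the buffer holds only the most recent error, and a real extension would presumably keep this state per request or per thread rather than in a static global.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record of one PHP error; field sizes are arbitrary. */
typedef struct {
    int type;              /* PHP error type, e.g. 2 for E_WARNING */
    char file_name[256];   /* file in which the error was raised */
    unsigned line;         /* line number reported by the engine */
    char message[512];     /* already formatted error message */
} BufferedError;

/* Single-slot buffer: a newer error overwrites an unpolled older one. */
static BufferedError g_last_error;
static bool g_has_error = false;

/*
 * Would be called from the error callback. It only copies data and
 * never calls back into PHP, so it is safe at points where the PHP
 * part of the agent cannot be invoked (e.g. during compilation).
 */
void buffer_php_error(int type, const char *file_name, unsigned line,
                      const char *message)
{
    g_last_error.type = type;
    g_last_error.line = line;
    /* snprintf truncates safely and always NUL-terminates. */
    snprintf(g_last_error.file_name, sizeof g_last_error.file_name,
             "%s", file_name != NULL ? file_name : "");
    snprintf(g_last_error.message, sizeof g_last_error.message,
             "%s", message != NULL ? message : "");
    g_has_error = true;
}

/*
 * Would be called at safe points (begin/end of span or transaction).
 * Returns true and fills *out if an error is pending, then clears the
 * buffer so the same error is not reported twice.
 */
bool poll_php_error(BufferedError *out)
{
    if (!g_has_error || out == NULL) {
        return false;
    }
    *out = g_last_error;
    g_has_error = false;
    return true;
}
```

In this sketch the error callback would call buffer_php_error() and return immediately, and the polling side would call poll_php_error() later, once it is safe to run PHP code, and forward the result to the tracer.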
@waynegemmell, @xyu Could you please try the fix candidate and let us know if it fixes the issue for you?