sdk-php: [Bug] Got the response to undefined request due to the memory leak in the WF worker
Describe the bug
Sometimes when a child worker process throws an exception, the parent worker process throws the following panic error:
PanicError: flush queue: SoftJobError:
codec_execute:
sync_worker_exec:
sync_worker_exec_payload: LogicException: Got the response to undefined request 10389 in /srv/vendor/temporal/sdk/src/Internal/Transport/Client.php:60
and after it:
PanicError: unknown command CommandType: ChildWorkflow, ID: edfb1479-3d88-407e-a428-7e304e0d7bdf, possible causes are nondeterministic workflow definition code or incompatible change in the workflow definition
Environment/Versions
- Temporal version: 1.16.2; PHP SDK version: 1.2.0
- We use Kubernetes
Additional context
We tried to scale the pods so that they could be split across different zones for fault tolerance. Maybe that is causing these problems.
About this issue
- State: closed
- Created 2 years ago
- Comments: 68 (20 by maintainers)
Basically, we have internal code to make sure that no floating timers are left behind after leaving awaitWithTimeout. This optimization did not behave as expected, but we have properly isolated the issue, so a patch is on its way.
The leak was in the timer's cleanup routine, in the case where awaitWithTimeout was triggered by the internal timeout. @roxblnfk will push the PR later (and you'll be able to review it).
@rustatian The RR updates helped me. Your quick response helped me a lot. Thank you for this!
I'll try to reproduce it with an OOM when I get some free time.
Great job, guys, thank you!
Hey @Zylius @cv65kr, we've found the code that smells and are working on it now.
Right, good point.
I profiled the SDK yesterday, and the issue might be caused by circular references, so the PHP GC is not able to clean up memory properly.
Thanks for your input and help, I really appreciate it. We are on the same page. I will try to help as well, and if I discover something, I will share it with you immediately.
We all need some vacation. Sure, I'll try to run your sample. Thanks!
@rustatian Sorry, crazy day.
And I started around 40 workflows on my machine during this time.
Hi @rustatian
I am using v2.3.1 of the SDK and PHP 8.1.
I tried today without debug mode as well, and memory is still not released.
Can you tell me how you tested it? Let's compare our steps.
Hi @rustatian, I will try on Monday.
Hey @cv65kr
Thanks for the detailed report. We have a few guesses on how to fix it. @roxblnfk is working on it now.
Ok, thanks. Please keep us updated; if we can reproduce this weird issue, we will fix it ASAP.