sdk-php: [Bug] Got the response to undefined request due to the memory leak in the WF worker

Describe the bug

Sometimes when a child worker process throws an exception, the parent worker process throws the following panic error:

PanicError: flush queue: SoftJobError:
	codec_execute:
	sync_worker_exec:
	sync_worker_exec_payload: LogicException: Got the response to undefined request 10389 in /srv/vendor/temporal/sdk/src/Internal/Transport/Client.php:60

and after it:

PanicError: unknown command CommandType: ChildWorkflow, ID: edfb1479-3d88-407e-a428-7e304e0d7bdf, possible causes are nondeterministic workflow definition code or incompatible change in the workflow definition 

Environment/Versions

  • Temporal version: 1.16.2; SDK version: 1.2.0
  • We use Kubernetes

Additional context

We tried to scale the pods so that they can be split across different zones for fault tolerance. Maybe that is causing these problems.

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 68 (20 by maintainers)

Most upvoted comments

Basically, we have internal code to make sure that no floating timers are left after leaving awaitWithTimeout. This optimization did not behave as expected, but we have properly isolated the issue, so a patch is on its way.

The leak was in the timer’s cleanup routine where awaitWithTimeout was triggered by the internal timeout. @roxblnfk will push the PR later (and you’ll be able to review it 😄)
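To illustrate the kind of leak described above, here is a minimal toy model in PHP. It is not the SDK's code: the `WaitRegistry` class, its methods, and the assumption that the timeout path left the condition entry behind are all hypothetical, chosen only to show how a cleanup routine that handles one exit path but not the other makes state grow over time.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical toy model of "wait for a condition OR a timer".
 * Not the Temporal SDK implementation; all names are invented.
 */
final class WaitRegistry
{
    /** @var array<int, bool> pending timers, keyed by wait ID */
    private array $timers = [];
    /** @var array<int, callable> pending conditions, keyed by wait ID */
    private array $conditions = [];
    private int $nextId = 0;

    /** Registers both a timer and a condition for one wait. */
    public function awaitWithTimeout(callable $condition): int
    {
        $id = ++$this->nextId;
        $this->timers[$id] = true;
        $this->conditions[$id] = $condition;
        return $id;
    }

    /** Condition won the race: both entries are removed. */
    public function conditionMet(int $id): void
    {
        unset($this->timers[$id], $this->conditions[$id]);
    }

    /** Assumed bug: the timer won, but only the timer entry is removed. */
    public function timerFiredLeaky(int $id): void
    {
        unset($this->timers[$id]);
    }

    /** Fixed variant: the timeout path cleans up both entries. */
    public function timerFiredFixed(int $id): void
    {
        unset($this->timers[$id], $this->conditions[$id]);
    }

    /** Number of entries still held by the registry. */
    public function pending(): int
    {
        return count($this->timers) + count($this->conditions);
    }
}

$leaky = new WaitRegistry();
$fixed = new WaitRegistry();

// Simulate 100 waits that all end because the timer fired first.
for ($i = 0; $i < 100; $i++) {
    $leaky->timerFiredLeaky($leaky->awaitWithTimeout(static fn (): bool => false));
    $fixed->timerFiredFixed($fixed->awaitWithTimeout(static fn (): bool => false));
}

printf("leaky registry still holds %d entries\n", $leaky->pending());
printf("fixed registry still holds %d entries\n", $fixed->pending());
```

In this sketch the leaky registry keeps one stale condition per timed-out wait, while the fixed one ends up empty. In a long-running worker, leftovers of this kind accumulate across workflow executions.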

@dmitry-pilipenko Thank you. Could you please update your RR version? You are using an unsupported version (v2.7.4). You may try v2.10.2.

@rustatian RR updates helped me. Your quick response helped me a lot. Thank you for this!

I’ll try to reproduce it with OOM when I get some free time 😄 🙏

Great job, guys, thank you 😉

Hey @Zylius @cv65kr 👋🏻 We’ve found the code that smells, and we are on it now 👍🏻

Right, good point.

I profiled the SDK yesterday, and the issue might be caused by circular references, so the PHP GC is not able to clean up memory properly. Thanks for your input and help, I really appreciate that 👍 We are on the same page. I will try to help as well, and if I discover something I will immediately share it with you guys.
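For readers unfamiliar with the circular-reference problem mentioned above, the following standalone PHP sketch shows the general effect. It does not use the SDK; the `Node` class and `makeCycle()` helper are made up for illustration. PHP frees most values through reference counting, but two objects that point at each other never reach a count of zero, so they stay allocated until the cycle collector runs.

```php
<?php
declare(strict_types=1);

/** Hypothetical class used only to build a reference cycle. */
final class Node
{
    public ?Node $peer = null;
    public string $payload;

    public function __construct()
    {
        // About 1 KiB per object so the effect is visible in memory stats.
        $this->payload = str_repeat('x', 1024);
    }
}

/** Creates two objects that reference each other, then drops them. */
function makeCycle(): void
{
    $a = new Node();
    $b = new Node();
    $a->peer = $b;
    $b->peer = $a;
    // On return, $a and $b go out of scope, but each object is still
    // referenced by the other, so refcounting alone cannot free them.
}

gc_disable(); // keep the cycle collector from running automatically

$before = memory_get_usage();
for ($i = 0; $i < 1000; $i++) {
    makeCycle();
}
$held = memory_get_usage() - $before;

gc_enable();
$collected = gc_collect_cycles(); // force a cycle collection run
$remaining = memory_get_usage() - $before;

printf("held before collection: %d bytes\n", $held);
printf("collected by gc_collect_cycles(): %d\n", $collected);
printf("held after collection: %d bytes\n", $remaining);
```

In a long-running worker process, cycles like these can pile up between collector runs. A common way to avoid them is to break the cycle explicitly (for example, set `$a->peer = null`) or to hold one side through `WeakReference`.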

We all need some vacation 😆 Sure, I’ll try to run your sample 👍🏻 Thanks ⚑

@rustatian sorry, crazy day

And I started around 40 workflows on my machine during this time

Hi @rustatian

I am using v2.3.1 of the SDK and PHP 8.1.

I tried today without debug mode as well, and memory is still not released.

Can you tell me how you tested it? Let’s compare our steps.

Hi @rustatian, I will try on Monday.

Hey @cv65kr 👋🏻

Thanks for the detailed report 👍🏻. We have a few guesses on how to fix that. @roxblnfk is working on that now.

Ok, thanks. Please keep us updated; if we can reproduce this weird issue, we will fix it ASAP.