workers-sdk: 🐛 BUG: "[ERROR] Error in ProxyController: Error inside ProxyWorker"
Which Cloudflare product(s) does this pertain to?
Wrangler core, Miniflare
What version(s) of the tool(s) are you using?
3.19.0 [Wrangler]
What version of Node are you using?
20.10.0
What operating system are you using?
Linux
Describe the Bug
Repeatedly calling an endpoint that calls a Durable Object results in this error on every other request:
✘ [ERROR] Error in ProxyController: Error inside ProxyWorker
  {
    name: 'Error',
    message: 'Network connection lost.',
    stack: 'Error: Network connection lost.'
  }
I'm not sure whether the cause is actually the repeated calls to the DO. In DevTools, requests to the DO all appear to be successful.
Downgrading to 3.18.0 fixes this issue, so this is possibly a regression involving the startDevWorker refactor.
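For reference, the setup described above is roughly the standard Worker-to-Durable-Object routing pattern; a minimal sketch with illustrative names and logic (not the project's actual code), assuming @cloudflare/workers-types for the type declarations:

```ts
// Minimal sketch of the setup described above; names and logic are illustrative.
// A Worker endpoint forwards every request to a single Durable Object instance.
export class Counter {
  constructor(private state: DurableObjectState) {}

  async fetch(_request: Request): Promise<Response> {
    let value = (await this.state.storage.get<number>("value")) ?? 0;
    value += 1;
    await this.state.storage.put("value", value);
    return new Response(String(value));
  }
}

export default {
  async fetch(request: Request, env: { COUNTER: DurableObjectNamespace }) {
    const id = env.COUNTER.idFromName("singleton");
    const stub = env.COUNTER.get(id);
    // With wrangler 3.19.0 in local dev, every other request through this
    // stub reportedly fails with "Network connection lost."
    return stub.fetch(request);
  },
};
```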
Please provide a link to a minimal reproduction
No response
Please provide any relevant error logs
No response
About this issue
- Original URL
- State: closed
- Created 7 months ago
- Reactions: 16
- Comments: 40 (16 by maintainers)
Hi @aroman and all, apologies for the delayed action on this issue. Our team got pulled into high-priority internal work over the last several weeks and we fell behind on our regular workers-sdk maintenance. I appreciate you calling out our engagement with the community as a positive – we strive to keep you all informed as much as possible. This is a good reminder for us to keep communicating any internal discussion we have about particular issues, so you are always up to date on their status.
In terms of concrete next steps, we have prioritized this issue for this week and assigned someone to address it. While we’re also taking strides to reduce the number of regressions by increasing test coverage, going forward we’ll prioritize fixing any regressions that do slip through as quickly as possible – we have also just added a regression label so that items like these get highlighted; please feel free to use it 😃
thanks for raising this feedback!
@admah even the Counter example in the DO docs has this issue with any wrangler version above 3.18. It does not require any concurrent requests. I can reliably reproduce it using hand-triggered HTTP requests if they are less than, say, 3 seconds apart. And the behavior is very consistent: always one working request followed by one broken request, repeating.
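A loop along these lines matches the cadence the comment describes; the port is wrangler's default local dev port, and the script itself is illustrative rather than taken from the issue:

```ts
// Illustrative repro: issue requests less than ~3 seconds apart against
// `wrangler dev` (default port 8787). Per the comment above, responses
// alternate between success and "Network connection lost." on wrangler > 3.18.
for (let i = 0; i < 10; i++) {
  try {
    const res = await fetch("http://localhost:8787/");
    console.log(i, res.status, await res.text());
  } catch (err) {
    console.error(i, err);
  }
  await new Promise((resolve) => setTimeout(resolve, 1000));
}
```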
@aroman that is a good callout. We are constantly reviewing our processes to see what we can do to mitigate these types of incidents, because we do understand how disruptive they are.
For this issue (and any others related to previous startDevWorker work) we have prioritized them and are working to have them resolved ASAP.
I found a way to reliably reproduce this on Windows @RamIdeas (and some success on MacOS). I filed https://github.com/cloudflare/workers-sdk/issues/5095 to track separately.
Thanks @matthewjosephtaylor. They unfortunately aren’t consistent for one of my team. They can send 100 requests, 98 are fine, and then 2 of them throw the new 503 introduced in #4867 😔
Confirmed on my development environment when using CORS, but in the deployed environment it works fine.
@RamIdeas We actually migrated off Cloudflare Workers not long after this issue, so my memory of it may not be the best.
It’s not just a noisy log message. Every other HTTP request to the endpoint errors out with that message.
I tried to reproduce this issue now using a stripped-down version of our old code (repeatedly calling a DO), but couldn’t reproduce it. It’s possible that the root issue lies in other interactions, but I can’t recall exactly what.
If you’re talking about the WebSocket Hibernation API, we don’t use those APIs in our project (we use WebSockets + DOs, but not the Hibernation APIs), and we are still hit hard by this issue when developing locally and running tests.
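For clarity on the distinction being drawn, here is a sketch of the non-hibernating pattern the comment describes (illustrative code, assuming @cloudflare/workers-types): the socket is accepted and handled in memory, rather than via state.acceptWebSocket() and a webSocketMessage() handler as the Hibernation API would require.

```ts
// Illustrative non-hibernating WebSocket handling inside a Durable Object.
// The Hibernation API would instead call this.state.acceptWebSocket(server)
// and implement a webSocketMessage() method on the class.
export class Room {
  constructor(private state: DurableObjectState) {}

  async fetch(_request: Request): Promise<Response> {
    const pair = new WebSocketPair();
    const [client, server] = Object.values(pair);

    server.accept(); // standard, in-memory accept (no hibernation)
    server.addEventListener("message", (event) => {
      server.send(`echo: ${event.data}`);
    });

    return new Response(null, { status: 101, webSocket: client });
  }
}
```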
Separately, for CF folks (cc: @lrapoport-cf), I’d like to underscore the point @beanow-at-crabnebula made above. At my company, we’ve been unable to upgrade wrangler since early November due to one regression after another (three by my count: this one, #4496, and one caused by #4535 that I can’t seem to find). Of these, only the latter was documented in the release notes.
I appreciate how frequently wrangler releases new versions, but my feeling is that something has to change process-wise to mitigate the frequency and severity of breaking regressions — or at least to document them in release notes so folks are aware of the tradeoffs when upgrading. I also appreciate how active and responsive your team is with customers on the CF GitHub repos — that level of communication goes a long way.
Work-around:
I’m using DurableObjects with WebSockets.
As long as I have the client continually send a ‘ping’ message across the socket every 5 seconds, I no longer experience the issue.
I STRONGLY suspect the issue is with wrangler hibernating the worker. If I never let it rest, I don’t get the error.
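A sketch of that keepalive, assuming a browser-side WebSocket client; the URL, message shape, and 5-second interval are illustrative:

```ts
// Keep the local dev Durable Object connection "awake" by pinging every 5s.
// The URL and message payload are illustrative, not from the original project.
const socket = new WebSocket("ws://localhost:8787/websocket");
let keepalive: ReturnType<typeof setInterval> | undefined;

socket.addEventListener("open", () => {
  keepalive = setInterval(() => {
    if (socket.readyState === WebSocket.OPEN) {
      socket.send(JSON.stringify({ type: "ping" }));
    }
  }, 5_000);
});

socket.addEventListener("close", () => {
  if (keepalive !== undefined) clearInterval(keepalive);
});
```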
Just adding a note that we now have critical CVEs against the only workaround: downgrading to wrangler 3.18 https://github.com/cloudflare/workers-sdk/security
Upgrading to the latest wrangler also caused Jest to find leaking request handles. This makes using unstable_dev for integration tests, as documented, very brittle.
I’ve started seeing test flakes in suites that do lots of unstable_dev starts, without DO bindings (1 KV and 1 R2 binding, though). Reverting to 3.18.0 indeed solves the issue. Node 18, Linux, if that matters.
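For reference, the documented unstable_dev integration-test pattern being described looks roughly like this sketch (Jest-style hooks; the script path and options are illustrative):

```ts
// Sketch of the documented unstable_dev integration-test pattern.
import { unstable_dev } from "wrangler";
import type { UnstableDevWorker } from "wrangler";

describe("worker", () => {
  let worker: UnstableDevWorker;

  beforeAll(async () => {
    worker = await unstable_dev("src/index.ts", {
      experimental: { disableExperimentalWarning: true },
    });
  });

  afterAll(async () => {
    // The leaking request handles reported above suggest stop() may not
    // always release everything on affected wrangler versions.
    await worker.stop();
  });

  it("responds", async () => {
    const res = await worker.fetch("/");
    expect(res.status).toBe(200);
  });
});
```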