channels: ASGIRef 3.4.1 + Channels 3.0.3 causes non-deterministic 500 errors serving static files
So this was the very first scenario in which the Single thread executor error was found and that lead to me opening django/asgiref#275
While trying to get a simple repro-case for it, we figured out a way to trigger an error related to it in a very simple way and this was fixed with https://github.com/django/asgiref/releases/tag/3.4.1
But testing the new 3.4.1 version against our code-base still yielded the same 500 errors while serving static files (at least) in the dev environment.
I’ve updated https://github.com/rdmrocha/asgiref-thread-bug with this new repro-case, by loading a crapload of JS files
(1500) but that can be changed in the views.py
file.
It doesn’t ALWAYS happen (so you might need a hard-refresh or two) but when it does, you’ll be greeted with something like this:
I believe this is still related https://github.com/django/asgiref/commit/13d0b82a505a753ef116e11b62a6dfcae6a80987 as reverting to v3.3.4 via requirements.txt makes the error go away.
Looking at the offending code inside channels/http.py
it looks like this might be a thread exhaustion issue but this is pure speculation.
since the handle is decorated as sync_to_async:
This is forcing the
send
to become sync and we’re waiting on it like this: await self.handle(scope, async_to_sync(send), body_stream)
.
If there’s no more threads available, I speculate that they might end up in a deadlock waiting for the unwrap of this await async_to_sync(async_to_sync) call, eventually triggering the protection introduced in https://github.com/django/asgiref/commit/13d0b82a505a753ef116e11b62a6dfcae6a80987
But take this last part with a grain of salt as this is pure speculation without diving into the code and debugging it. Hope it helps
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 10
- Comments: 34 (8 by maintainers)
Commits related to this issue
- Solve static files deadlock bug for django 4.0 (#1722) — committed to sterliakov/channels by sterliakov 2 years ago
- Solve static files deadlock bug for django 3.0+ (#1722) — committed to sterliakov/channels by sterliakov 2 years ago
Having ran into this issue and dug into its root cause, I think I can provide some insight. As I understand it, the deadlock detection in asgiref works like this:
The issue here is that contexts may be re-used by daphne / twisted in the case of persistent connections. When a second HTTP request is sent on the same TCP connection, twisted re-uses the same context from the existing connection instead of creating a new one.
So in twisted, context variables are per connection not per http request. This subtle difference then causes a problem due to how the
StaticFilesHandler
, built on the channelsAsgiHandler
works. It usesasync_to_sync(send)
to pass the send method intoself.handle()
, which is itself decorated withsync_to_async
.So what I think is happening is this sequence of events:
send()
(but the sync thread does not yet exit!)ASGIHandler
andSyncToAsync
try to use the single thread executor but it’s busy, so it errorsIf step 6 blocked instead of erroring, all would be fine, since the sync thread would have finished anyways. I don’t think there’s a deadlock here, and I don’t thing the deadlock detection code in asgiref is working properly.
OK, I’ve started work on what will be v4.0 #1890 moves to use
django.contrib.staticfiles
, and should address this.If anyone wants to give that a run, or follow
main
over the next few weeks to help spot any issues, that would be great.Once I’ve made a bit more progress, I’ll open a tracking issue for v4.0 as well.
@kezabelle Yes. Using Django’s implementation and removing the Channels one is the way to go.
Channel’s
AsgiHandler
predates Django’s ASGI support from Django 3.0. From then on, providing such is not part of Channel’s job. Much better is to defer to the Django implementation, and channels can focus on the Consumers and Chanel Layer, and other bits, that aren’t going to make their way into Django.The proximate reason that’s not been done was the need to maintain Django 2.2 support, but we can let that go now. (See also #1795 — time for a major version bump, again.)
We need to keep
runserver
, since Django doesn’t have an ASGI simple server available (and likely never will for websockets), but I’d happily take PRs droppingAsgiHandler
andStaticWrapper
and such.In the meantime (responding to various points above):
/channels/…
for example — you can Google the rest.)Thanks for confirming @JulienPalard. I’m pulling together the releases now, so a few days for the final versions to be live.
As @fmgoncalves has mentioned, one way is to alter how the files are served, but I have found a little more reliable patch that I have implemented in my Django Channels Setup based on the information provided by their post.
It seems that the change of
thread_sensitive=False
to default toTrue
is causing these deadlock detection messages. As far as I can see, Django Channels doesn’t seem to mind the occurrences of the deadlocks.That being said, I felt it would be safe to monkey patch the
sync_to_async
function for my Django installation.project_app/moneky_patches.py
Then you just overwrite the existing instance of asgiref’s sync_to_async method with our patched wrapper that enforces thread_sensitive=False
project_app/__init__.py
This is make channels, and all of Django run in a insensitive manner like it did before the ASGIRef update.
The underlying problem is channels’ StaticFilesWrapper. This implementation is used to serve static files on the development server (runserver). However, it’s use of
sync_to_async
andasync_to_sync
is prone to deadlocks on asgiref. One fix would be to passthread_sensitive=False
to the channels’ asgiref handler’s decorator, although, not being intimately familiar with channels, I can’t be sure there aren’t undesirable side effects (my tests did not hit any, though).Fortunately, there is a workaround that doesn’t involve patching channels:
django.contrib.staticfiles
from the installed apps inDEBUG
(runserver) runs. This confuses channels’ runserver and keeps it from short-circuiting the request flow through StaticFilesWrapper. In yoursettings.py
urls.py
And you’re done. Local tests indicate that the problems with concurrent requests for static files are gone and all works as intended.
My two cents here!
I really appreciate everyone who contributed to Django, ASGI and Channels, but this needs very serous investigation. We all know that static files must be served separately and we do it in production, but that’s not about it.
Essentially it is no longer capable of handling the same amount of load as previous versions of channels and asgi, therefore whoever implemented these updates must go back and make sure, that we do not degrade Django-ASGI-Channels stack performance, otherwise all the great things that were introduced with the updates, becomes pointless, even harmful and can cause a lot of troubles for many.
For example I see this exact behavior in a fairly standard django admin, where admin tries to load a few additional script and style files, but it seems like dev server is unable to handle it. Feels like it’s unable to handle more than ~20 concurrent requests at the time to /static/ endpoint.
We’re consistently seeing this in development with runserver often enough for it to be a major inconvenience since we upgraded channels dependencies:
yeah, I realize that’s a bold claim to make without a minimal test case 😀 I’ll see if I can get the time to do that soon
Hi,
What is the recommended course of action regarding this problem? I followed the advice here and downgraded asgiref from 3.4.1 to 3.3.4 and the problem disappeared but I was wondering whether this is the right way to go.
Thanks.
Btw if anyone gets this error during selenium tests, you can turn off static file serving. We use Whitenoise anyway so disabling serving static files fixed it for us:
(ps tyvm for Channels, it has made our lives much easier!)
@carltongibson @andrewgodwin Thank you for your being courteous and your efforts on the project in general!
Fair warning, I don’t use channels so I could be entirely barking up the wrong tree, but here I am anyway, pointing things out (and teaching grandmother to suck eggs, possibly) in case they are relevant in sparking further discussion of solutions.
From what I can see of the
git blame
for the channelsrunserver
andstaticfiles
bits linked above by @fmgoncalves, predominantly the commits are from before Django got it’s ownASGIStaticFilesHandler
by mixing together theStaticFilesHandlerMixin
andASGIHandler
classes in django/django@a415ce70bef6d91036b00dd2c8544aed7aeeaaed … I wondering if perhaps wholly subsuming that would affect things.i.e.
Command.get_application
could returnASGIStaticFilesHandler(get_default_application() ... )
whereASGIStaticFilesHandler
is either the Django provided one or if not appropriate (I dunno the implementation), a mixed together version based onchannels.http.AsgiHandler
anddjango.contrib.staticfiles.handlers.StaticFilesHandlerMixin
Apologies if I’m just muddying things further or not proving useful…
Thanks for the feedback. Our static files in production are being correctly served by S3. The issue did not trigger in production but we saw it in dev environment directly tied to an analogous example as the provided one.
I understand that staticfile serving is a convenience and is not representative of how production is working. I’m just afraid that this specific behavior may be triggered in other scenarios. And investigating this already lead to the ASGIRef 3.4.1 so I still believe we are on to something.