runtime: Reduce the impact of blocking DNS calls on Unix
After helping a customer look into a thread pool starvation case on Linux on .NET Core 3.1, I ended up here. After some research and some discussion on Twitter, it turns out that `getaddrinfo_a` uses an internal thread pool, blocks on `getaddrinfo`, and isn’t doing any async IO. This change is an improvement over what we had before because our thread pool doesn’t grow, but I’m not sure it is a net positive in the long run. The thread pool limits are controlled by compile-time constants in glibc (essentially, another library is doing async-over-sync for us on a less controllable thread pool…).
I wonder if we’re better off controlling this blocking code ourselves, and maybe it should be possible to turn the `getaddrinfo_a` path off with a configuration switch.
The other improvement I was thinking about was allowing only one pending request per host name at a time. That would improve situations where DNS is slow and new blocking calls for the same host name keep being issued on thread pool threads (which is the case the customer ran into).
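To make the "one pending request per host name" idea concrete, here is a minimal sketch in Python rather than .NET. It is not the runtime's implementation; the class name `CoalescingResolver` and its `lookup` parameter are hypothetical, and it assumes the blocking lookup can be handed to a worker thread. Concurrent callers asking for the same host share a single in-flight lookup, so a slow DNS server ties up one thread per host name instead of one per caller.

```python
import asyncio
import socket


class CoalescingResolver:
    """Shares one in-flight blocking lookup per host name (illustrative only)."""

    def __init__(self, lookup=socket.gethostbyname_ex):
        # `lookup` is any blocking callable taking a host name (an assumption
        # of this sketch; the default is the standard library's blocking call).
        self._lookup = lookup
        self._pending = {}  # host name -> future of the in-flight lookup

    async def resolve(self, host):
        future = self._pending.get(host)
        if future is None:
            loop = asyncio.get_running_loop()
            # Run the blocking call on the default executor's worker thread.
            future = loop.run_in_executor(None, self._lookup, host)
            self._pending[host] = future
            future.add_done_callback(lambda done: self._forget(host, done))
        # shield() keeps one cancelled caller from cancelling the shared lookup.
        return await asyncio.shield(future)

    def _forget(self, host, done):
        # Drop the entry once the lookup finishes so later calls start fresh.
        if self._pending.get(host) is done:
            del self._pending[host]
```

With this shape, `await resolver.resolve("example.com")` issued from many tasks at once results in a single blocking call. Results are not cached after completion, so this only addresses the pile-up of concurrent identical lookups, not repeated lookups over time.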
About this issue
- State: closed
- Created 3 years ago
- Comments: 54 (54 by maintainers)
OK so summary:
2 birds with one stone then! https://github.com/dotnet/runtime/issues/19443
As I said above, I’m happy to throttle/synchronize/whatever specific kinds of blocking calls when it makes sense because we have knowledge of what the calls are doing and how they are likely to block, etc. The per-hostname serialization here is a good example of it.
But I don’t think this addresses the general problem.
Why different threads? Threads are threads. What I think matters is whether any given thread is blocking at the moment, or not.
https://github.com/uber/denial-by-dns has a great test suite
Triage: Scoped down to the third bullet: