nats.rs: all async operations block forever at 500 concurrent subscriptions
Under the hood, asynchronous subscriptions just wrap synchronous calls with blocking::unblock
. Under the hood, unblock
moves work to a pool of up to 500 threads.
This creates an obvious bottleneck: If you create 500 subscriptions, all other asynchronous operations that you attempt will just block forever (or until those subscriptions receive data or are closed).
Perhaps the blocking
crate isn’t the best fit here. And perhaps there should be a bigger warning about everything breaking if you hit this limit.
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 19 (14 by maintainers)
Like @ccbrown, I also wonder about the suboptimal way in which async has been implemented for nats. But I’m even more surprised about how this implementation is considered ok and good enough until users come up with complex benchmarks to prove otherwise.
According to the README:
At best, I find the current implementation of async not to be aligned with those goals. At worst, the general description of the goals may be misleading for some users
Rust async is far more than just “benchmarks”. This library claims to be Rust-idiomatic and yet there are a few areas where I disagree.
In what I could gather by reading the async module:
The quote comes from https://tokio.rs/blog/2020-04-preemption and it’s a very good read
I can understand having limited development resources and I’m well aware that sync performs better than async. I certainly never questioned either of those things.
What I am questioning is:
Why is there zero mention of the async crate’s crippling limitations in the README?
If the limitations are known, but not something you can address and not something you’re open to PRs for, the least you could do for the community is warn them:
Maybe also bold it or add some ⚠️’s because that’s the first thing that anyone thinking about using the crate should know.
Is the team open to community discussions or contributions for issues like these?
You have said a lot here about best practices, the crate, and its users. What you have not done is expressed an openness towards improving.
You immediately closed https://github.com/nats-io/nats.rs/pull/209 with a lesson on using threads. In a collaborative environment, a peer would have voiced their concerns in a comment (without closing the PR) or used GitHub’s “request changes” function. It is absolutely possible to solve that problem in a way that addresses your concern, and I am doing so now in my own system. But now it’s every man or woman for himself.
Now, this issue isn’t about https://github.com/nats-io/nats.rs/pull/209 at all. That’s a separate issue. But my impression based on your comments so far is that you’re too protective of this crate to risk letting others muck with it.
And so my question is, what’s the contribution policy? As much as I’d like to give back to the community, I’d also like to avoid wasting your and my time.
I started with a short answer and then my notes got longer, so I guess my answer would be a dreadful “it depends”. Here’s what I have:
About async
Rust’s async is quite special when compared with other languages. One of the reasons may be that Rust’s core/standard library is the opposite of Golang’s, in the sense that it’s kept very lean and small. It includes only the most basic functionality and leaves the implementation of the rest to the community. This philosophy is very much up to debate, but it’d be fair to say that it’s shaped by a unix/systems programming mindset.
The case of async is just another example of this philosophy. There are Future types in the standard library but it’s up to the community to maintain and implement runtime environments for them.
There are three major implementations of async runtimes:
In theory, users can pick and choose which one to use. In practice, it’s a bit more complicated than that. If you’re building an application you would normally assess what libraries you need and which runtime they support. Because Tokio is so widely adopted, tokio is usually what every new project starts with.
For crats like nats, it’s a bit more complicated because by picking a runtime one could alienate users of another runtime. This is where having a look at options may help.
Async for libraries
For what I can gather, the async parts in Nats will mostly deal with network calls and the handling of TCP connections.
In order to make the library work async there are different approaches:
1. Asyncify
One could “asyncify” an existing codebase, that is, a crate which only operates in blocking mode could be wrapped with code provided by the async runtimes so that concurrency can be a possibility. All runtimes provide their own methods for this. For example, tokio has spawn_blocking and spawn_in_place.
The biggest benefit of this approach is that it’s extremely quick and easy to implement.
This is the approach which the NATS crate now has. In this case, there is a second big benefit. By using the
blocking
crate from smol, this library is already runtime agnostic. Unlike tokio “asyncify” methods, theblocking
crate provides a method that works with any runtime. I’m using it with tokio in my project just fine at the moment.However, this approach has one major drawback which led me to write my last comment. It works, but it isn’t great. It’s not the best, it’s not terrible. It just works. I won’t repeat here what I’ve said already, just see my comment above.
Some Rust developers may even consider “asyncified” code bases a bad smell for technical debt. For example, the great diesel crate is extremely popular but the request to support async is long and painful. This issue provides an excruciating read about how complex this can get. The community has created crates like tokio-diesel, and they have the same issues already discussed in this thread: concurrency with caps and limits, risks of deadlocks, good performance on small servers but unable to scale, etc.
This approach may be good for community projects which lack resources to do otherwise. They do work after all. But I personally find it’s a big contradiction for a crate like nats, which describes itself as highly performant and efficient.
2. “Native” async
In contrast to diesel, the authors of sqlx took an entirely different approach: async from the bottom up. This allows them to claim that sqlx is a rust crate which allows concurrency without limits and gotchas. That’s what you get when async is fully supported. This doesn’t come for free though. Sqlx has to support both tokio and async-std if it wants to please everybody
The redis crate is very similar in that they have the sync version, and then users can opt in to async methods with either tokio or async-std.
Redis is also a great learning case, since the official tokio docs are essentially a learn-by-example tutorial that guides the user towards building an async Redis client in Rust.
What then
At this point it would be up to the NATS team to decide what the priorities would be in terms of async support:
The first option is to just carry on with current approach. If this is the case, as a user, I would prefer that the README and project scope is revisited to warn users about gotchas: concurrency caps, partial support of Drop trait where user has to manage subscriptions manually, etc. This would be more honest and sets the right expectations for new users. Otherwise I’d be surprised if a new github issue about async is not created every month by a user who gets caught by this
Rewrite the async module taking something like the Redis crate as inspiration. For what I can tell the Redis crate is extremely similar to Nats: no serialization, no big CPU operations and a lot of connection management and network IO operations. This would be the ideal scenario from the user’s perspective. It’s also the biggest amount of work
If resources are limited, do like (2) but focus only on one runtime like Tokio. I think this could be a good compromise for now. With the right abstractions, support for other runtimes could be planned in the long term or left for community contributions.
Ideally nats would be totally runtime agnostic, but it’s unfortunately still very difficult to write good runtime-agnostic libraries (see https://github.com/rust-lang/wg-async-foundations/issues/45). It’s especially difficult when timers and networking are involved and you want good performance.
I believe currently the best approach for high performance libraries is to create an abstraction over a few of the most popular runtimes and let the user choose one via feature flags.
See for example, the Redis crate, which I believe has pretty similar needs. It defines a
RedisRuntime
trait and provides implementations of that trait for Tokio and Async-std:https://github.com/mitsuhiko/redis-rs/blob/ca5019dbe76cc56c93eaecb5721de8fcf74d1641/src/aio.rs#L55
Tokio is by far the most prolific async runtime, so a Tokio implementation would definitely be the place to start.
@Owez Let us know if you have any issues with new async based client…
0.15.2 addresses https://github.com/nats-io/nats.rs/issues/213, but not this. There is still a hidden limit of 500 outstanding async operations, at which point any further calls to async functions will block forever. I believe this should be re-opened.
FWIW, I would support a rewrite/refactorization to have an async-native version of the nats crate. In my use case I don’t think I’ll hit the limit, but we deal with very sensitive information (I am working in the HIS of the hospital I work for), so we need the maximum reliability and performance.
In order to reduce the maintenance burden, there’s also the possibility of building the sync crate as a facade to the async one (i.e, the opposite you’ve done). There are some crates that follow that strategy.