axum-login: AuthContext operations may deadlock

Hello, thanks for this great crate.

I was using axum-sessions to store some data and added axum-login afterwards. I added the auth context argument to my request handlers and tried to call `auth.login(&..).await` etc. But because I also had the session as an argument, an owned `RwLock` guard (either read or write) was already alive:

```rs
pub async fn login_post(
    session: ReadableSession,
    mut auth: AuthContext,
    Form(payload): Form<_>,
) {
    let some_session_stuff = session.get::<_>(key);

    let user = /* authenticate user */;

    /* this will deadlock because `session` is still alive: */
    // auth.login(&user).await.unwrap();

    /* this will not deadlock: */
    drop(session);
    auth.login(&user).await.unwrap();
}
```

Because `session` holds that guard, the `write()` call in `AuthContext::login()` in extractors.rs deadlocks:

https://github.com/maxcountryman/axum-login/blob/6d2f2cebf52f6804203907be89a6ed6d04c7db66/axum-login/src/extractors.rs#L101-L107
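For intuition, here is a minimal sketch of the same pattern with a bare `tokio::sync::RwLock` (not axum-login's actual code; assumes tokio with the `rt`, `macros`, and `sync` features): `write()` waits for every read guard to be dropped, and the only read guard is held by the same task, so the task blocks forever.

```rs
use std::sync::Arc;
use tokio::sync::RwLock;

#[tokio::main]
async fn main() {
    // Stand-in for the session shared by `ReadableSession` and `AuthContext`.
    let session = Arc::new(RwLock::new(String::from("session data")));

    let read_guard = session.read().await; // what the extractor holds
    println!("read: {}", *read_guard);

    // Uncommenting this deadlocks: `write()` waits for every reader to
    // drop, but the only reader is held by this very task.
    // let mut write_guard = session.write().await;

    drop(read_guard); // releasing the read guard first makes it safe
    session.write().await.push_str(" + user id");
}
```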

This can be fixed by dropping the request's session handle before calling any auth context method that uses the session. That should probably be documented. Would it be possible to introduce multiple session buckets/slots in axum-sessions, separating the user's session data from axum-login's?
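To illustrate the buckets/slots idea (purely a sketch with made-up names like `SessionSlots`, not a proposed API): if the user's data and axum-login's state sat behind separate locks, holding a guard on one could never block the other.

```rs
use std::collections::HashMap;
use tokio::sync::RwLock;

// Hypothetical layout: two independent locks inside one session, so a
// handler reading `user_data` cannot deadlock axum-login writing
// `auth_data`, and vice versa.
#[derive(Default)]
pub struct SessionSlots {
    pub user_data: RwLock<HashMap<String, String>>,
    pub auth_data: RwLock<HashMap<String, String>>,
}
```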

Most upvoted comments

As I get time today I'll be looking into this via my prior SSCCE project. I've already started today, so I'll be dumping things I notice and questions here as I go.

Old post, don't bother looking at this, summed up below
A lot of looking at stuff, feel free to completely ignore this.

First, I took a look at `SessionLayer<Store>`, since it is used directly as a layer. The first thing I notice is its size:

```rs
#[derive(Clone)]
pub struct SessionLayer<Store> {
    store: Store,
    cookie_path: String,
    cookie_name: String,
    cookie_domain: Option<String>,
    persistence_policy: PersistencePolicy,
    session_ttl: Option<Duration>,
    same_site_policy: SameSite,
    http_only: bool,
    secure: bool,
    key: Key,
}
```

Why do I notice its size? Because it is cloned on every single route-and-method combination in axum, meaning a *lot* of copies of it spool up, one per layered route and method. That is why layers are supposed to be very cheap to clone, and why many such layers added to routers use an `Arc`, either internally or directly, especially for the large amount of data here that is strictly read-only and so doesn't need to be cloned hundreds of times even for a small router. (Most of those copies are freed immediately, since the layer is cloned for every possible method variation, even ones that don't exist, while the router is being built; that only happens at load time, so it's not as big an issue, but definitely something to note.) Funnily enough, it's also cloned as the wrapping layer on the overall router fallback. Mostly this is just a lot of needless string allocations, but it also clones the `Store` every time, which is fine for `MemoryStore` because it is:

```rs
#[derive(Clone, Debug, Default)]
pub struct MemoryStore<UserId, User> {
    inner: Arc<RwLock<HashMap<UserId, User>>>,
}
```

Which is very fine.
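For what it's worth, a sketch of the kind of restructuring hinted at here (hypothetical, not the crate's code): hoisting the read-only configuration behind one `Arc` turns every clone into a pointer bump.

```rs
use std::sync::Arc;

// All of these fields are only ever read through `&self`, so they can
// live behind a single shared allocation.
pub struct SessionConfig<Store> {
    store: Store,
    cookie_path: String,
    cookie_name: String,
    // ...the remaining read-only fields...
}

pub struct SessionLayer<Store> {
    config: Arc<SessionConfig<Store>>,
}

// A manual impl avoids the unnecessary `Store: Clone` bound the derive
// would add; cloning is now one atomic refcount increment.
impl<Store> Clone for SessionLayer<Store> {
    fn clone(&self) -> Self {
        Self { config: Arc::clone(&self.config) }
    }
}
```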

However, now one issue comes up (not this GitHub issue, but…): every time a new request comes in, it clones `SessionLayer` again, which runs this macro-generated code from the `Clone` derive:

```rs
Self {
    store: self.store.clone(),
    cookie_path: self.cookie_path.clone(),
    cookie_name: self.cookie_name.clone(),
    cookie_domain: self.cookie_domain.clone(),
    persistence_policy: self.persistence_policy.clone(),
    session_ttl: self.session_ttl.clone(),
    same_site_policy: self.same_site_policy.clone(),
    http_only: self.http_only.clone(),
    secure: self.secure.clone(),
    key: self.key.clone(),
}
```

And, well, that's a lot of very needless string copying at the very least. But let's check how this is used; maybe it's used mutably and altered or something. (On a side note, I love how axum clones the entire `MethodRouter` on every call, including every single method instead of just the ones used. Admittedly that's mostly because of tower, which needs to do it in case a layer mutates itself during a single request, but still…) It further reinforces that cloning layers is supposed to be extremely cheap, because `SessionLayer` is cloned many times on every single request: in my SSCCE above it's cloned 56 times on initial program load and 12 times for each and every request.
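One simple way to observe counts like those (a hypothetical measurement sketch reduced to a single field; I actually followed it in the debugger): swap the derive for a hand-written `Clone` that increments a counter.

```rs
use std::sync::atomic::{AtomicUsize, Ordering};

static CLONES: AtomicUsize = AtomicUsize::new(0);

#[derive(Default)]
struct SessionLayer {
    cookie_name: String, // stand-in for the real fields
}

impl Clone for SessionLayer {
    fn clone(&self) -> Self {
        // Logs every clone, both at router build time and per request.
        eprintln!("SessionLayer clone #{}", CLONES.fetch_add(1, Ordering::Relaxed) + 1);
        Self { cookie_name: self.cookie_name.clone() }
    }
}
```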

But, looking at how it is used, where is it being mutated in a way that requires its clone to be so extremely costly? Following its entire call path through the debugger, with all usages of its code: its first use in axum_sessions is `load_or_create` being called on the `SessionLayer` from within the tower `Service::call` implementation for axum_sessions' `Session<Inner, Store>`, which also clones the `SessionLayer` (that's one of the 12 above).

…wait, why is this creating a session even on routes that don't access session data at all? That seems… very spammy…

And then it clones the entire axum route again internally, on every request (and swaps it with its own inner, even though it's the same thing?). And a tokio `RwLock` is acquired to set the TTL even when the TTL does not need to be set?

And it just stuffs it into the extensions map…

---

Looking over this, I'm very, very certain this shouldn't be a layer at all. The way it's used, everything it does could and should just be a `FromRequestParts` extractor (which can store data in the `parts.extensions` map passed in as `&mut parts` to prevent multiple acquisitions, if you really want to). The only layer needed would be a `SessionLayer` that just puts its own `Arc`'d data in the extensions map for use by the extractor (as far as I can see, it only accesses all of it through `&self` anyway). Whatever the extractor writes out in its own `Drop` implementation is what would handle storing the data, which the layer can use by catching the streaming data and caching it until complete, then adding the cookie header with the update.
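A rough sketch of that extractor shape, under axum 0.6's `#[async_trait]`-based `FromRequestParts`, with made-up names (`Session`, `SessionHandle`, `ExtractedSession` are illustrative, not a proposed API):

```rs
use std::{convert::Infallible, sync::Arc};

use async_trait::async_trait;
use axum::{extract::FromRequestParts, http::request::Parts};
use tokio::sync::RwLock;

#[derive(Default)]
pub struct Session; // stand-in for the real session type

pub type SessionHandle = Arc<RwLock<Session>>;

pub struct ExtractedSession(pub SessionHandle);

#[async_trait]
impl<S: Send + Sync> FromRequestParts<S> for ExtractedSession {
    type Rejection = Infallible;

    async fn from_request_parts(parts: &mut Parts, _state: &S) -> Result<Self, Self::Rejection> {
        // If an earlier extractor on this request already loaded the
        // session, reuse it instead of acquiring it again...
        if let Some(handle) = parts.extensions.get::<SessionHandle>() {
            return Ok(ExtractedSession(handle.clone()));
        }
        // ...otherwise create/load it once and cache it in the
        // extensions map for later extractors on the same request.
        let handle: SessionHandle = Arc::default();
        parts.extensions.insert(handle.clone());
        Ok(ExtractedSession(handle))
    }
}
```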

But I personally think it should be re-architected so that a new cookie isn't generated when storing data (even if it's often just the same cookie value as before); instead the value would just be a random set of bytes, maybe with the time encoded into it for extra uniqueness and a quick TTL check, and of course with the whole thing signed.
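Something like this, loosely (a hypothetical sketch; assumes the `rand` crate, and the signing step, e.g. an HMAC over the whole value, is left out):

```rs
use std::time::{SystemTime, UNIX_EPOCH};

/// Builds an ID of `[8-byte unix timestamp][32 random bytes]`: the
/// embedded time gives extra uniqueness plus a cheap TTL check before
/// any store lookup.
fn new_session_id() -> Vec<u8> {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before unix epoch")
        .as_secs();

    let mut id = now.to_be_bytes().to_vec();
    id.extend(std::iter::repeat_with(rand::random::<u8>).take(32));
    id // a real implementation would sign this before it becomes a cookie
}
```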


Basically, you need to process the body of the request into a `Bytes` or similar to 'finish' it for the current axum-sessions design before taking the `Arc`, if (and only if) it has more than one strong owner. Yes, it will eat memory, and yes, it breaks streaming's efficiency, but it's required with the current design (and it wouldn't break streaming when streaming isn't used).
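Roughly like this, assuming the hyper 0.14 era APIs that this axum uses (the helper name is hypothetical):

```rs
use hyper::body::{to_bytes, Body, Bytes};

// Buffers the whole stream into memory so nothing downstream can still
// be holding the session while we persist it: costly, but it "finishes"
// the body as described above.
async fn finish_body(body: Body) -> Result<Bytes, hyper::Error> {
    to_bytes(body).await
}
```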

An alternate design would basically have to be either one of:

  • Move the bulk of it into an extractor instead of the layer, and make it so the cookie can only be created/updated during initial handling (the backend could of course still be cleared out). This means the CookieStore (which honestly shouldn't exist anyway) won't work and would need to be removed.
  • Move the bulk of it into an extractor instead of the layer, but with two extractors: one that works with streaming but basically does the above, and another that doesn't work with streaming and should be used before the handler function returns. If 'the arc' (or whatever) is still 'open' at that point (more than one strong reference), the stream needs to be collected first; otherwise it can be passed through.

Regardless, most of it 'should' be moved into an extractor instead of the layer, and there are a lot of inefficiencies that can be fixed with fairly little work: the layer is huge and generally gets cloned hundreds to potentially thousands of times at program load, it holds a lot of owned string data that should be behind an `Arc`, and it gets cloned at least 12 times on every single request.

@maxcountryman ah, nice. I guess I should leave this issue open to track this then.