hyper: HTTP/2: Hyper client gets stuck if too many requests are spawned
I have a Golang HTTPS server to which a Rust client sends a lot of parallel requests. After a few hundred requests, the client stops and apparently there is no TCP communication.
The REST API accepts a byte buffer in the request body and responds with its length in a JSON.
To reproduce, the entire code is at: https://github.com/staticgc/hyper-stuck. It has the Golang server, the Rust client, and a Golang client. The build instructions are very straightforward.
The default maximum number of futures spawned is 400. This is apparently more than the maximum number of HTTP/2 streams at the server (which is 250).
If the count is reduced to 200, it works.
Other observations:
- The Golang client works with a high goroutine count.
- Increasing the connections in the Rust client from 1 to, say, 3 causes more requests to be processed, but it again gets stuck.
- With 400 futures/requests and the number of connections set to 10 (chosen arbitrarily), it worked OK.
- When `HTEST_BUF_SIZE` is reduced to, say, 1 KB, it works even with 1 connection and a high future count.
- If the Rust client stops, it does so after a predictable number of requests.
The number of connections here refers to negotiated HTTP/2 connections, of which hyper (I think) creates only one per `Client` instance. So changing `HTEST_CONN_COUNT` changes the number of `Client` instances that are created.
Also, to prevent initially flooding the server, the Rust client makes a single HTTP/2 request on each `Client` instance and then issues the parallel requests:
```rust
for i in 0..conn_count {
    send_req_https(client_vec[i].clone(), bufsz, url.as_str()).await?;
}
```
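The parallel phase after this warm-up lives in the linked repo; purely as an illustration of the pattern the issue describes (a semaphore-bounded number of spawned request futures), here is a hypothetical sketch reusing `send_req_https`, `client_vec`, `bufsz`, and `url` from the snippet above, with `total_requests` and `max_in_flight` as made-up names:

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

// Cap the number of in-flight request futures (e.g. 400, per the issue).
let sem = Arc::new(Semaphore::new(max_in_flight));
let mut handles = Vec::with_capacity(total_requests);

for i in 0..total_requests {
    // Wait for a free slot before spawning the next request future.
    let permit = sem.clone().acquire_owned().await.expect("semaphore closed");
    let client = client_vec[i % client_vec.len()].clone();
    let url = url.clone();
    handles.push(tokio::spawn(async move {
        // Individual errors are ignored in this sketch; the real client records them.
        let _ = send_req_https(client, bufsz, url.as_str()).await;
        drop(permit); // release the concurrency slot when this request completes
    }));
}

for h in handles {
    let _ = h.await;
}
```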
Able to reproduce on: macOS & CentOS Linux.
Edit: Apologies for some typos in the build instructions in the repo above. I have fixed them. Let me know here if anything remains.
About this issue
- State: closed
- Created 3 years ago
- Reactions: 3
- Comments: 17 (8 by maintainers)
Commits related to this issue
- apply patch from https://github.com/hyperium/hyper/issues/2419#issuecomment-1287375184 — committed to jeromegn/h2 by jeromegn 2 years ago
- fix(server): Fix race condition in client dispatcher (#2419) There exists a race condition in ClientTask::poll() when the request that is sent via h2::client::send_request() is pending open. A task w... — committed to jfourie1/hyper by jfourie1 2 years ago
- fix(http2): Fix race condition in client dispatcher (#3041) There exists a race condition in ClientTask::poll() when the request that is sent via h2::client::send_request() is pending open. A task w... — committed to hyperium/hyper by jfourie1 2 years ago
- Upgrade hyper dependency 0.14.20 has a bug https://github.com/hyperium/hyper/issues/2419 that affects IPA — committed to akoshelev/raw-ipa by akoshelev a year ago
I spent some more time on this today and would definitely not recommend applying the previous “patch”. With that patch applied, the request that would have gotten stuck is effectively busy-polled until it completes, which is not ideal. The real problem/bug is that this function, specifically the call to `self.inner.poll_pending_open()`, does not allow more than one waiter. In addition, this single waiter is shared with this and this. So the current model of calling `poll_ready()` for multiple `SendRequest` clones before calling `send_request()` will cause this problem. I'll try to add some unit tests to illustrate the problem. It seems that what is needed is some mechanism that allows multiple concurrent tasks to wait for send-stream capacity. One thought I had was to use a semaphore and change `poll_ready()` to poll it: the semaphore would be released once the request is done, and its number of permits would be set to `max_send_streams`. Does this make sense? Any other ideas? (A rough sketch of the semaphore idea is included below.)

I believe it can be closed.
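For concreteness, here is a rough, hypothetical sketch of that semaphore idea. It is not h2's real API: it uses a Tokio semaphore and is written in async style, whereas h2's `poll_ready()` is poll-based and would poll the acquire future instead. The point is that permits equal `max_send_streams`, every waiter queues on the semaphore rather than sharing one waker slot, and dropping a permit when a stream finishes wakes exactly one waiter.

```rust
use std::sync::Arc;
use tokio::sync::{OwnedSemaphorePermit, Semaphore};

/// Sketch of a capacity gate sized to `max_send_streams`.
struct StreamCapacity {
    permits: Arc<Semaphore>,
}

impl StreamCapacity {
    fn new(max_send_streams: usize) -> Self {
        Self {
            permits: Arc::new(Semaphore::new(max_send_streams)),
        }
    }

    /// Each `SendRequest` clone waits here; every waiter gets its own spot in
    /// the semaphore's wait queue, so no wakeup can be lost by overwriting a
    /// single shared waker.
    async fn reserve(&self) -> OwnedSemaphorePermit {
        self.permits
            .clone()
            .acquire_owned()
            .await
            .expect("semaphore is never closed in this sketch")
    }
}

// Intended use: acquire a permit before sending a request and keep it alive for
// the lifetime of the stream; dropping it on completion releases capacity and
// wakes exactly one waiting task.
```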
Some further analysis:
Looking at this loop's first iteration, the following seems to be happening for the request that gets stuck: the stream is not marked as pending yet, so this returns `Ok(())`. The stream is then marked pending and added to the pending queue via the call to `send_request()` here. This `poll()` here calls this, storing a waker in `self.send_task`. On the next iteration of the loop, the stream is marked as pending, so this returns `Poll::Pending` and immediately overwrites the above waker.

Here's a nice self-contained reproduction for this bug: https://github.com/fasterthanlime/h2-repro
FWIW, I observed that an existing waker is silently dropped here. In other words, the following assert would fail if added to the top of the function: `assert!(self.send_task.is_none())`
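For illustration, here is a minimal, hypothetical sketch (not h2's actual code) of why a single `Option<Waker>` slot like `self.send_task` loses notifications when more than one task waits on it:

```rust
use std::task::{Context, Waker};

struct SingleWaiterSlot {
    send_task: Option<Waker>,
}

impl SingleWaiterSlot {
    fn register(&mut self, cx: &Context<'_>) {
        // An `assert!(self.send_task.is_none())` here would trip as soon as a
        // second task registers: the first task's waker is silently replaced,
        // so that task will never be woken.
        self.send_task = Some(cx.waker().clone());
    }

    fn notify(&mut self) {
        // Only the most recently registered task is woken; any earlier waiter
        // stays parked forever.
        if let Some(waker) = self.send_task.take() {
            waker.wake();
        }
    }
}
```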
Here is a workaround: wrap `hyper::client::Client` and guard the `request` method with a semaphore. If more parallel requests are needed, create multiple `CustomClient` instances (a minimal sketch follows).

Edit: This workaround is only for HTTP/2.
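A minimal sketch of that workaround, assuming hyper 0.14 and Tokio. The `CustomClient` name comes from the comment above; `max_concurrent` is an illustrative parameter that should stay below the server's advertised max concurrent streams:

```rust
use std::sync::Arc;

use hyper::{Body, Request, Response};
use tokio::sync::Semaphore;

#[derive(Clone)]
struct CustomClient<C> {
    inner: hyper::Client<C, Body>,
    permits: Arc<Semaphore>,
}

impl<C> CustomClient<C>
where
    C: hyper::client::connect::Connect + Clone + Send + Sync + 'static,
{
    fn new(inner: hyper::Client<C, Body>, max_concurrent: usize) -> Self {
        Self {
            inner,
            permits: Arc::new(Semaphore::new(max_concurrent)),
        }
    }

    async fn request(&self, req: Request<Body>) -> hyper::Result<Response<Body>> {
        // Hold a permit for the duration of the call so at most `max_concurrent`
        // requests are in flight on this client at any time.
        let _permit = self.permits.acquire().await.expect("semaphore closed");
        self.inner.request(req).await
    }
}
```

For more parallelism, create several such wrappers (one per underlying HTTP/2 connection), as the comment suggests.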
I think I might have found the issue.
When the number of active streams exceeds 250 (which is the default limit), the future polled here returns `Poll::Pending`. The parent future, i.e. `ClientTask`, then does not get woken up again when the stream count drops below the threshold, so the poll here is not called. In my program there is an upper limit on the number of futures (controlled by a semaphore), and it waits for existing streams to be done with. All the wake-ups on the unbounded channel `req_tx` have already been consumed, so there is no one left to wake up the future.
@seanmonstar