curl_cffi: [BUG] Gets stuck forever in various scenarios with `stream=True`
Describe the bug
Running our project’s test suite against curl_cffi@0.5.10b2 with streaming in sync mode, we found that many tests got stuck.
From this I’ve put together some test cases covering the various conditions under which it gets stuck: https://gist.github.com/coletdjnz/034729c002e39fd97ba8199abf03dff4
One common cause appears to be that errors raised by curl inside the perform() function are not re-raised in the main thread: https://github.com/yifeikong/curl_cffi/blob/c0e6fa0df34ea3d0278597279bc2c0744cd70187/curl_cffi/requests/session.py#L596
So if curl errors on a request, it gets stuck on
# Wait for the first chunk
header_recved.wait() # type: ignore
as there would not be any data being sent from curl.
The same goes for reading with iter_content: it gets stuck on queue.get(), as no data would be received from curl.
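The hang described above can be sketched without curl_cffi at all. The snippet below is a minimal stdlib-only illustration, not the library's code: `fake_perform`, `stream`, `StreamError`, and the `_DONE` sentinel are hypothetical names, and the timeout values are arbitrary. It shows one possible way for a worker thread to hand its exception to the consumer and to always set the event, so that neither `header_recved.wait()` nor `queue.get()` blocks forever.

```python
import queue
import threading

_DONE = object()  # hypothetical sentinel marking the end of the stream


class StreamError(Exception):
    """Hypothetical error type standing in for a curl failure."""


def fake_perform(chunks, fail, header_recved, q):
    """Stand-in for a blocking perform() running in a worker thread."""
    try:
        if fail:
            raise StreamError("Failed to perform")
        for chunk in chunks:
            q.put(chunk)          # what a write callback would do
            header_recved.set()   # first data has arrived
    except Exception as exc:
        q.put(exc)                # hand the error to the consumer
    finally:
        header_recved.set()       # always wake the waiting thread
        q.put(_DONE)              # always terminate the stream


def stream(chunks, fail=False, timeout=5.0):
    """Yield chunks from the worker, re-raising its errors in this thread."""
    header_recved = threading.Event()
    q = queue.Queue()
    worker = threading.Thread(
        target=fake_perform,
        args=(chunks, fail, header_recved, q),
        daemon=True,
    )
    worker.start()
    # A timeout is a second line of defence against waiting forever.
    if not header_recved.wait(timeout):
        raise TimeoutError("no response from worker")
    while True:
        item = q.get(timeout=timeout)
        if item is _DONE:
            break
        if isinstance(item, Exception):
            raise item            # surfaces in the consuming thread
        yield item
    worker.join()
```

With this shape, `list(stream([b"a", b"b"]))` returns the chunks, while `list(stream([b"a"], fail=True))` raises `StreamError` in the calling thread instead of hanging.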
Additionally check out test_multiple_session_req and test_multiple_session_req_read.
For some reason, when making multiple subsequent requests with the same session, curl throws `Failed to perform, ErrCode: 3, Reason: 'No URL set'.` on these (hence they get stuck).
Note that when you run test_multiple_session_req_read, it sometimes does not get stuck and passes, suggesting there may be a race condition somewhere.
Expected behavior A clear and concise description of what you expected to happen.
Versions
OS: Linux x64
curl_cffi: 0.5.10b2
cffi==1.15.1
curl-cffi==0.5.10b2
iniconfig==2.0.0
packaging==23.1
pluggy==1.2.0
pycparser==2.21
pytest==7.4.0
Additional context Which session are you using? async or sync? Which loop implementation are you using
Sync
About this issue
- State: closed
- Created 9 months ago
- Comments: 28 (14 by maintainers)
If there are no other issues, I will release 0.5.10, and move on to new browser support.
Available as `0.5.10b4`, except for macOS M1, for which I have to build the wheels manually due to the lack of a GitHub Actions instance.
This goes back to the question of upgrading the curl version in #53; the current base version is 7.84, if I remember correctly.
`pause` seems to be promising.
Yes, we should be able to do that: signal the `perform` thread to quit when `close` is called.
You mean this?
The problem lies in the fact that the two concurrent requests are sharing the same curl instance. It’s not a big change to use separate curl instances for different ongoing requests, which is how it’s implemented in the `AsyncSession`.
However, another behavior to be aware of is that the data starts to be buffered in memory as soon as you fire off the requests. The current implementation is a workaround for the blocking `perform`: it is passed to `ThreadPoolExecutor.submit` to run in a background thread, and the queued chunks are received in the main thread.
Thus, if you create two streaming requests at once and only consume from one, the other one might be completely buffered in memory. This is unlike how `requests`/`httpx` behave, where you have control over the rate, and is probably not what you would expect.
Anyway, if you find some libcurl API with which we can get rid of this turn-callback-into-iterator mess, I’m more than happy to change the implementation.
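To make the buffering behavior concrete, here is a small stdlib-only sketch of the pattern described in the comment above: a blocking producer submitted to a `ThreadPoolExecutor`, pushing chunks onto a queue that the main thread iterates over. It is an illustration under assumptions, not curl_cffi's implementation; `fake_perform`, `iter_content`, `demo`, and the 1 KiB chunk size are all made up for the example.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

_DONE = object()  # hypothetical end-of-stream sentinel


def fake_perform(n_chunks, q):
    """Stand-in for a blocking perform() whose write callback enqueues data."""
    for _ in range(n_chunks):
        q.put(b"x" * 1024)  # unbounded queue: put() never blocks
    q.put(_DONE)


def iter_content(q):
    """Turn the callback-fed queue into an iterator for the main thread."""
    while True:
        item = q.get()
        if item is _DONE:
            return
        yield item


def demo(n_chunks=100):
    """Start two 'streams' but only consume the first one."""
    q1, q2 = queue.Queue(), queue.Queue()
    with ThreadPoolExecutor(max_workers=2) as pool:
        pool.submit(fake_perform, n_chunks, q1)
        second = pool.submit(fake_perform, n_chunks, q2)
        consumed = sum(len(chunk) for chunk in iter_content(q1))
        second.result()  # the second transfer finishes with no consumer
    buffered_chunks = q2.qsize() - 1  # exclude the sentinel
    return consumed, buffered_chunks
```

Calling `demo()` returns the number of bytes read from the first stream and the number of chunks of the second stream that were left sitting in memory. Because the queue is unbounded, the second producer runs to completion regardless of whether anyone reads from it, which is the rate-control difference from `requests`/`httpx` noted above.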