prefect: Network failures with self-hosted servers
This is a tracking issue for various reports of network failures when self-hosting Prefect Orion.
Notably, these issues seem concentrated on Prefect 2.6.6.
If adding a report to this issue, please include the following information:
- If using Prefect official Docker images for the client or server, provide the image tags
- On the server, we are interested in the Prefect version, the database, and server library versions
prefect version
pip freeze | grep -E '(uvicorn|starlette)'
- On the client, we are interested in Prefect versions and the client HTTP library versions
prefect version
pip freeze | grep -E '(httpx|httpcore)'
- Please include the full traceback for the error
- Check for any related error logs on the server
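If it is easier to gather the library versions from Python than from pip freeze, the snippet below is a minimal sketch using the standard library's importlib.metadata. The helper name collect_versions is hypothetical, and it supplements rather than replaces running prefect version.

```python
from importlib import metadata

# Distributions the checklist asks about: uvicorn/starlette matter on the
# server, httpx/httpcore on the client. (Hypothetical helper, not part of Prefect.)
PACKAGES = ["prefect", "uvicorn", "starlette", "httpx", "httpcore"]


def collect_versions(packages=PACKAGES):
    """Return {package: version}, using None for packages that are not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Run it in the same environment as the client or server you are reporting on, so the versions match what actually raised the error.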
Related to:
About this issue
- State: closed
- Created 2 years ago
- Comments: 28 (14 by maintainers)
We haven’t had this error since we upgraded to 2.7.0 earlier today 👍
@peytonrunyan: I haven’t tried the branch, but is retrying the right solution here? It sounds a bit like playing the lottery to me: just keep resending the same request, hoping that it eventually goes through.
If the issue is caused by a high volume of requests, as @madkinsz suggests, shouldn’t a solution involve some way to throttle or queue requests on the agents? (Just thoughts; I'm not that familiar with how Prefect is designed.)
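To make the throttling suggestion concrete, here is a minimal sketch of capping the number of in-flight requests with asyncio.Semaphore. It is generic asyncio, not Prefect's API: fake_api_call and MAX_IN_FLIGHT are hypothetical stand-ins for a client call and a limit you would tune to what the server can absorb.

```python
import asyncio

# Hypothetical cap on concurrent API calls; not a Prefect setting.
MAX_IN_FLIGHT = 10


async def fake_api_call(i: int, state: dict) -> int:
    """Stand-in for a client request; records how many calls overlap."""
    state["active"] += 1
    state["peak"] = max(state["peak"], state["active"])
    await asyncio.sleep(0.01)  # simulate network latency
    state["active"] -= 1
    return i


async def throttled(sem: asyncio.Semaphore, coro_fn, *args):
    # At most `limit` coroutines hold the semaphore at once; the rest
    # wait here instead of hitting the server all together.
    async with sem:
        return await coro_fn(*args)


async def run_all(n: int, limit: int = MAX_IN_FLIGHT):
    sem = asyncio.Semaphore(limit)
    state = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(throttled(sem, fake_api_call, i, state) for i in range(n))
    )
    return results, state["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(run_all(100))
    print(len(results), "calls, peak concurrency", peak)
```

Unlike retrying, this queues work on the client side, so the server never sees more than the configured number of simultaneous requests from this process.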
I am getting the same errors. I am running a custom Prefect 2.6.5 build (the same on the server and clients) with the agent-limit feature from this PR, though I do not think that is related. The logs show the same errors others have reported:
- ConnectionResetError: [Errno 104] Connection reset by peer
- anyio.BrokenResourceError
- httpcore.ReadError
- httpx.ReadError
It happens when calling run_deployment from inside a task in the parent flow.
This issue makes Prefect 2 pretty unstable for production environments.
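Errors like these are usually transient, so a common client-side mitigation is to retry with capped exponential backoff and jitter. The sketch below is a generic illustration, not the change proposed in this thread; call_with_retries is a hypothetical helper, and it only catches the built-in exception types. Adding httpx.ReadError or anyio.BrokenResourceError to the tuple is an untested assumption.

```python
import random
import time

# Exceptions treated as transient in this sketch. ConnectionResetError and
# BrokenPipeError are builtins; httpx/anyio errors could be appended here if
# those libraries are installed (assumption, not exercised by this example).
TRANSIENT_ERRORS = (ConnectionResetError, BrokenPipeError)


def call_with_retries(fn, attempts=4, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call fn(), retrying transient errors with capped exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except TRANSIENT_ERRORS:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            # Full jitter: wait a random time up to the capped exponential delay,
            # so clients that failed together do not all retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

Jitter spreads retries out, which partly answers the "lottery" concern, but retries do not reduce load on an overloaded server; they only help if the failures are genuinely intermittent.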
@madkinsz, it looks like we got it addressed, so I’m going to go ahead and close this issue. Feel free to reopen it if you think there’s something else that needs handling.
I can also confirm that flows that were failing regularly for me with this issue now seem to be working fine
Howdy, y'all! Any chance anyone here would be interested in giving this branch a shot to see if it resolves the problems? https://github.com/PrefectHQ/prefect/pull/7593
@MuFaheemkhan, @eudyptula, @carlo-catalyst, @andreas-ntonas
We are seeing BrokenPipeError: [Errno 32] Broken pipe in flows that start around 800 tasks in one go with map. The logs also include RuntimeError: The connection pool was closed while 325 HTTP requests/responses were still in-flight.
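One hedged way to avoid having hundreds of requests in flight at once, as with the 800-task map above, is to submit the inputs in batches and wait for each batch before starting the next. The chunked helper below is hypothetical and generic; the commented Prefect usage assumes task.map returns futures with a .result() method, which you should check against your Prefect version.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


# Hypothetical usage inside a Prefect 2 flow (assumes task.map returns futures
# with a .result() method; verify against your version):
#     for batch in chunked(inputs, 100):
#         futures = my_task.map(batch)
#         results = [f.result() for f in futures]

if __name__ == "__main__":
    print([len(b) for b in chunked(range(800), 300)])  # prints [300, 300, 200]
```

Batching trades some throughput for a bounded number of concurrent API calls per flow run.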
Also, the HTTP 500 errors were caused by a database timeout, so we had to increase the query timeout from the default of 1 second.
So, from our perspective, this could very well be an issue with high request volume.
I believe these issues are caused by a high volume of requests: we see them with the agent, which polls frequently, and now with run_deployment, which also polls frequently.
@MuFaheemkhan, could you please edit your post to include the full Docker image tag if you are using one of our official images, or the versions of the libraries as requested? A full traceback for the error would also be really helpful. Thanks!
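Because both the agent and run_deployment poll the API on an interval, one way to reduce steady request volume is to back off between polls. The sketch below is generic Python, not Prefect's implementation; poll_until_done is a hypothetical helper, and check is a hypothetical callable that might, for example, read a flow run's state.

```python
import time
from typing import Callable


def poll_until_done(
    check: Callable[[], bool],
    initial_interval: float = 1.0,
    max_interval: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Poll check() until True, doubling the wait between polls up to max_interval.

    Returns the number of polls made; raises TimeoutError once the timeout has
    elapsed. (Hypothetical helper, not part of Prefect.)
    """
    interval = initial_interval
    deadline = clock() + timeout
    polls = 0
    while True:
        polls += 1
        if check():
            return polls
        if clock() >= deadline:
            raise TimeoutError(f"not done after {polls} polls")
        sleep(interval)
        # Exponential backoff: fewer requests the longer the run takes.
        interval = min(max_interval, interval * 2)
```

Injecting sleep and clock keeps the helper testable without real waits.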