prefect: Network failures with self-hosted servers

This is a tracking issue for various reports of network failures when self-hosting Prefect Orion.

Notably, these issues seem to be concentrated in Prefect 2.6.6.

If adding a report to this issue, please include the following information:

If you are using the official Prefect Docker images for the client or server, provide the image tags
On the server, we are interested in the Prefect version, the database, and the server library versions:

prefect version
pip freeze | grep -E '(uvicorn|starlette)'

On the client, we are interested in the Prefect version and the client HTTP library versions:

prefect version
pip freeze | grep -E '(httpx|httpcore)'

Please include the full traceback for the error
Check for any related error logs on the server
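
For reporters who would rather gather the library versions from Python than from pip freeze, the following is a minimal sketch using only the standard library. The helper name collect_versions and the two package lists are illustrative assumptions, not part of Prefect; the package names are the ones requested above. It does not replace the output of prefect version, which also reports details such as the database.

```python
from importlib import metadata

# Package names taken from the checklist above.
SERVER_PACKAGES = ["prefect", "uvicorn", "starlette"]
CLIENT_PACKAGES = ["prefect", "httpx", "httpcore"]


def collect_versions(packages):
    """Return {package: version or None} for the given distribution names.

    Hypothetical helper for this issue; not a Prefect API.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    for label, pkgs in (("server", SERVER_PACKAGES), ("client", CLIENT_PACKAGES)):
        print(label)
        for name, version in collect_versions(pkgs).items():
            print(f"  {name}=={version}" if version else f"  {name}: not installed")
```

Run it in the same environment (or container) as the server or client so the reported versions match what is actually in use.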

Related to:

https://github.com/PrefectHQ/prefect/issues/7472

About this issue

State: closed
Created 2 years ago
Reactions: 1
Comments: 28 (14 by maintainers)

Most upvoted comments

We haven’t had this error since we upgraded to 2.7.0 earlier today 👍

voidel on Dec 2, 2022

@peytonrunyan: Haven’t tried the branch, but is retrying the right solution here? It sounds a bit like playing the lottery to me: just keep resending the same request and hope that it eventually goes through.

If the issue is caused by high volume of requests, as @madkinsz suggests - shouldn’t a solution involve some way to throttle/queue requests on the agents? (just thoughts, not that familiar with how Prefect is designed)

eudyptula on Nov 29, 2022
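
The throttling idea raised in the comment above can be illustrated with a small, self-contained sketch. This is not how Prefect or its agent is implemented; send_request is a hypothetical stand-in for an HTTP call to the API server, and the limit of 10 concurrent requests is an arbitrary assumption. It only shows the general technique of capping in-flight requests with asyncio.Semaphore.

```python
import asyncio


async def send_request(i, sem, stats):
    """Hypothetical stand-in for one HTTP request to the API server."""
    async with sem:  # wait here until a slot is free
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.001)  # simulated network latency
        stats["active"] -= 1
        return i


async def run_throttled(n_requests, max_concurrent):
    """Issue n_requests, never allowing more than max_concurrent in flight."""
    sem = asyncio.Semaphore(max_concurrent)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(send_request(i, sem, stats) for i in range(n_requests))
    )
    return results, stats["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(run_throttled(800, 10))
    print(f"completed {len(results)} requests, peak concurrency {peak}")
```

Unlike a retry loop, this approach limits the load placed on the server in the first place; the two techniques are not mutually exclusive.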

I am getting the same errors. I am running a custom build of Prefect 2.6.5 (same for server and clients) with the agent-limit feature from this PR (but I do not think this is related). The logs match what others have reported: ConnectionResetError: [Errno 104] Connection reset by peer, anyio.BrokenResourceError, httpcore.ReadError, and httpx.ReadError.

It happens when trying to call run_deployment from inside a task in the parent flow

This issue makes Prefect 2 pretty unstable for production environments

andreas-ntonas on Nov 17, 2022

@madkinsz it looks like we got it addressed, so I’m going to go ahead and close this issue. Feel free to reopen it if you think there’s something else that needs handling.

peytonrunyan on Dec 5, 2022

I can also confirm that flows that were failing regularly for me with this issue now seem to be working fine

colindunn on Dec 2, 2022

Howdy yall! Any chance anyone here would be interested in giving this branch a shot to see if it resolves the problems? https://github.com/PrefectHQ/prefect/pull/7593

@MuFaheemkhan, @eudyptula, @carlo-catalyst, @andreas-ntonas

peytonrunyan on Nov 22, 2022

We are seeing BrokenPipeError: [Errno 32] Broken pipe in flows that start around 800 tasks in one go with map. The logs also include a RuntimeError: The connection pool was closed while 325 HTTP requests/responses were still in-flight.

Also, the reason behind the HTTP 500 was a database timeout, so we had to increase the query timeout from the default of 1 second.

So, from our perspective, it could very well be an issue with high volume.

eudyptula on Nov 18, 2022
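
If high volume is the trigger, one possible workaround is to submit work in smaller batches instead of all ~800 items at once. The sketch below shows only the batching logic with the standard library; batched, process_in_batches, and the submit callback are hypothetical names, and whether this would avoid the errors reported here is untested.

```python
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def process_in_batches(items, size, submit):
    """Call `submit` once per batch and collect the results in order.

    `submit` is a hypothetical callback that takes a list of inputs and
    returns a list of results (for example, a wrapper around a mapped task
    that waits for the batch to finish before returning).
    """
    results = []
    for batch in batched(items, size):
        results.extend(submit(batch))
    return results


if __name__ == "__main__":
    doubled = process_in_batches(range(800), 100, lambda b: [x * 2 for x in b])
    print(len(doubled))
```

Because each batch finishes before the next one starts, the number of requests in flight at any time is bounded by the batch size rather than by the total number of items.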

I believe these issues are basically caused by a high volume of requests. We see them with the agent, which polls frequently, and now with run_deployment, which also polls frequently.

zanieb on Nov 17, 2022

@MuFaheemkhan could you please edit your post to contain the full Docker image tag if you are using one of our official images or include the versions of the libraries as requested? A full traceback for the error would also be really helpful. Thanks!

zanieb on Nov 15, 2022