uvicorn-gunicorn-fastapi-docker: Workers go into restarting/crash cycle (WORKER TIMEOUT / signal 6)
I am struggling to work out which layer is the root cause here.
My app runs fine, but then it suddenly becomes unable to serve requests for a while and then “fixes itself”. While it is unable to serve requests, my logs show:
[2022-01-18 08:36:46 +0000] [1505] [CRITICAL] WORKER TIMEOUT (pid:1548)
[2022-01-18 08:36:46 +0000] [1505] [CRITICAL] WORKER TIMEOUT (pid:1575)
[2022-01-18 08:36:46 +0000] [1505] [WARNING] Worker with pid 1548 was terminated due to signal 6
[2022-01-18 08:36:46 +0000] [1505] [WARNING] Worker with pid 1575 was terminated due to signal 6
[2022-01-18 08:36:46 +0000] [1783] [INFO] Booting worker with pid: 1783
[2022-01-18 08:36:46 +0000] [1782] [INFO] Booting worker with pid: 1782
[2022-01-18 08:36:47 +0000] [1505] [CRITICAL] WORKER TIMEOUT (pid:1577)
[2022-01-18 08:36:47 +0000] [1505] [CRITICAL] WORKER TIMEOUT (pid:1578)
[2022-01-18 08:36:47 +0000] [1505] [WARNING] Worker with pid 1578 was terminated due to signal 6
[2022-01-18 08:36:47 +0000] [1784] [INFO] Booting worker with pid: 1784
[2022-01-18 08:36:47 +0000] [1505] [WARNING] Worker with pid 1577 was terminated due to signal 6
[2022-01-18 08:36:47 +0000] [1785] [INFO] Booting worker with pid: 1785
[2022-01-18 08:36:51 +0000] [1505] [CRITICAL] WORKER TIMEOUT (pid:1545)
[2022-01-18 08:36:51 +0000] [1505] [CRITICAL] WORKER TIMEOUT (pid:1551)
[2022-01-18 08:36:51 +0000] [1505] [CRITICAL] WORKER TIMEOUT (pid:1559)
[2022-01-18 08:36:52 +0000] [1505] [WARNING] Worker with pid 1551 was terminated due to signal 6
Initially, I thought it was related to load and resource limits, but it seems to also happen during “typical load” and when resources are nowhere near their limits.
Managed to resolve this issue, sharing in case it helps.
Our issue originated from making external API calls from within an async endpoint. These API calls did not support async, which introduced blocking calls on the event loop and resulted in the uvicorn worker timing out. Our reliance on FastAPI Cache decorators for these async endpoints prevented us from simply redefining them as sync (async def -> def).
To resolve this, we used the run_in_threadpool() utility function to ensure these sync calls run in a separate threadpool, outside the event loop. Alongside this, we updated our gunicorn config so the workers and threads counts were equal, setting both to 4.
We released this update over two weeks ago and haven't seen any worker timeouts since. Hopefully this helps 🙂
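A minimal sketch of that workaround, assuming a requests-based client and placeholder names such as call_external_api (none of which come from the original report): the blocking call is handed to run_in_threadpool so the event loop stays free and the worker keeps responding to the gunicorn master.

# Illustrative sketch only; call_external_api and the URL are placeholders.
import requests
from fastapi import FastAPI
from fastapi.concurrency import run_in_threadpool  # re-exported from Starlette

app = FastAPI()

def call_external_api(item_id: int) -> dict:
    # Synchronous, blocking HTTP call (the external client has no async support).
    response = requests.get(f"https://api.example.com/items/{item_id}", timeout=10)
    response.raise_for_status()
    return response.json()

@app.get("/items/{item_id}")
async def read_item(item_id: int) -> dict:
    # Running the blocking call in a threadpool keeps the event loop responsive,
    # so the worker keeps notifying the gunicorn master and is not killed.
    return await run_in_threadpool(call_external_api, item_id)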
I resolved this issue by adding a worker timeout when starting my gunicorn application.
gunicorn -k uvicorn.workers.UvicornWorker ${APP_MODULE} --bind 0.0.0.0:80 --timeout ${WORKER_TIMEOUT}
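If you prefer a config file over flags, the same settings can live in a gunicorn_conf.py; the values below combine the two suggestions in this thread (workers equal to threads, plus an explicit worker timeout) and are assumptions rather than anything prescribed here.

# gunicorn_conf.py -- illustrative sketch; the defaults (4 workers/threads, 120 s) are assumptions.
import os

worker_class = "uvicorn.workers.UvicornWorker"
workers = int(os.getenv("WORKERS", "4"))
threads = int(os.getenv("THREADS", "4"))
# Seconds a worker may go without notifying the master before it is killed
# (this is what produces the CRITICAL WORKER TIMEOUT lines above).
timeout = int(os.getenv("WORKER_TIMEOUT", "120"))
bind = "0.0.0.0:80"

Start it with gunicorn -c gunicorn_conf.py ${APP_MODULE}.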
@mcazim98 In our case we were not able to redefine our endpoints as sync because of our reliance on FastAPI cache decorators. The FastAPI cache version we were using was 0.1.8, which did not support sync functions, so we needed the run_in_threadpool utility function as a workaround. Thankfully, FastAPI cache now supports sync functions as of version 0.2.0, which means we can redefine our endpoints as sync and move away from run_in_threadpool 🙂
Facing this issue while using Docker. It works perfectly fine if run directly with:
gunicorn -w 1 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8080 main:app
None of the following suggested solutions worked:
- gevent workers
- downgrading Python from 3.9 to 3.7
- increasing the timeout
- running uvicorn without gunicorn
Can someone please point me in the right direction to resolve this issue?
Facing the same issue. Whenever the following code is executed with an incorrect smtp_url or port, my worker crashes:
There is no crash if the smtp_url and port are valid. Dependencies:
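The snippet and dependency list did not survive here, so the following is only a hypothetical reconstruction of this failure mode: smtplib opens the TCP connection in the SMTP() constructor, and with an unreachable host or port that call blocks until the connect timeout, freezing the event loop long enough for gunicorn to report WORKER TIMEOUT and abort the worker (signal 6).

# Hypothetical sketch; the endpoint name and SMTP handling are assumptions.
import smtplib
from fastapi import FastAPI
from fastapi.concurrency import run_in_threadpool

app = FastAPI()

def check_smtp(smtp_url: str, port: int) -> None:
    # SMTP() connects immediately; a wrong host/port blocks here until timeout.
    with smtplib.SMTP(smtp_url, port, timeout=10) as server:
        server.noop()

@app.post("/check-smtp")
async def check_smtp_endpoint(smtp_url: str, port: int):
    # Calling check_smtp() directly in this async endpoint would block the
    # event loop while the connection hangs; pushing it to a threadpool keeps
    # the worker responsive even when the SMTP details are wrong.
    await run_in_threadpool(check_smtp, smtp_url, port)
    return {"status": "ok"}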
@nicholasmccrea Wow, that must have been hard to identify! Great news!
Yup, I tried with 1 worker, still no luck.
Facing the same issue when using Haystack. I modified the docker-compose.yml as follows, and it works:
command: "/bin/bash -c 'sleep 10 && gunicorn rest_api.application:app -b 0.0.0.0 -k uvicorn.workers.UvicornWorker --workers 1 --timeout 600'"
Facing the same issue when running long processes over websockets; it ends up terminating the websocket connection. Any fixes?