fastapi: Gunicorn on Google Cloud Run gets a 504 error status (Upstream Request Timeout)
Why does my Gunicorn service always return a 504 error status code when I open my Cloud Run URL for the first time (it takes about 15 seconds), after which the URL opens without any error? And after I leave it without traffic for about 30-60 minutes, it returns the 504 error again. Is my Gunicorn dead/shut down? When I check my Cloud Run log, Gunicorn logs a "Shutting down" message, so I think it was stopped. I need to keep my Gunicorn always on; how can I do that?
At startup my code needs to load machine learning models: each pickle file is about 100 MB, and in my case I load 6 pickle files (600 MB+). I use FastAPI for my API code.
This is how I load my pickles:
# Load all models
@app.on_event("startup")
async def load_model():
    # Path files
    pathfile_model = os.path.join("modules", "model/")
    pathfile_data = os.path.join("modules", "data/")
    start_time = time.time()
    # Load models
    usedcar.price_engine_4w = {}
    top5_brand = ["honda", "toyota", "nissan", "suzuki", "daihatsu"]
    for i in top5_brand:
        with open(pathfile_model + f'{i}_all_in_one.pkl', 'rb') as file:
            usedcar.price_engine_4w[i] = pickle.load(file)
    with open(pathfile_model + 'ex_Top5_all_in_one.pkl', 'rb') as file:
        usedcar.price_engine_4w['non'] = pickle.load(file)
    # Load dataset match
    with open(pathfile_data + settings.DATA_LIST) as path:
        usedcar.list_match_seva = pd.read_csv(path)
    elapsed_time = time.time() - start_time
    print("======================================")
    print("INFO : Model loaded Succesfully")
    print("MODEL :", usedcar.price_engine_4w)
    print("ELAPSED MODEL TIME : ", elapsed_time)
This is how my main.py runs:
if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8080, log_level="info", loop="asyncio")
This is my Dockerfile :
FROM python:3.8-slim-buster
RUN apt-get update --fix-missing
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y libgl1-mesa-dev python3-pip git
RUN mkdir /usr/src/app
WORKDIR /usr/src/app
COPY ./requirements.txt /usr/src/app/requirements.txt
RUN pip3 install -U setuptools
RUN pip3 install --upgrade pip
RUN pip3 install -r ./requirements.txt --use-feature=2020-resolver
COPY . /usr/src/app
CMD exec gunicorn --bind :8080 --workers 2 --threads 4 main:app --worker-class uvicorn.workers.UvicornH11Worker --preload --timeout 60 --worker-tmp-dir /dev/shm
These are my requirements for uvicorn and gunicorn:
fastapi
fastapi-utils
uvicorn[standard]
gunicorn
This is my Cloud Run Log :
2021-02-15 14:31:54.346 WIT [2021-02-15 07:31:54 +0000] [1] [INFO] Handling signal: term
2021-02-15 14:31:54.385 WIT [2021-02-15 07:31:54 +0000] [11] [INFO] Shutting down
2021-02-15 14:31:54.386 WIT [2021-02-15 07:31:54 +0000] [12] [INFO] Shutting down
2021-02-15 14:31:54.486 WIT [2021-02-15 07:31:54 +0000] [11] [INFO] Waiting for application shutdown.
2021-02-15 14:31:54.486 WIT [2021-02-15 07:31:54 +0000] [11] [INFO] Application shutdown complete.
2021-02-15 14:31:54.486 WIT [2021-02-15 07:31:54 +0000] [12] [INFO] Waiting for application shutdown.
2021-02-15 14:31:54.486 WIT [2021-02-15 07:31:54 +0000] [11] [INFO] Finished server process [11]
2021-02-15 14:31:54.487 WIT [2021-02-15 07:31:54 +0000] [11] [INFO] Worker exiting (pid: 11)
2021-02-15 14:31:54.487 WIT ======================================
2021-02-15 14:31:54.487 WIT INFO : Model loaded Succesfully
2021-02-15 14:31:54.487 WIT ELAPSED MODEL TIME : 13.514873743057251
2021-02-15 14:31:54.487 WIT INFO : Master Data Updated Succesfully
2021-02-15 14:31:54.487 WIT ELAPSED DATABASE TIME : 0.5247213840484619
2021-02-15 14:31:54.487 WIT ======================================
2021-02-15 14:31:54.487 WIT [2021-02-15 07:31:54 +0000] [12] [INFO] Application shutdown complete.
2021-02-15 14:31:54.487 WIT [2021-02-15 07:31:54 +0000] [12] [INFO] Finished server process [12]
2021-02-15 14:31:54.487 WIT [2021-02-15 07:31:54 +0000] [12] [INFO] Worker exiting (pid: 12)
As we can see from my Cloud Run log, my Gunicorn was shut down suddenly.
And this is my error:
[Screenshot: 504 Upstream Request Timeout error page]
After I looked around, I tried a few things:
- --worker-tmp-dir /dev/shm: I added this because I thought my Docker container might be blocking the workers, so this option should rule out blocking from the container, but it still returns a 504 status. (Source 1, Source 2)
- --preload: I used this because I thought Cloud Run needed to save some RAM to start Gunicorn faster, so that if Gunicorn shuts down, the page would load faster on the next request, but it still has no effect. (Source)
- workers=2, threads=4, graceful_timeout=100: but it still makes my Cloud Run shut down.
Thank you
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 20 (9 by maintainers)
Glad to read that, good job 👍
@frankie567 Hello frankie, I'm so sorry for my late update. So, this is my update:
- Load your model lazily in a dependency: I've done this, so my script now loads the model only when it's needed, and it was very useful. HTTP requests are much faster than before, and it avoids the 504 error too (surprisingly 😄).
- Use joblib instead of pickle: I've done this, and the result was unexpected: my model now loads faster than with pickle. (It's crazy 🤣)
- About maximum_instances: I've set my maximum instances to 50. From the story that you shared, that was so crazy lol, $72,000 is way too much; I can't even imagine that amount of money.
Thank you so much @frankie567
I'm not a specialist, but here are two things worth trying:
Load your model lazily in a dependency
Instead of loading your model at startup, try to load it on the first prediction query. You can wrap this in a FastAPI dependency:
I've replaced your actual loading logic with time.sleep to simulate loading time. The first prediction will be slow because it has to load the model for the first time, but subsequent predictions will be faster because the model stays in memory (until the container turns idle). Container startup is now instantaneous, because it has nothing to do.
Notice that I've defined the dependency as a synchronous function (not async). Since your loading logic performs blocking I/O, it's better to define it like this, because FastAPI will then run it in the external threadpool (doc: https://fastapi.tiangolo.com/async/?h=techn#path-operation-functions). It means it won't block your main loop while the model is loading.
Use joblib instead of pickle
joblib is a very good library for persisting objects to disk. It may prove more efficient than standard pickle, especially for objects containing large NumPy arrays.
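As a sketch (the file name and the dummy model holding a large array are illustrative), swapping pickle for joblib only changes the dump/load calls:

```python
import os
import tempfile

import joblib
import numpy as np

# Dummy stand-in for a trained model holding large numeric arrays.
model = {"weights": np.arange(1_000_000, dtype=np.float64)}

# joblib's dump/load mirror pickle's API but are optimized for objects
# containing big NumPy arrays; compress trades a little CPU for file size.
path = os.path.join(tempfile.gettempdir(), "model.joblib")
joblib.dump(model, path, compress=3)
restored = joblib.load(path)
```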
About maximum_instances
Cloud Run has an autoscaling feature. It means that if you experience very high traffic for any reason (or a bug in your code causes the server to loop), Google will create new containers until maximum_instances is reached. With a limit of 1000, you can run up a bill of thousands of dollars without even noticing! Relevant story about this: https://www.theregister.com/2020/12/10/google_cloud_over_run/
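If you deploy with the gcloud CLI, capping autoscaling is a single flag (SERVICE_NAME and the value of 50 are illustrative):

```shell
# Cap the number of containers Cloud Run may create for this service.
gcloud run services update SERVICE_NAME --max-instances=50
```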
Could you give us your Cloud Run configuration (or the command you use to deploy your service)? By default, Cloud Run instances start with 256 MB of RAM, so given the size of your models, I suspect you're running out of memory.
@frankie567 No problem frankie, thank you so much for answering all of my questions. I really appreciate it. I will close my question now.
Sorry, I saw your question and then forgot to answer.
Basically, Cloud Run scales by creating new instances based on the number of incoming requests. So if you experience a traffic spike, it'll be able to handle it.
However, creating an instance takes time (a "cold start"), which can add latency before it can answer requests. If you set minimum_instances to 4/10/X, there will always be 4/10/X containers ready to serve requests even if there are no requests to handle. Of course, you are billed for those instances. Unless you have a service with very high traffic, or you expect an enormous traffic spike because a TV show will talk about your company, I don't think this option will be helpful for you.
Official docs: https://cloud.google.com/run/docs/configuring/min-instances
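Assuming the gcloud CLI, keeping warm instances around is again one flag (SERVICE_NAME and the count are illustrative; remember these instances are billed even when idle):

```shell
# Keep one container warm at all times to avoid cold starts.
gcloud run services update SERVICE_NAME --min-instances=1
```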
@rudi101101 So, how did it go? 🙂
With Gunicorn/Uvicorn, the first prediction triggers the model load. Be careful with the timeout if you choose this approach.
@frankie567 God thank you Frankie, I will try it first. Soon, I will give you the result.
Well, theoretically, even after being shut down because it didn't receive traffic for a certain period of time, it should cold-start and run the startup event again without any issue.
Now, I don’t really understand why you get a timeout error on subsequent start. Random thought: make sure you don’t have any open resources or background tasks pending in your router that could prevent a proper shutdown of the container.
Yes, you could set a minimum instance count, but it obviously incurs costs (by the way, you really should set the maximum instances parameter if you don't want very bad surprises; the default is 1000 🤯).
Yes, the default is 5 minutes. Should be sufficient for you but, you know, computers 😅
Just to be sure we understand what's happening:
Is that so?
Thanks! May I suggest increasing the Cloud Run request timeout:
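Assuming the gcloud CLI, that would look something like this (SERVICE_NAME is a placeholder; the value is in seconds, and 300 seconds is the default):

```shell
# Raise the per-request timeout so slow first requests can finish
# before Cloud Run gives up and returns a 504.
gcloud run services update SERVICE_NAME --timeout=900
```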
It clearly states that a 504 error is triggered when the timeout is reached.