boto3: Hang in s3.download_file with Celery worker in version 1.4.0

I’ve been using lots of boto3 calls in my Flask app for some time, but the switch to the latest boto3 v1.4.0 has broken my Celery workers. Something that may be unique about my app is that I use S3 to download a secure environment variables file before launching my app or workers. It appears that the new boto3 works with my app, but hangs when launching the Celery worker.

I would temporarily downgrade my boto3 to avoid the problem, but it's been a long time since the last release, and I need the elbv2 support that only comes in 1.4.0.

I’ve created a tiny version of my worker (worker2.py) to demonstrate the problem. I’ve verified that using the previous version boto3 1.3.1 results in the worker launching properly. I see all prints and the Celery worker banner output.

If I install boto3 1.4.0, then the second print() statement “Download complete” is never reached. Also note that I tried following the new doc example with boto3.resource and using s3.meta.client, but that fails as well.

#
# Stub Celery worker to demonstrate bug in Boto3 1.4.0. Works fine with previous version Boto3 1.3.1.
# Test with: celery worker -A worker2.celery
#
from flask import Flask
from celery import Celery
import boto3
import tempfile

celery = Celery(__name__, broker='amqp://guest:guest@localhost:5672//')

app = Flask(__name__)

s3 = boto3.client('s3', region_name='us-west-1')
env_file = 'APPNAME.APPSTAGE.env'
with tempfile.NamedTemporaryFile() as s3_file:
    print("Downloading file...")
    response = s3.download_file('APPBUCKET', env_file, s3_file.name)
    print("Download complete!")

You can test it by running the following at the command line:

celery worker -A worker2.celery

Also note that just running the code downloads the file just fine with 1.4.0:

python worker2.py

About this issue

State: closed
Created 8 years ago
Comments: 31 (7 by maintainers)

Most upvoted comments

@ask, the change is that download_file() (which is a method to download an object from S3) is now multithreaded. The client creation/initialization that happens when creating a client via boto3.client() does not use threads.

More nuanced is that in previous versions of boto3, we had a conditional in boto3 that was roughly:

def download_file(self, ...):  # downloads a file from S3
    file_size = get_file_size_from_s3()
    if file_size < 8 * 1024 * 1024:  # under 8MB
        download_in_this_thread_in_one_api_call()
    else:
        # use a concurrent.futures.ThreadPoolExecutor() and
        # download the file chunks in parallel
        ...

This meant that in previous versions of boto3, if you stayed under 8MB, downloading a file would never spin up threads. Above 8MB, however, it seems like you'd still run into this problem in older versions of boto3.
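For anyone who wants to see that size check in runnable form, here is a minimal standard-library sketch. It is not boto3's actual code: `fetch_range`, `download`, the in-memory `blob` standing in for an S3 object, and the chunk size and worker count are all made up for illustration. Only the 8MB cutoff and the use of `concurrent.futures.ThreadPoolExecutor` come from the description above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

THRESHOLD = 8 * 1024 * 1024   # 8MB cutoff described above
CHUNK_SIZE = 8 * 1024 * 1024  # hypothetical chunk size, chosen for illustration


def fetch_range(blob, start, end, used_threads):
    """Stand-in for a ranged GET; records which thread did the work."""
    used_threads.add(threading.current_thread().name)
    return blob[start:end]


def download(blob, threshold=THRESHOLD, chunk_size=CHUNK_SIZE):
    """Return (data, names of the threads that fetched it)."""
    used_threads = set()
    size = len(blob)
    if size < threshold:
        # Small object: one call, made in the calling thread.
        data = fetch_range(blob, 0, size, used_threads)
    else:
        # Large object: split into ranges and fetch them on worker threads.
        ranges = [(s, min(s + chunk_size, size)) for s in range(0, size, chunk_size)]
        with ThreadPoolExecutor(max_workers=4) as pool:
            parts = pool.map(
                lambda r: fetch_range(blob, r[0], r[1], used_threads), ranges
            )
            data = b"".join(parts)
    return data, used_threads
```

Calling `download()` with a payload below the threshold reports only the caller's thread, while a payload at or above it reports pool worker threads. That is the distinction that matters here: only the second path starts threads.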

In the latest version of boto3, downloading a file from S3 via the download_file() method will always use threads.

We’re still investigating if there’s anything we can do on our end to improve this.

Hope that gives more context into what’s going on.

jamesls on Oct 6, 2016

I am able to reproduce it. I am looking into why it may be happening.

kyleknap on Oct 5, 2016