google-api-python-client: Bug in http.MediaIoBaseDownload

The problem is here: if Content-Range is not found in the response, this leads to an infinite loop, because self._total_size will always be None, so self._progress == self._total_size will always evaluate to False and self._done will never be True. Shouldn’t it raise an exception if the Content-Range header is not found?
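To illustrate the failure mode (a minimal standalone sketch, not the library's actual source): when the header is absent, the total size stays None, and an integer byte count never compares equal to None, so the completion flag can never flip.

```python
# Sketch of the reported bug: if the server omits Content-Range,
# total_size stays None and the completion check `progress == total_size`
# is False on every iteration.
total_size = None  # Content-Range header missing from the response
progress = 0
done = False

for _ in range(3):  # stand-in for successive next_chunk() calls
    progress += 1024  # another chunk arrives
    done = (progress == total_size)
    # done is always False, so a `while not done` loop never exits

print(done)  # → False
```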

About this issue

  • Original URL
  • State: closed
  • Created 10 years ago
  • Comments: 22 (2 by maintainers)

Commits related to this issue

Most upvoted comments

This is still an issue, and a serious one because it makes it impossible to download objects from Google Cloud Storage buckets using MediaIoBaseDownload.

Here’s another way of getting objects (this does it in one go and stores it in memory, so may not be suitable for large files):

request = client.objects().get_media(bucket="blab", object="blab")
response = request.execute()
print(response)

For anyone looking for a work around:

import os

OUTPUT_DIR = "."  # destination directory; adjust as needed

def downloadFile(file_name, file_id, mimeType, service):
    # Google-native files must be exported to a concrete format;
    # regular files can be fetched directly with get_media.
    request = service.files().get_media(fileId=file_id)
    if "application/vnd.google-apps" in mimeType:
        if "document" in mimeType:
            request = service.files().export_media(
                fileId=file_id,
                mimeType='application/vnd.openxmlformats-officedocument.wordprocessingml.document')
            file_name = file_name + ".docx"
        else:
            request = service.files().export_media(fileId=file_id, mimeType='application/pdf')
            file_name = file_name + ".pdf"
    print("Downloading -- " + file_name)
    response = request.execute()
    with open(os.path.join(OUTPUT_DIR, file_name), "wb") as wer:
        wer.write(response)

@jonparrott ok thanks will take a look

@Capstan This code example fails because total_size is never set, so done stays False forever. This only occurs with the export_media function, whose response does not provide a ‘size’ field. The get_media function does provide it and has no issues.

file_id = '1ZdR3L3qP4Bkq8noWLJHSr_iBau0DNT4Kli4SxNc2YEo'
request = drive_service.files().export_media(fileId=file_id,
                                             mimeType='application/pdf')
fh = io.BytesIO()
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
    status, done = downloader.next_chunk()
    print("Download %d%%." % int(status.progress() * 100))

source: https://developers.google.com/drive/v3/web/manage-downloads#downloading_google_documents

I believe I am running into this issue too.

call pattern here:

    media_body = MediaIoBaseUpload(fd, mime_type[0])
    body = {
        'title': name,
        'mimeType': "application/vnd.google-apps.document",
    }
    service = _get_drive_service()
    df = service.files().insert(
        body=body,
        media_body=media_body).execute()

    request = service.files().export_media(fileId=df['id'], mimeType=DOCX_MIMETYPE)

    fh = io.BytesIO()
    downloader = MediaIoBaseDownload(fh, request)
    done = False
    while done is False:
        status, done = downloader.next_chunk()

    fh.seek(0)
    return fh

maybe it has to do with inserting a vnd.google-apps.document file then immediately trying to export it?

Ran 1099 documents sequentially yesterday and 3 of them hit this infinite loop of doom.

Still happens for BytesIO (and not for FileIO) when using MediaIoBaseDownload with the example on https://developers.google.com/drive/api/v3/manage-downloads

adding the fh.seek(0) fixes it
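For reference, the rewind being described can be shown with plain io (a sketch; the write() call stands in for the MediaIoBaseDownload loop): after the download loop the buffer's position sits at the end, so it must be rewound before reading.

```python
import io

fh = io.BytesIO()
fh.write(b"downloaded bytes")  # stand-in for the MediaIoBaseDownload chunk loop
fh.seek(0)                     # rewind before reading, or read() returns b""
data = fh.read()
```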

Same here. Still happening. I ended up doing service.files().export(fileId=file_id, mimeType=mimeType).execute(http=http) but it’s not ideal since the response is not buffered. A solution would be appreciated!

For those wanting to use MediaIoBaseDownload, I also did what I describe here: http://stackoverflow.com/a/41643652/72350, which basically works as usual for most requests and just takes the first chunk for those incorrect responses (theoretically it might corrupt the file). Try it at your own risk.

Using google-api-python-client-1.6.4, still met the same issue, my code is like: …

downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
    status, done = downloader.next_chunk()
    print(downloader._total_size)
    print(status.total_size)
    print("Download %d" % int(status.progress() * 100))

… here “total_size” is always None, which means “done” stays False. The target file is a Google spreadsheet with 1580+ rows. Oddly, if the file has fewer than 1580 rows, the issue does not occur. Hope someone or an API maintainer can help! Thanks!

Should be handled by the latest commit.