google-api-python-client: Bug in http.MediaIoBaseDownload

The problem is here: if Content-Range is not found in the response, this leads to an infinite loop, because self._total_size will always be None, so self._progress == self._total_size will always evaluate to False and self._done will never be True. Shouldn’t it raise an exception if the Content-Range header is not found?
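To illustrate the failure mode (a minimal standalone sketch, not the library's actual source): when the header is absent, the total size stays None, and an integer byte count never compares equal to None, so the completion flag can never flip.

```python
# Sketch of the reported bug: if the server omits Content-Range,
# total_size stays None and the completion check `progress == total_size`
# is False on every iteration.
total_size = None  # Content-Range header missing from the response
progress = 0
done = False

for _ in range(3):  # stand-in for successive next_chunk() calls
    progress += 1024  # another chunk arrives
    done = (progress == total_size)
    # done is always False, so a `while not done` loop never exits

print(done)  # → False
```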

About this issue

  • Original URL
  • State: closed
  • Created 10 years ago
  • Comments: 22 (2 by maintainers)

Commits related to this issue

Most upvoted comments

This is still an issue, and a serious one because it makes it impossible to download objects from Google Cloud Storage buckets using MediaIoBaseDownload.

Here’s another way of getting objects (this does it in one go and stores it in memory, so may not be suitable for large files):

request = client.objects().get_media(bucket="blab", object="blab")
response = request.execute()
print(response)

For anyone looking for a work around:

import os

OUTPUT_DIR = "."  # destination directory; adjust as needed

def downloadFile(file_name, file_id, mimeType, service):
    # Google-native files must be exported to a concrete format;
    # regular files can be fetched directly with get_media.
    request = service.files().get_media(fileId=file_id)
    if "application/vnd.google-apps" in mimeType:
        if "document" in mimeType:
            request = service.files().export_media(
                fileId=file_id,
                mimeType='application/vnd.openxmlformats-officedocument.wordprocessingml.document')
            file_name = file_name + ".docx"
        else:
            request = service.files().export_media(fileId=file_id, mimeType='application/pdf')
            file_name = file_name + ".pdf"
    print("Downloading -- " + file_name)
    response = request.execute()
    with open(os.path.join(OUTPUT_DIR, file_name), "wb") as wer:
        wer.write(response)

@jonparrott ok thanks will take a look

@Capstan This code example fails because total_size is never set, so done stays False forever. This only occurs with the export_media function, whose response does not provide a ‘size’ field. The get_media function does provide it and has no issues.

file_id = '1ZdR3L3qP4Bkq8noWLJHSr_iBau0DNT4Kli4SxNc2YEo'
request = drive_service.files().export_media(fileId=file_id,
                                             mimeType='application/pdf')
fh = io.BytesIO()
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
    status, done = downloader.next_chunk()
    print("Download %d%%." % int(status.progress() * 100))

source: https://developers.google.com/drive/v3/web/manage-downloads#downloading_google_documents

I believe I am running into this issue too.

call pattern here:

    media_body = MediaIoBaseUpload(fd, mime_type[0])
    body = {
        'title': name,
        'mimeType': "application/vnd.google-apps.document",
    }
    service = _get_drive_service()
    df = service.files().insert(
        body=body,
        media_body=media_body).execute()

    request = service.files().export_media(fileId=df['id'], mimeType=DOCX_MIMETYPE)

    fh = io.BytesIO()
    downloader = MediaIoBaseDownload(fh, request)
    done = False
    while done is False:
        status, done = downloader.next_chunk()

    fh.seek(0)
    return fh

maybe it has to do with inserting a vnd.google-apps.document file then immediately trying to export it?

Ran 1099 documents sequentially yesterday and 3 of them hit this infinite loop of doom.

Still happens for BytesIO (and not for FileIO) when using MediaIoBaseDownload with the example on https://developers.google.com/drive/api/v3/manage-downloads

adding the fh.seek(0) fixes it
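For reference, the rewind being described can be shown with plain io (a sketch; the write() call stands in for the MediaIoBaseDownload loop): after the download loop the buffer's position sits at the end, so it must be rewound before reading.

```python
import io

fh = io.BytesIO()
fh.write(b"downloaded bytes")  # stand-in for the MediaIoBaseDownload chunk loop
fh.seek(0)                     # rewind before reading, or read() returns b""
data = fh.read()
```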

Same here. Still happening. I ended up doing service.files().export(fileId=file_id, mimeType=mimeType).execute(http=http) but it’s not ideal since the response is not buffered. A solution would be appreciated!

For those wanting to use MediaIoBaseDownload, I also did what I describe here: http://stackoverflow.com/a/41643652/72350, which basically works as usual for most requests and just takes the first chunk for those incorrect responses (theoretically it might corrupt the file). Try it at your own risk.

Using google-api-python-client-1.6.4, still met the same issue, my code is like: …

downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
    status, done = downloader.next_chunk()
    print(downloader._total_size)
    print(status.total_size)
    print("Download %d" % int(status.progress() * 100))

… here “total_size” is always None, which means “done” stays False. The target file is a Google spreadsheet with 1580+ rows. Oddly, if the file has fewer than 1580 rows, the issue does not occur. Hope someone or an API maintainer can help! Thanks!

Should be handled by the latest commit.