vision: Failed to download CelebA dataset using download=True

🐛 Bug

It fails to download the following files:

  1. img_align_celeba.zip

Rather than the zip file, it downloads an HTML page titled “Google Drive - Quota exceeded”, which results in a BadZipFile error.

  2. list_attr_celeba.txt

Similarly, it downloads the “Google Drive - Quota exceeded” page. This time it raises RuntimeError('Dataset not found or corrupted.' + ' You can use download=True to download it')

  3. list_landmarks_align_celeba.txt

Same as number 2.
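A quick way to confirm this symptom is to check whether the downloaded file is a real zip archive or an HTML page saved under a .zip name. The following is a minimal sketch using only the Python standard library; the helper name `looks_like_quota_page` is hypothetical and not part of torchvision.

```python
import zipfile
from pathlib import Path


def looks_like_quota_page(path):
    """Return True if `path` is not a zip archive but starts like an HTML page.

    Hypothetical helper for illustration; it only inspects the file on disk.
    """
    path = Path(path)
    if zipfile.is_zipfile(path):
        return False
    # Read only the first bytes; an HTML error page starts with markup.
    with open(path, "rb") as fh:
        head = fh.read(512).lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")
```

If this returns True for {root}/celeba/img_align_celeba.zip, the file is most likely the quota page described above and should be deleted before retrying.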

To Reproduce

Steps to reproduce the behavior:

  1. train_dataset = datasets.CelebA('data', split="train", transform=transforms.ToTensor(), download=True)

Expected behavior

Environment

PyTorch version: 1.2.0
Is debug build: No
CUDA used to build PyTorch: 10.0

OS: Microsoft Windows 10 Home Single Language
GCC version: (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 8.1.0
CMake version: Could not collect

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] numpy==1.17.0
[pip3] torch==1.2.0
[pip3] torchtext==0.4.0
[pip3] torchvision==0.4.0
[conda] Could not collect

Additional context

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 7
  • Comments: 18 (2 by maintainers)

Most upvoted comments

Can I just point out a workaround that worked for me, rather than trying my luck every 24 hours?

The files needed for the CelebA dataset, as defined in file_list in torchvision’s CelebA class, are as follows:

img_align_celeba.zip, list_attr_celeba.txt, identity_CelebA.txt, list_bbox_celeba.txt, list_landmarks_align_celeba.txt, list_eval_partition.txt

I downloaded them directly from the authors’ Google Drive link here, and placed them in the path: {root}/celeba

where root is the directory you pass when constructing the CelebA class.
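Before constructing the dataset, it can help to verify that all six files actually landed in {root}/celeba. The sketch below uses only the standard library; the file names are taken from the list above, while the constant `CELEBA_FILES` and the function `missing_celeba_files` are hypothetical names introduced for illustration.

```python
from pathlib import Path

# File names as listed in the comment above (assumed to match torchvision's list).
CELEBA_FILES = [
    "img_align_celeba.zip",
    "list_attr_celeba.txt",
    "identity_CelebA.txt",
    "list_bbox_celeba.txt",
    "list_landmarks_align_celeba.txt",
    "list_eval_partition.txt",
]


def missing_celeba_files(root):
    """Return the expected CelebA files that are absent from {root}/celeba."""
    base = Path(root) / "celeba"
    return [name for name in CELEBA_FILES if not (base / name).is_file()]
```

For example, `missing_celeba_files("data")` should return an empty list once everything is in place. This only checks that the files exist; it does not verify checksums. A later comment in this thread reports that the dataset then loaded with download=True rather than download=False.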

The error message

Google Drive - Quota exceeded

means that the traffic for this file (size and number of downloads) exceeds a quota set by Google Drive. Since we are not hosting the dataset, there is nothing we can do about this; it is not an error on our side. According to the answer in the above link, this quota is reset every 24 hours, so a possible fix might be to try again later and hope that the traffic limit has not been reached yet.

I’m trying to do this currently to no avail. Do you know if this is still a functional workaround?

Hey @cooperflourens ,

Try downloading the files manually from the Google Drive link; you need to be logged in to Google for this. For more information, please see the discussions in #5704 and #6052.

Hey @abhi-glitchhg ,

Thanks for your reply. I downloaded those files and set download=True and it worked. I think my problem before was that I had download set to false.

Thank you for your help!

It has been nearly a year on this issue and the error still pops up @pmeier

@pmeier Can’t the dataset be hosted by another service?