vision: Unable to load CelebA dataset. File is not zip file error.
🐛 Bug
Unable to download the CelebA dataset and load it into a `DataLoader`.
To Reproduce
- Loading the CelebA dataset with `download=True` returns an error:

```python
import torch
from torchvision import datasets, transforms

batch_size = 25
train_loader = torch.utils.data.DataLoader(
    datasets.CelebA('../data', split="train", download=True,
                    transform=transforms.Compose([
                        transforms.ToTensor(),
                        transforms.Normalize((0.5,), (0.5,))
                    ])),
    batch_size=batch_size, shuffle=True)
```
Returns:

```
/usr/local/lib/python3.6/dist-packages/torchvision/datasets/celeba.py in __init__(self, root, split, target_type, transform, target_transform, download)
     64
     65         if download:
---> 66             self.download()
     67
     68         if not self._check_integrity():

/usr/local/lib/python3.6/dist-packages/torchvision/datasets/celeba.py in download(self)
    118             download_file_from_google_drive(file_id, os.path.join(self.root, self.base_folder), filename, md5)
    119
--> 120         with zipfile.ZipFile(os.path.join(self.root, self.base_folder, "img_align_celeba.zip"), "r") as f:
    121             f.extractall(os.path.join(self.root, self.base_folder))
    122

/usr/lib/python3.6/zipfile.py in __init__(self, file, mode, compression, allowZip64)
   1129         try:
   1130             if mode == 'r':
-> 1131                 self._RealGetContents()
   1132             elif mode in ('w', 'x'):
   1133                 # set the modified flag so central directory gets written

/usr/lib/python3.6/zipfile.py in _RealGetContents(self)
   1196             raise BadZipFile("File is not a zip file")
   1197         if not endrec:
-> 1198             raise BadZipFile("File is not a zip file")
   1199         if self.debug > 1:
   1200             print(endrec)

BadZipFile: File is not a zip file
```
Environment
- PyTorch version: 1.5.0+cu101
- Is debug build: No
- CUDA used to build PyTorch: 10.1
- OS: Ubuntu 18.04.3 LTS
- GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
- CMake version: version 3.12.0
- Python version: 3.6
- Is CUDA available: Yes
- CUDA runtime version: 10.1.243
- GPU models and configuration: GPU 0: Tesla T4
- Nvidia driver version: 418.67
- cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
Versions of relevant libraries:
- [pip3] numpy==1.18.4
- [pip3] torch==1.5.0+cu101
- [pip3] torchsummary==1.5.1
- [pip3] torchtext==0.3.1
- [pip3] torchvision==0.6.0+cu101
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 23
- Comments: 24 (3 by maintainers)
This is still an issue FYI
Problem still exists. (Jun 14)
This issue still persists. Is there a way to get the dataset and load it just as we would through `torchvision.datasets`?
Seems this is a known issue, but I wanted to raise it again as per @pmeier's comment. I didn't want to open another ticket on this, though.
same
This has nothing to do with the loader. We can get the same result with
The underlying problem was reported in #1920: Google Drive has a daily maximum quota for any file, which seems to be exceeded for the CelebA files. You can see this in the response, which is mindlessly written to every `.txt` and `.zip` file.

@ajayrfhp The only "solution" we can offer is to tell you to wait and try again, since we have no control over your issue. You can ask the author of the dataset to host it on a platform that does not have daily quotas. If you do and they go through with your proposal, please inform us so that we can adapt our code.

@fmassa We should check the contents of the response first before we write it to the files, and raise a descriptive error message.
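A minimal sketch of that check, using only the standard library (the helper name `extract_checked` is hypothetical; the actual torchvision fix may look different). Since the quota error comes back as an HTML page rather than a zip archive, validating the file before extraction allows a descriptive error:

```python
import zipfile


def extract_checked(archive_path: str, dest: str) -> None:
    # Google Drive quota errors are returned as an HTML page, not a zip,
    # so validate the archive before extracting and fail with a clear message.
    if not zipfile.is_zipfile(archive_path):
        raise RuntimeError(
            f"{archive_path} is not a valid zip file. The Google Drive daily "
            "download quota for this file may have been exceeded; please try "
            "again later or download the file manually."
        )
    with zipfile.ZipFile(archive_path, "r") as f:
        f.extractall(dest)
```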
The problem still exists.
Hello everyone! Based on this discussion, these steps can help you (for me they worked perfectly):
1. Create a `celeba` directory and download into it all files from the CelebA Google Drive mentioned in this file_list.
2. Unpack `img_align_celeba.zip` in the `./celeba` directory (I'm not sure if you should delete the zip file after unpacking).
3. Load the dataset with the `download=False` parameter.

This tutorial worked for me!
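Before calling `datasets.CelebA(..., download=False)` after a manual download, a quick stdlib-only sanity check can confirm the files landed in the expected layout. The helper below is hypothetical, and the file list is an assumption based on torchvision's CelebA dataset; verify it against your torchvision version:

```python
import os

# Files the CelebA dataset is assumed to expect under <root>/celeba/.
REQUIRED = [
    "img_align_celeba",               # unpacked image folder
    "list_attr_celeba.txt",
    "identity_CelebA.txt",
    "list_bbox_celeba.txt",
    "list_landmarks_align_celeba.txt",
    "list_eval_partition.txt",
]


def missing_celeba_files(root: str) -> list:
    """Return the names of expected CelebA files missing under root/celeba."""
    base = os.path.join(root, "celeba")
    return [name for name in REQUIRED
            if not os.path.exists(os.path.join(base, name))]
```

If the returned list is empty, constructing the dataset with `download=False` should not fail the integrity check for missing files.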
I would just like to add that the authors also include a Baidu drive you can download the data from on their website. The dataset is also available on Kaggle.
This was fixed in #4109, but the commit is not yet included in a stable release. It will be in the upcoming one.
@fmassa I suggest we wait for another issue raising this problem; at least I won't check daily whether this quota is exceeded. If there is another issue for this and I miss it, or you somehow find a day when we can fix this, feel free to tag me. I'll see what I can do.