datasets: Error in loading the celeb_a dataset (Py 3.7)
Short description
Loading the celeb_a dataset results in an error.
Environment information
- Operating System: Ubuntu 18.04
- Python version: Python 3.7
- tensorflow-datasets/tfds-nightly version: 1.0.1
- tensorflow/tensorflow-gpu/tf-nightly/tf-nightly-gpu version: tf-nightly 1.14.1-dev20190301
Reproduction instructions
>>> import tensorflow as tf
>>> tf.enable_eager_execution()
>>> import tensorflow_datasets as tfds
>>> r = tfds.load("celeb_a")
Downloading / extracting dataset celeb_a (?? GiB) to /home/ayush99/tensorflow_datasets/celeb_a/0.3.0...
Dl Completed...: 100%|█████████████████████████| 4/4 [05:44<00:00, 102.30s/ url]
Traceback (most recent call last):
File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/download/extractor.py", line 90, in _sync_extract
for path, handle in iter_archive(from_path, method):
File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/download/extractor.py", line 160, in iter_zip
extract_file = z.open(member)
File "/home/ayush99/anaconda3/lib/python3.7/zipfile.py", line 1480, in open
self._fpclose, self._lock, lambda: self._writing)
File "/home/ayush99/anaconda3/lib/python3.7/zipfile.py", line 722, in __init__
self.seekable = file.seekable
AttributeError: 'GFile' object has no attribute 'seekable'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/api_utils.py", line 52, in disallow_positional_args_dec
return fn(*args, **kwargs)
File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/registered.py", line 259, in load
dbuilder.download_and_prepare(**download_and_prepare_kwargs)
File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/api_utils.py", line 52, in disallow_positional_args_dec
return fn(*args, **kwargs)
File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/dataset_builder.py", line 220, in download_and_prepare
max_examples_per_split=download_config.max_examples_per_split)
File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/dataset_builder.py", line 651, in _download_and_prepare
for split_generator in self._split_generators(dl_manager):
File "/home/ayush99/GitHub/datasets/tensorflow_datasets/image/celeba.py", line 122, in _split_generators
"landmarks_celeba": LANDMARKS_DATA,
File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/download/download_manager.py", line 340, in download_and_extract
return _map_promise(self._download_extract, url_or_urls)
File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/download/download_manager.py", line 376, in _map_promise
res = utils.map_nested(_wait_on_promise, all_promises)
File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/utils/py_utils.py", line 128, in map_nested
for k, v in data_struct.items()
File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/utils/py_utils.py", line 128, in <dictcomp>
for k, v in data_struct.items()
File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/utils/py_utils.py", line 142, in map_nested
return function(data_struct)
File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/download/download_manager.py", line 360, in _wait_on_promise
return p.get()
File "/home/ayush99/GitHub/dsets/lib/python3.7/site-packages/promise/promise.py", line 510, in get
return self._target_settled_value(_raise=True)
File "/home/ayush99/GitHub/dsets/lib/python3.7/site-packages/promise/promise.py", line 514, in _target_settled_value
return self._target()._settled_value(_raise)
File "/home/ayush99/GitHub/dsets/lib/python3.7/site-packages/promise/promise.py", line 224, in _settled_value
reraise(type(raise_val), raise_val, self._traceback)
File "/home/ayush99/GitHub/dsets/lib/python3.7/site-packages/six.py", line 693, in reraise
raise value
File "/home/ayush99/GitHub/dsets/lib/python3.7/site-packages/promise/promise.py", line 842, in handle_future_result
resolve(future.result())
File "/home/ayush99/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 425, in result
return self.__get_result()
File "/home/ayush99/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/home/ayush99/anaconda3/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/download/extractor.py", line 93, in _sync_extract
raise ExtractError(resource, err)
tensorflow_datasets.core.download.extractor.ExtractError: Error while extracting file /home/ayush99/tensorflow_datasets/downloads/ucexport_download_id_0B7EVK8r0v71pZjFTYXZWM3FlDDaXUAQO8EGH_a7VqGNLRtW52mva1LzDrb-V723OQN8 (https://drive.google.com/uc?export=download&id=0B7EVK8r0v71pZjFTYXZWM3FlRnM): 'GFile' object has no attribute 'seekable'.
About this issue
- State: closed
- Created 5 years ago
- Comments: 23 (8 by maintainers)
Same error while loading cats_vs_dogs.
@rsepassi The Python version indeed seems to be the issue - there was a breaking change in Gzip in Python 3.7.
This issue is fixed in the current tensorflow:master for Python 3.7 by the following PR: https://github.com/tensorflow/tensorflow/pull/28006
(If upgrading is an issue, an alternative workaround is to call tfds.load() once under Python 3.6. That call downloads and extracts the dataset, and the prepared dataset can then be used by later tfds.load() calls under Python 3.7.)
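The root error in the traceback comes from Python 3.7's zipfile reading `file.seekable` on the file object it was given (`self.seekable = file.seekable` in zipfile.py above), which the `GFile` in use here did not provide. A minimal, standard-library-only sketch reproduces this. `NoSeekableFile` is a hypothetical stand-in for the old `GFile`, and `SeekableAdapter` is a hypothetical wrapper showing the kind of change that is needed (exposing a `seekable()` method). It is an illustration under those assumptions, not the code from the linked PR.

```python
import io
import zipfile


class NoSeekableFile:
    """Hypothetical stand-in for the old GFile: supports read/seek/tell
    but has no seekable() method."""

    def __init__(self, data: bytes) -> None:
        self._buf = io.BytesIO(data)

    def read(self, n: int = -1) -> bytes:
        return self._buf.read(n)

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        return self._buf.seek(offset, whence)

    def tell(self) -> int:
        return self._buf.tell()

    def close(self) -> None:
        self._buf.close()


class SeekableAdapter:
    """Hypothetical wrapper that adds the seekable() method zipfile expects
    and forwards every other attribute to the wrapped file object."""

    def __init__(self, fileobj) -> None:
        self._fileobj = fileobj

    def seekable(self) -> bool:
        # The wrapped object supports seek()/tell(), so report True.
        return True

    def __getattr__(self, name):
        # Only called for attributes not found on the adapter itself.
        return getattr(self._fileobj, name)


def make_zip_bytes() -> bytes:
    """Build a tiny in-memory zip archive with a single member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("hello.txt", "hello from the archive")
    return buf.getvalue()


def read_member(fileobj, member: str) -> bytes:
    """Open a zip archive from a file-like object and read one member,
    similar to the z.open(member) call in iter_zip from the traceback."""
    with zipfile.ZipFile(fileobj) as archive:
        with archive.open(member) as extracted:
            return extracted.read()


if __name__ == "__main__":
    data = make_zip_bytes()
    try:
        read_member(NoSeekableFile(data), "hello.txt")
    except AttributeError as err:
        # On Python 3.7+ this reports the missing 'seekable' attribute.
        print("without seekable():", err)
    print("with adapter:", read_member(SeekableAdapter(NoSeekableFile(data)), "hello.txt"))
```

The adapter only adds the one missing method and delegates everything else, which keeps the sketch focused on the single attribute the traceback complains about.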