datasets: Error in loading the celeb_a dataset (Py 3.7)

Short description

Loading the celeb_a dataset fails during extraction with AttributeError: 'GFile' object has no attribute 'seekable'.

Environment information

  • Operating System: Ubuntu 18.04
  • Python version: Python 3.7
  • tensorflow-datasets/tfds-nightly version: 1.0.1
  • tensorflow/tensorflow-gpu/tf-nightly/tf-nightly-gpu version: tf-nightly 1.14.1-dev20190301

Reproduction instructions

>>> import tensorflow as tf
>>> tf.enable_eager_execution()
>>> import tensorflow_datasets as tfds
>>> r = tfds.load("celeb_a")
Downloading / extracting dataset celeb_a (?? GiB) to /home/ayush99/tensorflow_datasets/celeb_a/0.3.0...
Dl Completed...: 100%|█████████████████████████| 4/4 [05:44<00:00, 102.30s/ url]
Traceback (most recent call last):
  File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/download/extractor.py", line 90, in _sync_extract
    for path, handle in iter_archive(from_path, method):
  File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/download/extractor.py", line 160, in iter_zip
    extract_file = z.open(member)
  File "/home/ayush99/anaconda3/lib/python3.7/zipfile.py", line 1480, in open
    self._fpclose, self._lock, lambda: self._writing)
  File "/home/ayush99/anaconda3/lib/python3.7/zipfile.py", line 722, in __init__
    self.seekable = file.seekable
AttributeError: 'GFile' object has no attribute 'seekable'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/api_utils.py", line 52, in disallow_positional_args_dec
    return fn(*args, **kwargs)
  File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/registered.py", line 259, in load
    dbuilder.download_and_prepare(**download_and_prepare_kwargs)
  File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/api_utils.py", line 52, in disallow_positional_args_dec
    return fn(*args, **kwargs)
  File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/dataset_builder.py", line 220, in download_and_prepare
    max_examples_per_split=download_config.max_examples_per_split)
  File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/dataset_builder.py", line 651, in _download_and_prepare
    for split_generator in self._split_generators(dl_manager):
  File "/home/ayush99/GitHub/datasets/tensorflow_datasets/image/celeba.py", line 122, in _split_generators
    "landmarks_celeba": LANDMARKS_DATA,
  File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/download/download_manager.py", line 340, in download_and_extract
    return _map_promise(self._download_extract, url_or_urls)
  File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/download/download_manager.py", line 376, in _map_promise
    res = utils.map_nested(_wait_on_promise, all_promises)
  File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/utils/py_utils.py", line 128, in map_nested
    for k, v in data_struct.items()
  File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/utils/py_utils.py", line 128, in <dictcomp>
    for k, v in data_struct.items()
  File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/utils/py_utils.py", line 142, in map_nested
    return function(data_struct)
  File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/download/download_manager.py", line 360, in _wait_on_promise
    return p.get()
  File "/home/ayush99/GitHub/dsets/lib/python3.7/site-packages/promise/promise.py", line 510, in get
    return self._target_settled_value(_raise=True)
  File "/home/ayush99/GitHub/dsets/lib/python3.7/site-packages/promise/promise.py", line 514, in _target_settled_value
    return self._target()._settled_value(_raise)
  File "/home/ayush99/GitHub/dsets/lib/python3.7/site-packages/promise/promise.py", line 224, in _settled_value
    reraise(type(raise_val), raise_val, self._traceback)
  File "/home/ayush99/GitHub/dsets/lib/python3.7/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/ayush99/GitHub/dsets/lib/python3.7/site-packages/promise/promise.py", line 842, in handle_future_result
    resolve(future.result())
  File "/home/ayush99/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/home/ayush99/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/home/ayush99/anaconda3/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/download/extractor.py", line 93, in _sync_extract
    raise ExtractError(resource, err)
tensorflow_datasets.core.download.extractor.ExtractError: Error while extracting file /home/ayush99/tensorflow_datasets/downloads/ucexport_download_id_0B7EVK8r0v71pZjFTYXZWM3FlDDaXUAQO8EGH_a7VqGNLRtW52mva1LzDrb-V723OQN8 (https://drive.google.com/uc?export=download&id=0B7EVK8r0v71pZjFTYXZWM3FlRnM): 'GFile' object has no attribute 'seekable'.

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 23 (8 by maintainers)

Most upvoted comments

Same error while loading cats_vs_dogs.

@rsepassi The Python version indeed seems to be the issue: there was a breaking change in zipfile in Python 3.7 (ZipFile.open() now expects the underlying file object to provide seekable(), which GFile did not).
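
For illustration, here is a minimal sketch (not from this thread) of the Python 3.7 behaviour the traceback points at. The NoSeekableFile wrapper is a hypothetical stand-in for the old GFile: it can read, seek and tell, but has no seekable() method, so ZipFile.open() trips over the same self.seekable = file.seekable assignment shown at zipfile.py line 722 above.

import io
import zipfile

class NoSeekableFile:
    # Hypothetical stand-in for the old GFile: read/seek/tell work,
    # but there is no seekable() method.
    def __init__(self, raw):
        self._raw = raw
    def read(self, n=-1):
        return self._raw.read(n)
    def seek(self, pos, whence=0):
        return self._raw.seek(pos, whence)
    def tell(self):
        return self._raw.tell()

# Build a tiny in-memory zip archive.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("hello.txt", "hello")

# Re-open it through the wrapper. On Python 3.7, .open() raises
# AttributeError: 'NoSeekableFile' object has no attribute 'seekable',
# mirroring the GFile error above; Python 3.6 extracts it fine.
archive = zipfile.ZipFile(NoSeekableFile(io.BytesIO(buf.getvalue())))
try:
    print(archive.open("hello.txt").read())
except AttributeError as err:
    print("Python 3.7 behaviour:", err)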

This issue is fixed in the current tensorflow:master for Python 3.7 with the following PR: https://github.com/tensorflow/tensorflow/pull/28006
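
If moving to a TensorFlow build that includes that PR isn't practical, something along these lines may tide you over. This is an untested sketch, not the official fix: it assumes the TF 1.x tf.gfile.GFile class and that a read-mode GFile's seek()/tell() already behave the way zipfile expects.

import tensorflow as tf

# Untested stop-gap sketch: give GFile the seekable() method that
# Python 3.7's zipfile looks for before extracting.
if not hasattr(tf.gfile.GFile, "seekable"):
    # Assumption: read-mode GFile objects already support seek()/tell(),
    # so reporting them as seekable is safe enough for extraction.
    tf.gfile.GFile.seekable = lambda self: True

tf.enable_eager_execution()
import tensorflow_datasets as tfds
tfds.load("celeb_a")  # extraction should now get past the seekable check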

(If upgrading is an issue, an alternative workaround is to call tfds.load() once under Python 3.6: that pass downloads and extracts the dataset, after which the prepared data can be loaded from Python 3.7 without re-extracting; see the sketch below.)
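
Spelled out, that one-time extraction pass could look like the following; this is just a sketch of the workaround above, run from a separate Python 3.6 environment (for example a dedicated conda env).

# Run once under Python 3.6: download and extraction succeed there, and the
# prepared dataset under ~/tensorflow_datasets can then be loaded from
# Python 3.7 without re-extracting.
import tensorflow as tf
tf.enable_eager_execution()
import tensorflow_datasets as tfds

tfds.load("celeb_a")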