tensorflow: OutOfRangeError Unknown Error when extracting particular zip file

This issue was originally posted in tensorflow/datasets (https://github.com/tensorflow/datasets/issues/1337), but I was directed here because the problem appears to be in the GFile implementation.

Short description

When the download of imagenet_resized finishes and TFDS starts extracting and writing records, the program crashes.

You can reproduce this error by downloading the particular zip file manually and extracting it with tensorflow:

http://www.image-net.org/image/downsample/Imagenet32_train_npz.zip

Environment information

  • Operating System: Windows 10
  • Python version: 3.7
  • tensorflow-datasets version: 1.3.2
  • tensorflow-gpu version: 2.0.0

Reproduction instructions

Without TFDS:

import zipfile
import tensorflow.compat.v2 as tf

path = 'path/to/file.zip'
with tf.io.gfile.GFile(path, 'rb') as fobj:
  z = zipfile.ZipFile(fobj)
  for member in z.infolist():
    extract_file = z.open(member)
    print(member.filename)

With TFDS:

import tensorflow_datasets as tfds

imagenet_data, info = tfds.load(name="imagenet_resized/32x32", with_info=True, as_supervised=True)

Link to logs

Dl Size...: 100%|██████████| 3414/3414 [22:47<00:00,  2.60 MiB/s]



0 examples [00:00, ? examples/s]Traceback (most recent call last):
  File "C:\Program Files\Python37\lib\contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "C:\Users\[username]\AppData\Roaming\Python\Python37\site-packages\tensorflow_datasets\core\file_format_adapter.py", line 199, in incomplete_dir
    yield tmp_dir
  File "C:\Users\[username]\AppData\Roaming\Python\Python37\site-packages\tensorflow_datasets\core\dataset_builder.py", line 333, in download_and_prepare
    download_config=download_config)
  File "C:\Users\[username]\AppData\Roaming\Python\Python37\site-packages\tensorflow_datasets\core\dataset_builder.py", line 1008, in _download_and_prepare
    max_examples_per_split=download_config.max_examples_per_split,
  File "C:\Users\[username]\AppData\Roaming\Python\Python37\site-packages\tensorflow_datasets\core\dataset_builder.py", line 871, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "C:\Users\[username]\AppData\Roaming\Python\Python37\site-packages\tensorflow_datasets\core\dataset_builder.py", line 1033, in _prepare_split
    total=split_info.num_examples, leave=False):
  File "C:\Users\[username]\AppData\Roaming\Python\Python37\site-packages\tqdm\_tqdm.py", line 1005, in __iter__
    for obj in iterable:
  File "C:\Users\[username]\AppData\Roaming\Python\Python37\site-packages\tensorflow_datasets\image\imagenet_resized.py", line 141, in _generate_examples
    for fname, fobj in archive:
  File "C:\Users\[username]\AppData\Roaming\Python\Python37\site-packages\tensorflow_datasets\core\download\extractor.py", line 179, in iter_zip
    z = zipfile.ZipFile(fobj)
  File "C:\Program Files\Python37\lib\zipfile.py", line 1225, in __init__
    self._RealGetContents()
  File "C:\Program Files\Python37\lib\zipfile.py", line 1288, in _RealGetContents
    endrec = _EndRecData(fp)
  File "C:\Program Files\Python37\lib\zipfile.py", line 259, in _EndRecData
    fpin.seek(0, 2)
  File "C:\Users\[username]\AppData\Roaming\Python\Python37\site-packages\tensorflow_core\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "C:\Users\[username]\AppData\Roaming\Python\Python37\site-packages\tensorflow_core\python\lib\io\file_io.py", line 167, in seek
    offset += self.size()
  File "C:\Users\[username]\AppData\Roaming\Python\Python37\site-packages\tensorflow_core\python\lib\io\file_io.py", line 102, in size
    return stat(self.__name).length
  File "C:\Users\[username]\AppData\Roaming\Python\Python37\site-packages\tensorflow_core\python\lib\io\file_io.py", line 727, in stat
    return stat_v2(filename)
  File "C:\Users\[username]\AppData\Roaming\Python\Python37\site-packages\tensorflow_core\python\lib\io\file_io.py", line 744, in stat_v2
    pywrap_tensorflow.Stat(compat.as_bytes(path), file_statistics)
tensorflow.python.framework.errors_impl.OutOfRangeError: C:\Users\[username]\tensorflow_datasets\downloads\image-net.org_image_downs_Image_train_npzlCJjN-zBsDCdn80BZxJ6qtyTFYcDX7y1OSUjXtuuxPw.zip; Unknown error

Process finished with exit code 1
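
The traceback shows the error being raised from `stat`, which `zipfile` reaches through `seek(0, 2)` and `size()`. A minimal sketch for narrowing this down, assuming the file is on local disk: compare the size reported by Python's `os.stat` with the one reported by `tf.io.gfile.stat`. The helper name `compare_stat_sizes` and its `gfile_stat` parameter are hypothetical and exist only for this illustration; TensorFlow is imported lazily so the helper can also be exercised with a stand-in function.

```python
import os


def compare_stat_sizes(path, gfile_stat=None):
    """Return (os_size, gfile_result) for a local file.

    gfile_result is the length reported by the GFile stat call, or the
    exception it raised. `gfile_stat` is a hypothetical hook so a stand-in
    can be passed; by default tf.io.gfile.stat is used.
    """
    if gfile_stat is None:
        # Imported lazily so this module loads without TensorFlow installed.
        import tensorflow as tf
        gfile_stat = tf.io.gfile.stat

    os_size = os.stat(path).st_size
    try:
        gfile_result = gfile_stat(path).length
    except Exception as err:  # e.g. the OutOfRangeError from the traceback
        gfile_result = err
    return os_size, gfile_result
```

If the second element comes back as an exception while `os.stat` succeeds, the failure can be reproduced without `zipfile` or TFDS being involved at all.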

Expected behavior

No error

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 20 (5 by maintainers)

Most upvoted comments

I changed line 131 of extractor.py from

    with tf.io.gfile.GFile(path_or_fobj, 'rb') as f_obj:

to

    with open(path_or_fobj, 'rb') as f_obj:

so that the function becomes

@contextlib.contextmanager
def _open_or_pass(path_or_fobj):
  if isinstance(path_or_fobj, six.string_types):
    with open(path_or_fobj, 'rb') as f_obj:
      yield f_obj
  else:
    yield path_or_fobj

With this change, the following works:

import tensorflow_datasets as tfds

tfds.load('imagenet_resized/32x32')
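
To try the same idea outside of TFDS, here is a self-contained sketch of the workaround above. It is not the TFDS source: `open_or_pass` and `zip_member_names` are hypothetical names, and the `six.string_types` check is replaced with a plain `str` check on the assumption that only Python 3 is in use.

```python
import contextlib
import zipfile


@contextlib.contextmanager
def open_or_pass(path_or_fobj):
    """Yield a binary file object for a path, or pass a file object through."""
    if isinstance(path_or_fobj, str):
        # Built-in open() instead of tf.io.gfile.GFile, as in the workaround.
        with open(path_or_fobj, 'rb') as f_obj:
            yield f_obj
    else:
        yield path_or_fobj


def zip_member_names(path_or_fobj):
    """Return the member file names of a zip archive given a path or file object."""
    with open_or_pass(path_or_fobj) as f_obj:
        with zipfile.ZipFile(f_obj) as z:
            return [member.filename for member in z.infolist()]
```

Note that built-in `open` only handles local paths, so this trades away the remote filesystems that GFile supports. It is a local workaround and does not fix the underlying GFile behavior.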

Chiming in because I am receiving this same OutOfRangeError on Windows 10 with:

import tensorflow_datasets as tfds
coco_data = tfds.load('coco/2017') 

The offending file is tensorflow_datasets\downloads\images.cocodataset.org_zips_train2017aai7WOpfj5nSSHXyFBbeLp3tMXjpA_H3YD4oO54G2Sk.zip.

I can provide the full traceback if you’d like, but it’s the same as the one above and I’d rather not spam the issue.