datasets: tensorflow_datasets.load('cats_vs_dogs') not working!

Code NOT WORKING

# load the datasets module
import tensorflow_datasets as tfds
# disable the download progress bar
tfds.disable_progress_bar()
# download data - cats vs dogs
_ = tfds.load('cats_vs_dogs',       # dataset name
              as_supervised=False,  # return feature dicts, not (image, label) tuples
              )

ERROR: -> DownloadError: Failed to get url https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_3367a.zip. HTTP code: 404.

Environment - Google Colab

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 3
  • Comments: 23

Most upvoted comments

Hi. I have a temporary solution below to modify the URL:

setattr(tfds.image_classification.cats_vs_dogs, '_URL', "https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_5340.zip")

Why do I get this error: module 'tensorflow_datasets' has no attribute 'image_classification'?
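
One possible cause, offered as an assumption rather than something confirmed in this thread: the installed tensorflow_datasets version may not expose the submodule as an attribute of the top-level package, or may keep the dataset code under a different path. Importing the module by its full dotted name can sidestep that. The sketch below tries a list of candidate paths and overrides _URL on the first one that imports. The helper name override_url is hypothetical, and the second candidate path is a guess about newer releases that should be checked against the installed package.

```python
import importlib

# Candidate module paths are assumptions; which one exists depends on the
# installed tensorflow_datasets version.
CANDIDATE_MODULES = (
    "tensorflow_datasets.image_classification.cats_vs_dogs",
    "tensorflow_datasets.datasets.cats_vs_dogs.cats_vs_dogs_dataset_builder",
)

# Replacement URL quoted from the workaround in this thread.
NEW_URL = (
    "https://download.microsoft.com/download/3/E/1/"
    "3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_5340.zip"
)


def override_url(candidates=CANDIDATE_MODULES, url=NEW_URL):
    """Set _URL on the first candidate module that imports and defines it.

    Returns the dotted path that was patched, or None if none matched.
    """
    for path in candidates:
        try:
            module = importlib.import_module(path)
        except ImportError:
            continue  # this path does not exist in the installed version
        if hasattr(module, "_URL"):
            module._URL = url
            return path
    return None
```

Calling override_url() before tfds.load('cats_vs_dogs') and checking that it did not return None would show whether any of the guessed paths matched.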

I didn’t attempt this with the split parameter, so I can’t comment on that.

I did not add buffer.seek(0).

Here’s the code I used to get past the issue. The line I changed is prefaced by a comment that says HACKY FIX.

import tensorflow as tf
import tensorflow_datasets as tfds
import io
import zipfile
import logging


def __generate_examples(self, archive):
  num_skipped = 0
  for fname, fobj in archive:
    res = tfds.image_classification.cats_vs_dogs._NAME_RE.match(fname)
    if not res:  # README file, …
      continue
    label = res.group(1).lower()
    if tf.compat.as_bytes("JFIF") not in fobj.peek(10):
      num_skipped += 1
      continue

    img_data = fobj.read()
    img_tensor = tf.image.decode_image(img_data)
    img_recoded = tf.io.encode_jpeg(img_tensor)

    # Converting the recoded image back into a zip file container.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as new_zip:
      new_zip.writestr(fname, img_recoded.numpy())
    buffer.seek(0)
    # HACKY FIX
    new_fobj = zipfile.ZipFile(buffer).open(fname.replace('\\', '/'))

    record = {
        "image": new_fobj,
        "image/filename": fname,
        "label": label,
    }
    yield fname, record

  if num_skipped != tfds.image_classification.cats_vs_dogs._NUM_CORRUPT_IMAGES:
    raise ValueError(
        "Expected %d corrupt images, but found %d"
        % (tfds.image_classification.cats_vs_dogs._NUM_CORRUPT_IMAGES, num_skipped)
    )
  logging.warning("%d images were corrupted and were skipped", num_skipped)


tfds.image_classification.cats_vs_dogs.CatsVsDogs._generate_examples = __generate_examples
data, metadata = tfds.load('cats_vs_dogs', as_supervised=True, with_info=True)

Regarding the last comment, I was getting the same issue. After some poking about, this looks like a problem with the method _generate_examples() on tensorflow_datasets.image_classification.cats_vs_dogs.CatsVsDogs. In that method, the following line…

new_fobj = zipfile.ZipFile(buffer).open(fname)

…is causing the exception. The problem is with fname. Once the entry has been written into the in-memory ZipFile a few lines earlier, the path separator stored in the archive may differ from the one in the fname variable (zipfile stores member names with forward slashes, so on Windows a name containing backslashes is converted). The lookup then fails with the KeyError 'there is no item named some\path\or\other.ext in the archive'.

I managed to hack my way past it by replacing the _generate_examples method with one I generated on-the-fly that replaced the line above with…

new_fobj = zipfile.ZipFile(buffer).open(fname.replace('\\', '/'))

…but the fix that needs to be pulled into the repository would need to be more robust than that.
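
As a rough idea of what a more robust lookup could look like (a sketch only, not the fix that was merged upstream), the member can be located by comparing names after normalizing separators on both sides, instead of assuming which separator the archive ended up with. The helper name open_member is hypothetical.

```python
import io
import zipfile


def open_member(buffer, fname):
    """Open fname from an in-memory zip, ignoring '\\' vs '/' differences."""
    archive = zipfile.ZipFile(buffer)
    wanted = fname.replace("\\", "/")
    for info in archive.infolist():
        # Normalize the stored name too, so the match works whether or not
        # zipfile converted the separators when the entry was written.
        if info.filename.replace("\\", "/") == wanted:
            return archive.open(info)
    raise KeyError("There is no item named %r in the archive" % fname)


# Usage: write one entry under a Windows-style name, then read it back.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as new_zip:
    new_zip.writestr("PetImages\\Cat\\0.jpg", b"fake image bytes")
buffer.seek(0)
payload = open_member(buffer, "PetImages\\Cat\\0.jpg").read()
```

Because both names are normalized before the comparison, the same call should behave identically on Windows and on Linux or Colab.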

Thanks, the temporary URL fix above worked for me.