tensorflow: tf.io.gfile.glob missing some patterns. Using tf-nightly

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary):
  • TensorFlow version (use command below): tf-nightly
  • Python version:
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version:
  • GPU model and memory:

Describe the current behavior

cc @Conchylicultor, please have a look at the TFDS issue tensorflow/datasets#1670. Tests are failing for the PlantVillage and The300wLp datasets because, in the _generate_example function of both plant_village.py and the300w_lp.py, tf.io.gfile.glob() does not correctly match all example patterns. Python's glob, however, solves the issue; see PR tensorflow/datasets#1684.

Describe the expected behavior

tf.io.gfile.glob() must match all provided patterns so that all required examples are generated.

Standalone code to reproduce the issue

Please have a look at this colab notebook. It contains all tracebacks, shows the problem with tf.io.gfile.glob(), and shows how Python's glob solves it.

Python's glob fixes the issue, but we have to use tf.io.gfile because we need to support GCS and other distributed file systems.

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 21 (9 by maintainers)

Most upvoted comments

@Eshan-Agarwal For the future, here is what a minimal reproducible example looks like:

import os
import glob
import tensorflow.compat.v2 as tf

# Write a dummy file
root_dir = '/tmp/dir_with_(brace)/'
tf.io.gfile.makedirs(root_dir)
with tf.io.gfile.GFile(os.path.join(root_dir, 'some_file.txt'), 'w') as f:
  f.write('')

# Search the file
glob_path = os.path.join(root_dir, "*")
print(list(glob.iglob(glob_path)))        # ['/tmp/dir_with_(brace)/some_file.txt']
print(list(tf.io.gfile.glob(glob_path)))  # []  << Bug: File not found

This allows the team to easily understand what the issue is. They can just copy-paste the code and experiment with it. This saves many hours, as everyone working on the issue can get started immediately without having to go through the 10000+ lines of code in TFDS.

@mihaimaruseac The bug is that tf.io.gfile.glob fails when ( is present in the path. This is a regression, as it only appears in TF nightly, not in TF 2.1. It makes some TFDS tests fail, as some datasets rely on this glob pattern to generate the dataset.

I confirm this fixed our tests. Thank you very much!

TF 2.2.0-rc2 has been released and this issue should be fixed now.

@Eshan-Agarwal thank you for confirming.

@mihaimaruseac I believe this should be prioritised. This impacts not only TFDS but potentially every user of tf.io.gfile.glob. As the failure is silent, users may not even notice there is a bug. In our case, we were lucky to have good unit tests. Note: The issue only happened externally. Internally, our tests work fine.
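Because the failure is silent, one possible safeguard on a local file system is to cross-check a glob implementation against Python's standard library. The sketch below is only an illustration: the helper name assert_glob_matches_stdlib is hypothetical, the comparison is meaningful only for local paths (not GCS), and it assumes the candidate function returns a list of string paths.

```python
import glob
import os
import tempfile


def assert_glob_matches_stdlib(pattern, candidate_glob):
    """Raise AssertionError if candidate_glob(pattern) differs from glob.glob(pattern)."""
    expected = sorted(os.path.normpath(p) for p in glob.glob(pattern))
    actual = sorted(os.path.normpath(p) for p in candidate_glob(pattern))
    if expected != actual:
        missing = sorted(set(expected) - set(actual))
        raise AssertionError(
            f"glob mismatch for {pattern!r}: missing {missing}")
    return actual


# Local demonstration with a directory whose name contains parentheses.
with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "dir_with_(brace)")
    os.makedirs(root)
    open(os.path.join(root, "some_file.txt"), "w").close()
    pattern = os.path.join(root, "*")

    # The standard library agrees with itself, so this passes.
    found = assert_glob_matches_stdlib(pattern, glob.glob)

    # A glob that silently returns nothing, as in the reported bug, is caught.
    try:
        assert_glob_matches_stdlib(pattern, lambda p: [])
        silent_failure_detected = False
    except AssertionError:
        silent_failure_detected = True
```

To check TensorFlow itself, tf.io.gfile.glob could be passed as candidate_glob in place of the stand-in lambda.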

@Conchylicultor @mihaimaruseac thanks for your quick responses. I actually uploaded a temp folder containing some examples; you can download the folder from here. But it is better to use the code provided by @Conchylicultor, which needs no external upload.

@Eshan-Agarwal the difference between the colab and the example template suggested is that we need to have the exact same setup for the colab, whereas the suggested template creates the files (with zero bytes) so it can be easily converted into a test case that now fails and after fixing will succeed.

But it’s ok, I’ll take care of this issue.
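As an illustration of how a template that creates its own zero-byte files converts into a test case, here is a rough sketch using unittest. The names make_glob_test and run_glob_test are hypothetical, and the glob function is passed in as a parameter so the same test can be pointed at glob.glob or at tf.io.gfile.glob; this is not the test that was actually added to TensorFlow.

```python
import glob
import os
import tempfile
import unittest


def make_glob_test(glob_fn):
    """Build a TestCase that checks glob_fn on a path containing parentheses."""

    class GlobParenthesesTest(unittest.TestCase):

        def test_finds_file_under_parenthesised_dir(self):
            with tempfile.TemporaryDirectory() as tmp:
                root = os.path.join(tmp, "dir_with_(brace)")
                os.makedirs(root)
                target = os.path.join(root, "some_file.txt")
                # A zero-byte file is enough; only the name matters.
                open(target, "w").close()

                results = [os.path.normpath(p)
                           for p in glob_fn(os.path.join(root, "*"))]
                self.assertEqual(results, [os.path.normpath(target)])

    return GlobParenthesesTest


def run_glob_test(glob_fn):
    """Run the generated test case and return True if it passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(
        make_glob_test(glob_fn))
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


# The standard-library glob is expected to pass.
stdlib_ok = run_glob_test(glob.glob)
```

With an affected build, calling run_glob_test(tf.io.gfile.glob) would be expected to fail before the fix and pass after it.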

@Conchylicultor @mihaimaruseac please look at this colab notebook

Yes, TFDS tests have started failing for patterns like: tf.io.gfile.glob('/path/to/file/[!Code]*[!_Flip]/[!_]*.jpg') or tf.io.gfile.glob('/path/to/*.[jJ][pP][gG]').
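For reference, this is how Python's fnmatch module (whose pattern rules the standard glob module follows) interprets the bracket expressions in those patterns. The file and directory names are hypothetical and chosen only to exercise the patterns; the sketch makes no claim about how tf.io.gfile.glob implements matching internally.

```python
import fnmatch

# Hypothetical file names, chosen only to exercise the pattern.
names = ["photo.jpg", "photo.JPG", "photo.Jpg", "photo.jpeg", "photo.png"]
# Each bracket pair matches exactly one character, in either case.
jpg_matches = [n for n in names if fnmatch.fnmatchcase(n, "*.[jJ][pP][gG]")]

# "[!Code]" is a character class: one character that is not C, o, d, or e.
# It does not mean "anything except the word Code".
dirs = ["AFW", "Code", "AFW_Flip", "HELEN", "data"]
dir_matches = [d for d in dirs if fnmatch.fnmatchcase(d, "[!Code]*[!_Flip]")]

print(jpg_matches)  # ['photo.jpg', 'photo.JPG', 'photo.Jpg']
print(dir_matches)  # ['AFW', 'HELEN']
```

Note that "data" is rejected because it starts with "d", which shows that the negated class filters single characters rather than whole names.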

@Eshan-Agarwal Could you provide a small self-contained code snippet to reproduce the issue?

Something like:

import glob
import tensorflow as tf

# Create an empty file to search for (write mode, so the file is created)
with tf.io.gfile.GFile('/tmp/file', 'w') as f:
  pass

print(list(tf.io.gfile.glob('/tmp/some_pattern')))
print(list(glob.glob('/tmp/some_pattern')))   # Should show different result