tensorflow: ResourceExhaustedError when formating ImageNet data to TFRecord

Hi All Tensorflow users,

I followed the steps in https://github.com/tensorflow/models/tree/master/inception to build the imagenet to TFRecord format. But when I execute the command “bazel-bin/inception/download_and_preprocess_imagenet “${DATA_DIR}””, I have the following errors:

Exception in thread Thread-6: Traceback (most recent call last): File “/usr/lib64/python2.7/threading.py”, line 811, in bootstrap_inner File “/usr/lib64/python2.7/threading.py”, line 764, in run File “/home/rengan/.cache/bazel/_bazel_rengan/2c9ddf610cf302ceda3f2b852669382c/execroot/inception/bazel-out/local-fastbuild/bin/inception/build_imagenet_data.runfiles/inception/inception/data/build_imagenet_data.py”, line 388, in _process_image_files_batch image_buffer, height, width = _process_image(filename, coder) File “/home/rengan/.cache/bazel/_bazel_rengan/2c9ddf610cf302ceda3f2b852669382c/execroot/inception/bazel-out/local-fastbuild/bin/inception/build_imagenet_data.runfiles/inception/inception/data/build_imagenet_data.py”, line 316, in _process_image image_data = tf.gfile.FastGFile(filename, ‘r’).read() File “/usr/lib/python2.7/site-packages/tensorflow/python/lib/io/file_io.py”, line 102, in read File “/usr/lib/python2.7/site-packages/tensorflow/python/lib/io/file_io.py”, line 72, in _preread_check File “/usr/lib64/python2.7/contextlib.py”, line 24, in __exit File “/usr/lib/python2.7/site-packages/tensorflow/python/framework/errors.py”, line 463, in raise_exception_on_not_ok_status ResourceExhaustedError: /cm/shared/scratch/rengan/Data/imagenet-data/raw-data/train//n03124170/n03124170_675.JPEG

There are similar errors in other threads. It successfully generated 128 validation data but less than 100 training data (should be 1024).

What related GitHub issues or StackOverflow threads have you found by searching the web for your problem?: no related issues were found.

Environment info

Operating System: Redhat 7.2

Installed version of CUDA and cuDNN: (please attach the output of ls -l /path/to/cuda/lib/libcud*): -rw-r–r-- 1 root root 560184 May 8 02:00 /usr/local/cuda-8.0/lib64/libcudadevrt.a lrwxrwxrwx 1 root root 16 Sep 12 15:13 /usr/local/cuda-8.0/lib64/libcudart.so -> libcudart.so.8.0 lrwxrwxrwx 1 root root 19 Sep 12 15:13 /usr/local/cuda-8.0/lib64/libcudart.so.8.0 -> libcudart.so.8.0.27 -rwxr-xr-x 1 root root 394472 May 8 02:00 /usr/local/cuda-8.0/lib64/libcudart.so.8.0.27 -rw-r–r-- 1 root root 737516 May 8 02:00 /usr/local/cuda-8.0/lib64/libcudart_static.a -rwxr-xr-x 1 root root 60696704 Sep 24 09:12 /usr/local/cuda-8.0/lib64/libcudnn.so -rwxr-xr-x 1 root root 60696704 Sep 24 09:12 /usr/local/cuda-8.0/lib64/libcudnn.so.5 -rwxr-xr-x 1 root root 60696704 Sep 24 09:12 /usr/local/cuda-8.0/lib64/libcudnn.so.5.1.3 -rw-r–r-- 1 root root 59715990 Sep 24 09:12 /usr/local/cuda-8.0/lib64/libcudnn_static.a

If installed from source, provide

  1. The commit hash (git rev-parse HEAD): f98c5ded31d7da0c2d127c28b2c16f0307a368f0
  2. The output of bazel version: WARNING: Output base ‘/home/rengan/.cache/bazel/_bazel_rengan/2c9ddf610cf302ceda3f2b852669382c’ is on NFS. This may lead to surprising failures and undetermined behavior. . Build label: 0.3.1- (@non-git) Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar Build time: Thu Sep 29 22:19:27 2016 (1475187567) Build timestamp: 1475187567 Build timestamp as int: 1475187567

Thanks in advance.

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Comments: 17 (8 by maintainers)

Most upvoted comments

Great! I will push a patch shortly to unblock everyone, and I’ll close this issue. I’ll follow up next week about tf.gfile.FastGFile(), and work on that in a separate issue.