benchmarks: tf_cnn_benchmarks.py does not support --data_dir with my imagenet1k tfrecords

I’m using the HEAD of both tensorflow and benchmarks. I can run tf_cnn_benchmarks.py with synthetic data like this:

python3 tf_cnn_benchmarks.py --num_batches=100 --display_every=1 --device=cpu --data_format=NHWC --model=trivial --batch_size=64

But when I point --data_dir at my own local directory of imagenet1k TFRecords, it hangs some time after printing “Running warm up”:

python3 tf_cnn_benchmarks.py --num_batches=100 --display_every=1 --device=cpu --data_format=NHWC --model=trivial --batch_size=64 --data_dir=/n0/ryan/imagenet1k_tfrecord
TensorFlow:  1.8  
Model:       trivial
Dataset:     imagenet
Mode:        training
SingleSess:  False
Batch size:  64 global
             64.0 per device
Num batches: 100
Num epochs:  0.00 
Devices:     ['/cpu:0']
Data format: NHWC 
Layout optimizer: False
Optimizer:   sgd  
Variables:   parameter_server
==========
Generating model
W0530 13:48:44.750849 140466104280896 tf_logging.py:125] From /home/ryan/sandbox/rreece/onboarding-cerebras/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:1611: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2018-05-30 13:48:44.798403: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX512F
I0530 13:48:44.929922 140466104280896 tf_logging.py:115] Running local_init_op.
I0530 13:48:50.095620 140466104280896 tf_logging.py:115] Done running local_init_op.
Running warm up

and then it hangs.

Any ideas on how I can debug this with my own local dataset?
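One way to narrow down a hang like this is to confirm, outside TensorFlow, that the data directory actually contains files matching the name pattern the input pipeline globs for, and that those files are not truncated. Below is a minimal stdlib-only sketch. It assumes the shards follow the `train-NNNNN-of-NNNNN` / `validation-NNNNN-of-NNNNN` naming produced by the usual ImageNet conversion scripts; that pattern, the helper names, and the script itself are hypothetical and not part of tf_cnn_benchmarks. It walks the TFRecord framing (uint64 length, uint32 masked CRC of the length, payload, uint32 masked CRC of the payload, all little-endian) but does not verify the CRCs.

```python
import glob
import os
import struct
import sys


def tfrecord_files(data_dir, subset="train"):
    # ASSUMPTION: shards are named <subset>-NNNNN-of-NNNNN. Adjust the
    # pattern if your conversion script used a different convention.
    pattern = os.path.join(data_dir, "%s-*-of-*" % subset)
    return sorted(glob.glob(pattern))


def count_records(path):
    """Count records in one TFRecord file by walking its framing.

    Each record is: uint64 length, uint32 masked CRC of the length,
    `length` payload bytes, uint32 masked CRC of the payload.
    CRCs are NOT checked; this only detects missing or truncated data.
    """
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                return count  # clean end of file
            if len(header) < 8:
                raise ValueError("%s: truncated length header" % path)
            (length,) = struct.unpack("<Q", header)
            # Skip length CRC (4 bytes) + payload + payload CRC (4 bytes).
            to_skip = 4 + length + 4
            if len(f.read(to_skip)) != to_skip:
                raise ValueError("%s: truncated record %d" % (path, count))
            count += 1


if __name__ == "__main__" and len(sys.argv) > 1:
    data_dir = sys.argv[1]
    for subset in ("train", "validation"):
        files = tfrecord_files(data_dir, subset)
        print("%s: %d files match the pattern" % (subset, len(files)))
        total = sum(count_records(p) for p in files)
        print("%s: %d records" % (subset, total))
```

Run it as, e.g., `python3 check_tfrecords.py /n0/ryan/imagenet1k_tfrecord` (the script name is arbitrary). Zero matching files, or a ValueError on some shard, would point at the data rather than at the benchmark code.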

I noticed these seemingly related closed issues: #150 and #176, but they do not appear to hang at the same place where tf_cnn_benchmarks.py hangs for me.

Thanks!

About this issue

State: closed
Created 6 years ago
Comments: 25 (7 by maintainers)

Commits related to this issue

Give better error message with OutOfRangeError. Before, if an OutOfRangeError was thrown, the Supervisor would silently ignore it, causing the cryptic error: "UnboundLocalError: local variable 'num_s... — committed to tensorflow/benchmarks by reedwm 6 years ago

Most upvoted comments

Sorry it took me so long to get back to this.

I tried the head of benchmarks today with tensorflow 1.9.0, and it worked! Thanks for the feedback. Closing this issue.

rreece on Jul 13, 2018