tensorflow: TensorFlow 1.4.0 takes more resources and is slower on GPU and CPU

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): https://github.com/tkuanlun350/Tensorflow-SegNet
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 7 x64
  • TensorFlow installed from (source or binary): https://pypi.python.org/pypi/tensorflow-gpu/1.4.0rc1
  • TensorFlow version (use command below): 1.4.0
  • Python version: 3.5
  • CUDA/cuDNN version: Cuda release 8.0, V8.0.60. cuDNN 6.
  • GPU model and memory: NVIDIA P4
  • Exact command to reproduce: c:\python35\python3 main.py --log_dir=./logs --image_dir={image dir} --val_dir= {validation dir} --batch_size=15 --training=True

Describe the problem

Under 1.3.0 I was able to use a batch size of {15, put your max batch size here} for training. Under 1.4.0 I get Resource Exhausted errors for that batch size. So use of GPU resources is going up. Not the right direction.

For me here are the performance effects:

  • TensorFlow GPU 1.3.0: 9.8 images/sec for batch size: 15
  • TensorFlow GPU 1.4.0: Can’t do batch size: 15. 7.8 images/sec for batch size: 12

Source code / logs

tf_bug2.txt

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Reactions: 2
  • Comments: 37 (15 by maintainers)

Commits related to this issue

Most upvoted comments

On my side, I have a resnet in the same style as the examples in the official tensorflow models repository. Thanks a lot for looking into this.

And it is slower than release 1.3, at least for the NMT model I am using. When I train it in 1.3, each epoch took about 600 seconds, now it takes about 700 seconds.