tensorflow: TensorFlow 1.4.0 takes more resources and is slower on GPU and CPU

System information

Have I written custom code (as opposed to using a stock example script provided in TensorFlow): https://github.com/tkuanlun350/Tensorflow-SegNet
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 7 x64
TensorFlow installed from (source or binary): https://pypi.python.org/pypi/tensorflow-gpu/1.4.0rc1
TensorFlow version (use command below): 1.4.0
Python version: 3.5
CUDA/cuDNN version: Cuda release 8.0, V8.0.60. cuDNN 6.
GPU model and memory: NVIDIA P4
Exact command to reproduce: c:\python35\python3 main.py --log_dir=./logs --image_dir={image dir} --val_dir={validation dir} --batch_size=15 --training=True

Describe the problem

Under 1.3.0 I was able to use a batch size of 15 for training. Under 1.4.0 I get Resource Exhausted errors at that batch size, so GPU memory use has gone up between the two releases. Not the right direction.
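
To compare the two releases, it can help to pin down the largest batch size that still fits on each one. The sketch below is only an illustration and is not taken from the linked repository: `run_one_step` is a hypothetical callable that runs one training step at a given batch size, and `oom_error` is the exception type to treat as out-of-memory (with TensorFlow that would be `tf.errors.ResourceExhaustedError`, which is not imported here).

```python
def largest_fitting_batch_size(run_one_step, start, oom_error=MemoryError):
    """Return the largest batch size <= start for which run_one_step succeeds.

    run_one_step: hypothetical callable taking a batch size; it is assumed to
        raise oom_error when the step does not fit in memory.
    oom_error: exception type treated as "out of memory". With TensorFlow,
        pass tf.errors.ResourceExhaustedError.
    Returns 0 if no batch size fits.
    """
    for batch_size in range(start, 0, -1):
        try:
            run_one_step(batch_size)
        except oom_error:
            continue  # too large, try the next smaller batch size
        return batch_size
    return 0


# Stand-in for a real training step: pretends anything above 12 runs out of memory.
def fake_step(batch_size):
    if batch_size > 12:
        raise MemoryError("simulated out-of-memory")


print(largest_fitting_batch_size(fake_step, start=15))  # prints 12
```

With a real model it is probably safer to run each probe in a fresh process, since GPU memory state after a failed step may affect later attempts.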

For me here are the performance effects:

TensorFlow GPU 1.3.0: 9.8 images/sec for batch size: 15
TensorFlow GPU 1.4.0: Can’t do batch size: 15. 7.8 images/sec for batch size: 12
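
For anyone reproducing these numbers, a minimal sketch of how images/sec can be measured and compared follows. It assumes a hypothetical `run_one_step(batch_size)` callable standing in for one training step; the helper names are illustrative and not part of TensorFlow or the linked repository.

```python
import time


def images_per_second(run_one_step, batch_size, num_steps=20,
                      warmup_steps=2, clock=time.perf_counter):
    """Average throughput over num_steps timed steps.

    run_one_step is a hypothetical callable for one training step.
    Warm-up steps are run first and excluded from the timing.
    """
    for _ in range(warmup_steps):
        run_one_step(batch_size)
    start = clock()
    for _ in range(num_steps):
        run_one_step(batch_size)
    elapsed = clock() - start
    return batch_size * num_steps / elapsed


def relative_change(old, new):
    """Fractional change from old to new (negative means a drop)."""
    return (new - old) / old


# Throughput figures reported above: 9.8 images/sec on 1.3.0, 7.8 on 1.4.0.
print("{:.1%}".format(relative_change(9.8, 7.8)))  # prints -20.4%
```

Note that the two reported figures use different batch sizes (15 and 12), so the roughly 20% drop mixes the effect of the smaller batch with any per-step slowdown.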

Source code / logs

tf_bug2.txt

About this issue

State: closed
Created 7 years ago
Reactions: 2
Comments: 37 (15 by maintainers)

Commits related to this issue

added compiled TensorFlow 1.4.0 for Python 3.6 (built on macOS High Sierra 10.13.1 with CLT 8.2 for CUDA 9 / cuDNN 7) — committed to norman-thomas/tensorflow-gpu-mac by norman-thomas 7 years ago
Deprecated LeakyReLU to use tf.nn.leaky_relu — committed to tensorpack/tensorpack by ppwwyyxx 7 years ago

Most upvoted comments

On my side, I have a ResNet in the same style as the examples in the official TensorFlow models repository. Thanks a lot for looking into this.

jmaye on Nov 6, 2017

It is also slower than release 1.3, at least for the NMT model I am using. Under 1.3 each epoch took about 600 seconds; now it takes about 700 seconds.

bshao001 on Nov 4, 2017