tensorflow: TensorFlow 1.4.0 takes more resources and is slower on GPU and CPU
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): https://github.com/tkuanlun350/Tensorflow-SegNet
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 7 x64
- TensorFlow installed from (source or binary): https://pypi.python.org/pypi/tensorflow-gpu/1.4.0rc1
- TensorFlow version (use command below): 1.4.0
- Python version: 3.5
- CUDA/cuDNN version: Cuda release 8.0, V8.0.60. cuDNN 6.
- GPU model and memory: NVIDIA P4
- Exact command to reproduce:
c:\python35\python3 main.py --log_dir=./logs --image_dir={image dir} --val_dir={validation dir} --batch_size=15 --training=True
Describe the problem
Under 1.3.0 I was able to train with a batch size of 15. Under 1.4.0 the same batch size fails with Resource Exhausted errors, so GPU resource usage has gone up between releases. That is a regression.
Here are the performance effects I measured:
- TensorFlow GPU 1.3.0: 9.8 images/sec for batch size: 15
- TensorFlow GPU 1.4.0: Can’t do batch size: 15. 7.8 images/sec for batch size: 12
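To put the two figures above side by side, here is a minimal sketch of the arithmetic. The helper name `relative_change` is only illustrative, and the comparison is not like for like, since the batch size had to drop from 15 to 12 under 1.4.0.

```python
def relative_change(old, new):
    """Return the fractional change from old to new (negative means a drop)."""
    return (new - old) / old

# Throughput figures reported above, in images/sec.
tf13_bs15 = 9.8   # TensorFlow GPU 1.3.0, batch size 15
tf14_bs12 = 7.8   # TensorFlow GPU 1.4.0, batch size 12

change = relative_change(tf13_bs15, tf14_bs12)
print("throughput change: {:+.1%}".format(change))
```

That works out to a drop of roughly 20% in images/sec, on top of the smaller maximum batch size.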
Source code / logs
About this issue
- State: closed
- Created 7 years ago
- Reactions: 2
- Comments: 37 (15 by maintainers)
Commits related to this issue
- added compiled TensorFlow 1.4.0 for Python 3.6 (built on macOS High Sierra 10.13.1 with CLT 8.2 for CUDA 9 / cuDNN 7) — committed to norman-thomas/tensorflow-gpu-mac by norman-thomas 7 years ago
- Deprecated LeakyReLU to use tf.nn.leaky_relu — committed to tensorpack/tensorpack by ppwwyyxx 7 years ago
On my side, I have a resnet in the same style as the examples in the official tensorflow models repository. Thanks a lot for looking into this.
It is also slower than release 1.3, at least for the NMT model I am using. Under 1.3 each epoch took about 600 seconds; under 1.4 it takes about 700 seconds.
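For anyone who wants to compare the two releases on equal footing, a timing helper along these lines may help. It is only a sketch: `measure_throughput` and `step_fn` are hypothetical names, and `step_fn` stands for whatever runs one training step in your script. The first few steps are skipped because they usually include one-time setup cost.

```python
import time

def measure_throughput(step_fn, batch_size, num_steps=50, warmup_steps=5):
    """Return images/sec for step_fn, ignoring the first warmup_steps calls."""
    # Warm-up steps are run but not timed.
    for _ in range(warmup_steps):
        step_fn()
    start = time.perf_counter()
    for _ in range(num_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return batch_size * num_steps / elapsed

# Hypothetical usage, assuming `sess` and `train_op` exist in your script:
# rate = measure_throughput(lambda: sess.run(train_op), batch_size=12)
```

Running it with the same model, data, and batch size under each release gives numbers that can be compared directly.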