tensorflow: Mixed Precision training is ~10 times slower
Mixed precision training becomes dramatically slower when I use the mixed_float16 policy.
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
- OS Platform and Distribution: Ubuntu 18.04
- TensorFlow installed from (source or binary): binary (pip3)
- TensorFlow version: 2.2.0
- Python version: 3.7
- CUDA/cuDNN version: CUDA 10.1 / cuDNN 7.5
- GPU model and memory: RTX 2080 Ti, 11 GB
I’ve designed a segmentation model that consists of regular layers such as BatchNorm, Conv2D, Activation, etc. I haven’t designed or used any custom layers.
I’m using binary cross-entropy as the loss function, and I set a float32 dtype on my network’s output (as the official docs recommend), which is a sigmoid. The data is loaded as float32 as well, but when I train with mixed precision, training becomes ~6-10x slower. I use the same network with the same batch size, without changing anything else. I’ve timed the individual stages, and here is what it looks like (a minimal sketch of the setup follows the list):
- The forward pass is ~10x slower
- Loss computation is ~6x slower
- Gradient computation is ~6x slower
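For context, here is a minimal sketch of the kind of setup described above. The layer sizes and shapes are illustrative assumptions, not the reporter's actual model; note that in TF 2.2 the mixed-precision API still lived under `tf.keras.mixed_precision.experimental`:

```python
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.mixed_precision import experimental as mixed_precision

# Enable the mixed_float16 policy (TF 2.2 spelling; newer releases use
# tf.keras.mixed_precision.set_global_policy("mixed_float16")).
policy = mixed_precision.Policy("mixed_float16")
mixed_precision.set_policy(policy)

# Illustrative segmentation-style block; shapes are assumptions.
inputs = layers.Input(shape=(256, 256, 3))
x = layers.Conv2D(32, 3, padding="same")(inputs)
x = layers.BatchNormalization()(x)
x = layers.Activation("relu")(x)
# Keep the final sigmoid output in float32, as the official guide recommends.
outputs = layers.Conv2D(1, 1, activation="sigmoid", dtype="float32")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")
```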
I assumed it would be faster than with the float32 policy, but it turns out it isn't; it's dramatically slower. At the very least I expected it not to be slower, with the main advantage being that I could use a larger batch size.
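One way to sanity-check the hardware path (a rough micro-benchmark sketch, not from the issue; the shapes, step count, and the `bench` helper are illustrative assumptions) is to time a single convolution under float32 and float16. If float16 is not clearly faster here, the convolutions are likely not hitting Tensor Cores at all, e.g. because of the cuDNN build in use:

```python
import time
import tensorflow as tf

def bench(dtype, steps=100):
    """Average forward time per step for a single conv, in seconds."""
    # Batch and channel sizes are multiples of 8 so float16 convs are
    # eligible for Tensor Cores on an RTX 2080 Ti (illustrative sizes).
    x = tf.random.normal((8, 256, 256, 64), dtype=dtype)
    conv = tf.keras.layers.Conv2D(64, 3, padding="same", dtype=dtype)
    conv(x)  # warm-up: builds the layer and triggers cuDNN autotuning
    start = time.time()
    for _ in range(steps):
        y = conv(x)
    _ = y.numpy()  # block until the GPU has finished all queued work
    return (time.time() - start) / steps

print("float32 step time:", bench(tf.float32))
print("float16 step time:", bench(tf.float16))
```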
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 2
- Comments: 16 (3 by maintainers)
Note this may have been fixed by https://github.com/tensorflow/tensorflow/commit/67d15573a776119d5a544ed266dc2514ae13c3b5.