tensorflow: Mixed Precision training is ~10 times slower
Mixed precision training becomes dramatically slower when I use the mixed_float16 policy.
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
- OS Platform and Distribution: Ubuntu 18.04
- TensorFlow installed from (source or binary): binary (pip3)
- TensorFlow version: 2.2.0
- Python version: 3.7
- CUDA/cuDNN version: CUDA 10.1 / cuDNN 7.5
- GPU model and memory: RTX 2080 Ti, 11 GB
I’ve designed a segmentation model that consists of regular layers such as BatchNorm, Conv2D, Activation, etc. I haven’t designed or used any custom layers.
I’m using binary cross-entropy as the loss function, and I set a float32 dtype on my network’s output (as the official docs recommend), which is a sigmoid. The data is loaded as float32 as well, but when I train with mixed precision, training becomes ~6-10x slower. I use the same network with the same batch size, without changing anything else. I’ve timed the individual stages, and here is what it looks like (a minimal sketch of the setup follows the list):
- The forward pass is ~10x slower
- Loss computation is ~6x slower
- Gradient computation is ~6x slower
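For context, here is a minimal sketch of the kind of setup described above. The layer sizes and shapes are illustrative assumptions, not the reporter's actual model; note that in TF 2.2 the mixed-precision API still lived under `tf.keras.mixed_precision.experimental`:

```python
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.mixed_precision import experimental as mixed_precision

# Enable the mixed_float16 policy (TF 2.2 spelling; newer releases use
# tf.keras.mixed_precision.set_global_policy("mixed_float16")).
policy = mixed_precision.Policy("mixed_float16")
mixed_precision.set_policy(policy)

# Illustrative segmentation-style block; shapes are assumptions.
inputs = layers.Input(shape=(256, 256, 3))
x = layers.Conv2D(32, 3, padding="same")(inputs)
x = layers.BatchNormalization()(x)
x = layers.Activation("relu")(x)
# Keep the final sigmoid output in float32, as the official guide recommends.
outputs = layers.Conv2D(1, 1, activation="sigmoid", dtype="float32")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")
```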
I assumed it would be faster than with the float32 policy, but it turns out it isn't; it's dramatically slower. At the very least I expected it not to be slower, with the main advantage being that I could use a larger batch size.
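One way to sanity-check the hardware path (a rough micro-benchmark sketch, not from the issue; the shapes, step count, and the `bench` helper are illustrative assumptions) is to time a single convolution under float32 and float16. If float16 is not clearly faster here, the convolutions are likely not hitting Tensor Cores at all, e.g. because of the cuDNN build in use:

```python
import time
import tensorflow as tf

def bench(dtype, steps=100):
    """Average forward time per step for a single conv, in seconds."""
    # Batch and channel sizes are multiples of 8 so float16 convs are
    # eligible for Tensor Cores on an RTX 2080 Ti (illustrative sizes).
    x = tf.random.normal((8, 256, 256, 64), dtype=dtype)
    conv = tf.keras.layers.Conv2D(64, 3, padding="same", dtype=dtype)
    conv(x)  # warm-up: builds the layer and triggers cuDNN autotuning
    start = time.time()
    for _ in range(steps):
        y = conv(x)
    _ = y.numpy()  # block until the GPU has finished all queued work
    return (time.time() - start) / steps

print("float32 step time:", bench(tf.float32))
print("float16 step time:", bench(tf.float16))
```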
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 2
- Comments: 16 (3 by maintainers)
Note this may have been fixed by https://github.com/tensorflow/tensorflow/commit/67d15573a776119d5a544ed266dc2514ae13c3b5.