tensorflow: mixed_precision makes training and prediction very slow when only using CPU
tensorflow: 2.3
Here is the colab.
You can see that it costs 3s to train an epoch, while it costs 187s to train the same epoch with mixed_precision enabled:
Epoch 1/5
1875/1875 [==============================] - 3s 1ms/step - loss: 0.2968 - accuracy: 0.9134
Epoch 2/5
1875/1875 [==============================] - 3s 1ms/step - loss: 0.1448 - accuracy: 0.9575
Epoch 3/5
1875/1875 [==============================] - 2s 1ms/step - loss: 0.1073 - accuracy: 0.9678
Epoch 4/5
1875/1875 [==============================] - 3s 1ms/step - loss: 0.0861 - accuracy: 0.9730
Epoch 5/5
1875/1875 [==============================] - 3s 1ms/step - loss: 0.0734 - accuracy: 0.9769
vs
Epoch 1/5
1875/1875 [==============================] - 187s 100ms/step - loss: 0.2936 - accuracy: 0.9141
Epoch 2/5
1179/1875 [=================>............] - ETA: 1:11 - loss: 0.1455 - accuracy: 0.9555
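For context, here is a minimal sketch of the kind of script the colab presumably runs (the model, optimizer, and dataset are assumptions based on the standard Keras MNIST example; only the policy lines are specific to this issue). In TF 2.3 the mixed precision API lives under the experimental namespace:

```python
import tensorflow as tf

# Enable mixed precision globally. In TF 2.3 this is the experimental API;
# in TF 2.4+ it is tf.keras.mixed_precision.set_global_policy('mixed_float16').
policy = tf.keras.mixed_precision.experimental.Policy('mixed_float16')
tf.keras.mixed_precision.experimental.set_policy(policy)

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    # Keep the final softmax in float32 for numeric stability, as the
    # mixed precision guide recommends.
    tf.keras.layers.Dense(10, activation='softmax', dtype='float32'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
```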
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 26 (13 by maintainers)
As @byronyi has stated, CPUs do not have hardware support for float16 and so will be slower with mixed_float16. I’ll clarify in the tutorial that mixed precision can actually significantly slow down a model on CPUs, instead of just not speeding it up.
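One practical consequence is to guard the policy behind a device check so that CPU-only runs stay in float32. A minimal sketch (the helper name is mine, not from the thread):

```python
import tensorflow as tf

def maybe_enable_mixed_precision():
    """Enable mixed_float16 only when a GPU is available.

    CPUs lack hardware float16 support, so mixed precision would slow
    them down rather than speed them up.
    """
    if tf.config.list_physical_devices('GPU'):
        policy = tf.keras.mixed_precision.experimental.Policy('mixed_float16')
        tf.keras.mixed_precision.experimental.set_policy(policy)

maybe_enable_mixed_precision()
```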
As for the TF Serving issue: a float32 and a mixed_float16 tf.train.Checkpoint are identical for the same model, as checkpoints do not store the dtype of computations. On the other hand, a float32 and a mixed_float16 SavedModel are different, as SavedModels store the graph of computations, which includes the dtype of those computations. Using a mixed_float16 SavedModel with TF-Serving on a device that does not support mixed precision will be slow. As a workaround, checkpoints can be used instead: when a SavedModel is required, the checkpoint can be loaded into a float32 model and a float32 SavedModel can be generated from it. I'll talk to the people working on SavedModel about a better solution.
But at least it shouldn't make the performance worse … We often train a model with a GPU and deploy it on a CPU. In this situation, the bug has a big impact …
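A sketch of that workaround (paths are illustrative, and build_model() is a hypothetical constructor that must produce the same architecture used for training):

```python
import tensorflow as tf

# --- Training side (GPU): train under mixed precision, but save a
# checkpoint rather than a SavedModel; checkpoints carry no compute dtype.
tf.keras.mixed_precision.experimental.set_policy(
    tf.keras.mixed_precision.experimental.Policy('mixed_float16'))
train_model = build_model()          # hypothetical model constructor
# ... train_model.fit(...) ...
train_model.save_weights('ckpt/model')

# --- Export side (for CPU serving): rebuild the model under the default
# float32 policy, load the same weights, and export a float32 graph.
tf.keras.mixed_precision.experimental.set_policy(
    tf.keras.mixed_precision.experimental.Policy('float32'))
serve_model = build_model()
serve_model.load_weights('ckpt/model')
serve_model.save('saved_model/float32')  # float32 SavedModel for TF-Serving
```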
@reedwm Hi. Is there any progress on this issue? It would be very helpful when using mixed precision only at training time and serving the model on CPU devices. We need some convenient "save as dtype=float32" method.
I've clearly stated that mixed precision training has nothing to do with serving your model in full precision. Even with mixed precision training, your model's weights are saved in full precision.
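That point can be checked directly: under the mixed_float16 policy, layer computations run in float16 but the variables themselves remain float32. A quick sketch (the layer choice is arbitrary):

```python
import tensorflow as tf

tf.keras.mixed_precision.experimental.set_policy(
    tf.keras.mixed_precision.experimental.Policy('mixed_float16'))

layer = tf.keras.layers.Dense(4)
layer.build(input_shape=(None, 8))

print(layer.compute_dtype)  # float16 -> the math runs in half precision
print(layer.kernel.dtype)   # float32 -> the weights are stored in full precision
```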