tensorflow: mixed_precision makes train and predict very slow when only using CPU

tensorflow: 2.3. Here is the colab. You can see that it takes 3s to train one epoch, while the same epoch takes 187s with mixed_precision:

Epoch 1/5
1875/1875 [==============================] - 3s 1ms/step - loss: 0.2968 - accuracy: 0.9134
Epoch 2/5
1875/1875 [==============================] - 3s 1ms/step - loss: 0.1448 - accuracy: 0.9575
Epoch 3/5
1875/1875 [==============================] - 2s 1ms/step - loss: 0.1073 - accuracy: 0.9678
Epoch 4/5
1875/1875 [==============================] - 3s 1ms/step - loss: 0.0861 - accuracy: 0.9730
Epoch 5/5
1875/1875 [==============================] - 3s 1ms/step - loss: 0.0734 - accuracy: 0.9769

vs

Epoch 1/5
1875/1875 [==============================] - 187s 100ms/step - loss: 0.2936 - accuracy: 0.9141
Epoch 2/5
1179/1875 [=================>............] - ETA: 1:11 - loss: 0.1455 - accuracy: 0.9555
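
For reference, here is a minimal sketch of the setup behind the two runs, assuming the standard Keras MNIST example from the mixed precision tutorial (the actual Colab is the authoritative version; the TF 2.3 API is shown, while TF 2.4+ renames it to tf.keras.mixed_precision.set_global_policy):

import tensorflow as tf

# Uncomment the next line to reproduce the slow mixed-precision run on CPU:
# tf.keras.mixed_precision.experimental.set_policy('mixed_float16')

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10),
])
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'],
)
model.fit(x_train, y_train, epochs=5)  # 1875 steps/epoch at the default batch size of 32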

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 26 (13 by maintainers)

Most upvoted comments

As @byronyi has stated, CPUs do not have hardware support for float16 and so will be slower with mixed_float16. I’ll clarify in the tutorial that mixed precision can actually significantly slow down a model on CPUs, instead of just not speeding it up.

As for the TF Serving issue: a float32 and a mixed_float16 tf.train.Checkpoint are identical for the same model, because checkpoints do not store the dtype of computations. A float32 and a mixed_float16 SavedModel, on the other hand, are different, because SavedModels store the computation graph, which includes the dtype of each computation. Serving a mixed_float16 SavedModel with TF Serving on a device that does not support mixed precision will be slow. As a workaround, use checkpoints instead: when a SavedModel is required, load the checkpoint into a float32 model and export a float32 SavedModel from that (a sketch of this flow is below). I’ll talk to the people working on SavedModel about a better solution.
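
A sketch of that workaround, assuming a hypothetical build_model() helper that constructs your architecture (TF 2.3 API shown):

import tensorflow as tf

def build_model():
    # Hypothetical helper: the same architecture must be rebuilt for the
    # float32 export.
    return tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10),
    ])

# Train under mixed_float16. Checkpoints store only the variables, which
# are kept in float32 even under mixed precision.
tf.keras.mixed_precision.experimental.set_policy('mixed_float16')
model = build_model()
# ... compile and fit as usual ...
model.save_weights('/tmp/ckpt')  # hypothetical path

# Rebuild the same architecture under float32 so the exported graph
# computes in float32, restore the weights, and export a SavedModel that
# is safe to serve on CPU.
tf.keras.mixed_precision.experimental.set_policy('float32')
serving_model = build_model()
serving_model.load_weights('/tmp/ckpt')
serving_model.save('/tmp/saved_model_fp32')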

I don’t think it is a bug. This feature is not meant to accelerate your model on regular CPUs.

But at least it shouldn’t make the performance worse… We often train a model on GPU and deploy it on CPU. In this situation, the bug has a big impact…

@reedwm Hi. Is there any progress on this issue? It would be very helpful to use mixed precision for training only and serve the model on CPU devices. We need a convenient “save as dtype=float32” method.

I’ve clearly stated that mixed precision training does not prevent serving your model in full precision. Even with mixed precision training, your model’s weights are saved in full precision.
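
For reference, a quick sketch showing that split: under the mixed_float16 policy a layer’s variables stay float32 while its outputs are float16 (TF 2.3 API shown):

import tensorflow as tf

tf.keras.mixed_precision.experimental.set_policy('mixed_float16')

layer = tf.keras.layers.Dense(4)
y = layer(tf.ones((1, 4)))   # calling the layer builds it and creates its variables

print(layer.kernel.dtype)    # float32: weights are stored in full precision
print(y.dtype)               # float16: the computation runs in half precision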