tensorflow: Unsupported Full-Integer TensorFlow Lite models in TF 2
Describe the issue
In TF 2, the full-integer quantized models produced by the TFLite converter can only have float input and output types. This is a blocker for users who require int8 or uint8 input and/or output types.
UPDATE: We now support this workflow.
End-to-End Tutorial: https://colab.sandbox.google.com/github/google-coral/tutorials/blob/master/retrain_classification_ptq_tf2.ipynb
Only TFLite Conversion: Convert TF models to TFLite full-integer models. You can refer to the code here, also given below:
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
def representative_dataset_gen():
    for _ in range(num_calibration_steps):
        # Get sample input data as a numpy array in a method of your choosing.
        yield [input]
converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8 # or tf.uint8
converter.inference_output_type = tf.int8 # or tf.uint8
tflite_model = converter.convert()
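For reference, here is a minimal sketch of what a concrete representative dataset might look like, plugging into the converter defined above. The calibration_images.npy file and the sample count are hypothetical placeholders; the only requirement is that the generator yields data shaped and preprocessed like the model's real inputs.
import numpy as np

# Hypothetical calibration data: a float32 array of already-preprocessed samples
# shaped like the model input, e.g. (500, 224, 224, 3).
calibration_images = np.load("calibration_images.npy").astype(np.float32)
num_calibration_steps = 100

def representative_dataset_gen():
    for i in range(num_calibration_steps):
        # Each yielded element is a list with one array per model input,
        # batch dimension included.
        yield [calibration_images[i:i + 1]]

converter.representative_dataset = representative_dataset_gen
A few hundred samples that cover the typical range of real inputs are usually enough for calibration.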
Only TFLite Inference: Run inference on the TFLite model. Note the one caveat with integer-only models: you need to manually map (i.e., quantize) the float inputs to integer inputs during inference, and dequantize the integer outputs back to float. To understand how this is done, refer to the equation provided in the TensorFlow Lite 8-bit quantization specification document and its equivalent Python code below:
import numpy as np
import tensorflow as tf
# Input to the TF model: float values in the range [0, 10] with shape (1, 100)
np.random.seed(0)
tf_input = np.random.uniform(low=0, high=10, size=(1, 100)).astype(np.float32)
# Output of the TF model.
tf_output = keras_model.predict(tf_input)
# Output of the TFLite model.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
# Manually quantize the input from float to integer
scale, zero_point = input_details['quantization']
tflite_integer_input = tf_input / scale + zero_point
tflite_integer_input = tflite_integer_input.astype(input_details['dtype'])
interpreter.set_tensor(input_details['index'], tflite_integer_input)
interpreter.invoke()
output_details = interpreter.get_output_details()[0]
tflite_integer_output = interpreter.get_tensor(output_details['index'])
# Manually dequantize the output from integer to float
scale, zero_point = output_details['quantization']
tflite_output = tflite_integer_output.astype(np.float32)
tflite_output = (tflite_output - zero_point) * scale
# Verify that the TFLite model's output is approximately the same as the
# TF model's output (expect some loss in accuracy due to quantization).
assert np.allclose(tflite_output, tf_output, atol=1e-04)
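If this quantize/invoke/dequantize pattern is needed repeatedly, it can be wrapped in small helpers. The sketch below is only an illustration that reuses tflite_model and tf_input from above; the helper names are not part of the TFLite API, and the input helper rounds to the nearest integer as the quantization spec describes.
import numpy as np
import tensorflow as tf

def quantize_input(interpreter, float_input):
    # Map a float array to the integer type expected by the model's input.
    details = interpreter.get_input_details()[0]
    scale, zero_point = details['quantization']
    return np.round(float_input / scale + zero_point).astype(details['dtype'])

def dequantize_output(interpreter, integer_output):
    # Map the model's integer output back to float.
    details = interpreter.get_output_details()[0]
    scale, zero_point = details['quantization']
    return (integer_output.astype(np.float32) - zero_point) * scale

interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
input_index = interpreter.get_input_details()[0]['index']
output_index = interpreter.get_output_details()[0]['index']
interpreter.set_tensor(input_index, quantize_input(interpreter, tf_input))
interpreter.invoke()
tflite_output = dequantize_output(interpreter, interpreter.get_tensor(output_index))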
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 21
- Comments: 66 (20 by maintainers)
Update: We now support TensorFlow Lite full-integer models in TF 2.0, i.e., with integer (tf.int8 and tf.uint8) input and output types. Exception: support for quantization-aware trained models is still in progress.
End-to-End Tutorial: https://colab.sandbox.google.com/github/google-coral/tutorials/blob/master/retrain_classification_ptq_tf2.ipynb
Yes, it is currently a work in progress. We will update this GitHub issue once it's completed.
waiting for this issue to be resolved, thanks.
Hello, the issue is resolved now with TF 2.4 (nightly). Let us know if you face any issues. (This issue will remain open until we also fix all the documentation and resolve any issues that may arise in the next few days.)
Note: The following discussion is not related to the current issue of supporting full-integer TensorFlow Lite models, including integer input and output, in TF 2.0.
@dreamPoet No, this is not possible in TensorFlow 2. We cannot create a uint8 inference TFLite model; only int8 inference models are supported. We've moved away from uint8 quantization because with int8 we're able to use asymmetric quantization ranges without paying the same penalty. Refer to https://www.tensorflow.org/lite/performance/quantization_spec for more information.
For more details, see the YouTube video: https://www.youtube.com/watch?v=-jBmqY_aFwE and the slides used in it: https://docs.google.com/presentation/d/1zGm5bqGrkAepwJZ5PABiYjrIKq1pDnzafa8ZYeaFhXY/edit?usp=sharing
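As a concrete illustration of the affine mapping defined in the quantization spec linked above (real_value = (int8_value - zero_point) * scale), here is a small self-contained sketch; the scale and zero-point values are made up for illustration.
import numpy as np

# Affine quantization per the TFLite 8-bit spec:
#   real_value = (quantized_value - zero_point) * scale
scale, zero_point = 0.05, -10  # illustrative parameters, not from a real model
real = np.array([-1.0, 0.0, 0.5, 2.0], dtype=np.float32)

# Quantize: round to nearest and clamp to the int8 range.
quantized = np.clip(np.round(real / scale) + zero_point, -128, 127).astype(np.int8)

# Dequantize back to float; any difference is the quantization error.
recovered = (quantized.astype(np.float32) - zero_point) * scale
print(quantized)   # [-30 -10   0  30]
print(recovered)   # [-1.   0.   0.5  2. ]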
Thank you, that worked. Here's my Colab gist for anyone else who might run into a similar problem: https://colab.research.google.com/gist/sayakpaul/d209ac353d3bcea06287be5e91628f18/scratchpad.ipynb
Remove the line
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
as for QAT you are assuming that the model can be fully quantized.

@sayakpaul Yes, this is perfectly right.
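For context, here is a minimal sketch of the QAT conversion path being discussed, assuming a model wrapped with the TensorFlow Model Optimization Toolkit (tfmot); keras_model, train_images, and train_labels are placeholders, and this is an illustration of the suggestion above rather than an official recipe.
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Wrap an existing float Keras model with fake-quantization ops and fine-tune it.
qat_model = tfmot.quantization.keras.quantize_model(keras_model)
qat_model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
qat_model.fit(train_images, train_labels, epochs=1)

# Convert the QAT model: no representative dataset and no
# target_spec.supported_ops line, following the suggestion above.
converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_qat_model = converter.convert()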
By the way, can we use the converter to create a uint8 inference TFLite model rather than an int8 inference model?