tensorflow: QAT conversion RuntimeError: Quantization not yet supported for op: 'DEQUANTIZE' issue with tf-nightly

UPDATE

You can now fully quantize QAT models trained in any TF 2.x version. However, this feature is only available from TF 2.4.0-rc0 onwards (and is also available in the final TF 2.4 release).

You will not require any workaround, i.e., you do not have to use TF 1.x.

To verify that your TF version supports this, run the following code and check that it runs successfully:

import tensorflow as tf

# Check the major.minor version so that TF 2.4 and anything newer passes.
major, minor = (int(v) for v in tf.__version__.split(".")[:2])
assert (major, minor) >= (2, 4), (
    "Your TF version ({}) does not support full quantization of QAT models. "
    "Upgrade to a TF 2.4 version (2.4.0-rc0, 2.4.0-rc1, ... 2.4.0) or above.".format(tf.__version__))

ISSUE

System information: TensorFlow version: 2.4.0-dev20200728

Describe the current behavior: Converting a quantization-aware-trained TensorFlow model to a fully integer-quantized TFLite model fails with: RuntimeError: Quantization not yet supported for op: 'DEQUANTIZE'

Describe the expected behavior: The model converts successfully.

Standalone code to reproduce the issue: https://colab.research.google.com/gist/sayakpaul/8c8a1d7c94beca26d93b67d92a90d3f0/qat-bad-accuracy.ipynb

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 57 (13 by maintainers)

Most upvoted comments

For QAT models, you don't need a representative dataset. Also, full integer quantization of QAT models (with float32 (the default), uint8, or int8 input/output) is available from TF 2.4, as shown below:

Gist

!pip uninstall -q -y tensorflow tensorflow-gpu
!pip install tensorflow==2.4
!pip install -q tensorflow-model-optimization

import tensorflow as tf
print(tf.__version__)

import numpy as np
import tensorflow as tf
import tensorflow_model_optimization as tfmot

def get_model(is_qat=False):
  (train_x, train_y) , (_, _) = tf.keras.datasets.mnist.load_data()
  train_x = train_x.astype('float32') / 255
  model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128,activation='relu'),
    tf.keras.layers.Dense(10)
  ])
  if is_qat:
    model = tfmot.quantization.keras.quantize_model(model)
  model.compile(
      optimizer=tf.keras.optimizers.Adam(0.001),
      loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
      metrics=[tf.keras.metrics.SparseCategoricalAccuracy()]
  )
  model.fit(train_x, train_y, batch_size=64, epochs=2, verbose=1)
  return model

## 1. Normal TF Model
model = get_model()

# 1a. Convert normal TF model to INT8 quantized TFLite model (default float32 input/output)
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
def representative_dataset_gen():
    for i in range(10):
        yield [np.random.uniform(low=0.0, high=1.0, size=(1, 28, 28)).astype(np.float32)]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.representative_dataset = representative_dataset_gen
normal_tf_model_quantized_tflite_model = converter.convert()

# 1b. Convert normal TF model to INT8 quantized TFLite model (uint8 input/output)
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
def representative_dataset_gen():
    for i in range(10):
        yield [np.random.uniform(low=0.0, high=1.0, size=(1, 28, 28)).astype(np.float32)]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.representative_dataset = representative_dataset_gen
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
normal_tf_model_quantized_with_uint8_io_tflite_model = converter.convert()

## 2. QAT (Quantize Aware Trained) TF model
qat_model = get_model(is_qat=True)

# 2a. Convert QAT TF model to INT8 quantized TFLite model (default float32 input/output)
converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
qat_tf_model_quantized_tflite_model = converter.convert()

# 2b. Convert QAT TF model to INT8 quantized TFLite model (uint8 input/output)
converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
qat_tf_model_quantized_with_uint8_io_tflite_model = converter.convert()
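
As a quick sanity check (not part of the original gist), you can inspect a converted model with the TFLite interpreter to confirm it got the requested integer input/output types; the variable name below comes from the conversion code above.

interpreter = tf.lite.Interpreter(
    model_content=qat_tf_model_quantized_with_uint8_io_tflite_model)
interpreter.allocate_tensors()
print('input dtype :', interpreter.get_input_details()[0]['dtype'])   # expect numpy.uint8
print('output dtype:', interpreter.get_output_details()[0]['dtype'])  # expect numpy.uint8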

Is there any update on this issue? I am facing the same error while trying to convert a QAT model to INT8.

As far as I understand, TF 2.0 quantization is not yet supported for full integer inference. Try QAT with TF 1.x and everything works smoothly.

@dtlam26 can you point to some resources for QAT with TF 1.x and then quantization? I am trying the following code (without QAT) but getting an error on TF 1.15:

import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

num_classes = 20

model = Sequential([
    layers.Conv2D(16, 3, padding='same', activation='relu', input_shape=(256, 256, 3)),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes)
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.fit(train_generator, epochs=1, steps_per_epoch=100)

model.save('/tmp/temp.h5')

converter = tf.lite.TFLiteConverter.from_keras_model_file('/tmp/temp.h5')
converter.inference_type = tf.lite.constants.QUANTIZED_UINT8
input_arrays = converter.get_input_arrays()
converter.quantized_input_stats = {input_arrays[0]: (0., 1.)}
tflite_model = converter.convert()

I am getting the below error:

ConverterError: See console for info. 2020-09-23 22:55:27.815650: F tensorflow/lite/toco/tooling_util.cc:1734] Array conv2d_3/Relu, which is an input to the MaxPool operator producing the output array max_pooling2d_3/MaxPool, is lacking min/max data, which is necessary for quantization. If accuracy matters, either target a non-quantized output format, or run quantized training with your model from a floating point checkpoint to change the input graph to contain min/max information. If you don't care about accuracy, you can pass --default_ranges_min= and --default_ranges_max= for easy experimentation. Fatal Python error: Aborted

You can check out this Medium post and try creating a training graph and an eval graph for QAT in TF 1.x: https://medium.com/analytics-vidhya/mobile-inference-b943dc99e29b This GitHub repo is also a good example: https://github.com/lusinlu/tensorflow_lite_guide
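
For orientation, here is a minimal sketch of the TF 1.x training-graph/eval-graph flow those resources describe. It is not taken from either link: the tiny build_model function, the placeholder shapes, and the training/checkpoint steps left as comments are all illustrative assumptions, and it requires TF 1.x with tf.contrib available (e.g. TF 1.15).

import tensorflow as tf

def build_model(images):
    # Illustrative float model; replace with your own architecture.
    net = tf.layers.conv2d(images, 16, 3, padding='same', activation=tf.nn.relu)
    net = tf.layers.flatten(net)
    return tf.layers.dense(net, 10)

# 1. Training graph: insert fake-quant ops, then train and save a checkpoint.
train_graph = tf.Graph()
with train_graph.as_default():
    images = tf.placeholder(tf.float32, [None, 28, 28, 1])
    logits = build_model(images)
    tf.contrib.quantize.create_training_graph(input_graph=train_graph, quant_delay=0)
    # ... train here, then save a checkpoint with tf.train.Saver() ...

# 2. Eval graph: rebuild the same model, insert eval-mode fake-quant ops,
#    then restore the checkpoint, freeze the graph, and convert it with the
#    TF 1.x TFLite converter using inference_type=tf.lite.constants.QUANTIZED_UINT8.
eval_graph = tf.Graph()
with eval_graph.as_default():
    images = tf.placeholder(tf.float32, [None, 28, 28, 1])
    logits = build_model(images)
    tf.contrib.quantize.create_eval_graph(input_graph=eval_graph)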

Do you have any plans for solving this? I just encountered this issue… Here is my minimal reproducing code:

  • Train
import tensorflow as tf
from tensorflow import keras
import tensorflow_model_optimization as tfmot

# Load MNIST dataset
(train_images, train_labels), (test_images, test_labels) = keras.datasets.mnist.load_data()
train_images, test_images = train_images / 255.0, test_images / 255.0

# Define the model architecture.
model = keras.Sequential([
    keras.layers.InputLayer(input_shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(10)
])

# Train the digit classification model
model.compile(optimizer='adam', loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True), metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=1, validation_data=(test_images, test_labels))
# 1875/1875 [==============================] - 2s 946us/step - loss: 0.7303 - accuracy: 0.8100 - val_loss: 0.3097 - val_accuracy: 0.9117

# Train the quantization aware model
q_aware_model = tfmot.quantization.keras.quantize_model(model)
q_aware_model.compile(optimizer='adam', loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True), metrics=['accuracy'])
q_aware_model.fit(train_images, train_labels, epochs=1, validation_data=(test_images, test_labels))
# 1875/1875 [==============================] - 2s 1ms/step - loss: 0.3107 - accuracy: 0.9136 - val_loss: 0.2824 - val_accuracy: 0.9225
  • Convert
# Define the representative data.
def representative_data_gen():
    for input_value in tf.data.Dataset.from_tensor_slices(train_images.astype("float32")).batch(1).take(100):
        yield [input_value]

# Successful converting from model
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
tflite_model = converter.convert()

# Successful converting from model to uint8
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model_quant = converter.convert()

# Successful converting from q_aware_model
q_converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
q_converter.optimizations = [tf.lite.Optimize.DEFAULT]
q_converter.representative_dataset = representative_data_gen
q_tflite_model = q_converter.convert()

# Fail converting from q_aware_model to uint8
q_converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
q_converter.inference_input_type = tf.uint8
q_converter.inference_output_type = tf.uint8
q_tflite_model_quant = q_converter.convert()

Throws error

RuntimeError: Quantization not yet supported for op: 'DEQUANTIZE'.
  • Another test without tf.lite.OpsSet.TFLITE_BUILTINS_INT8
# Successful converting from model to uint8
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model_quant = converter.convert()

# Fail converting from q_aware_model to uint8
q_converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
q_converter.optimizations = [tf.lite.Optimize.DEFAULT]
q_converter.representative_dataset = representative_data_gen
q_converter.inference_input_type = tf.uint8
q_converter.inference_output_type = tf.uint8
q_tflite_model_quant = q_converter.convert()

Throws error

RuntimeError: Unsupported output type UINT8 for output tensor 'Identity' of type FLOAT32.

@sayakpaul

I can reproduce the issue and will get back to you when I resolve this. https://colab.research.google.com/gist/MeghnaNatraj/8458ad508f5355769a980400d4d9d194/qat-bad-accuracy.ipynb

Possible issue: if you remove TFLITE_BUILTINS_INT8 (i.e., don't enforce INT8), it works fine. The issue is that the model has two consecutive QUANTIZE ops at the beginning and two consecutive DEQUANTIZE ops at the end (not sure why), probably because of the way the tf.keras MobileNetV2 model is structured.

A couple of things to note (especially as you are involved in creating awesome tutorials! 👍). The Colab gist above has all the final code with the following suggested changes (note: it also has some TODOs where I have simplified the code for faster execution):

  1. Ensure you use the latest tensorflow_model_optimization and tensorflow-datasets, and uninstall tensorflow when you install tf-nightly.
  2. Code readability: A) Try to group similar code sections together. Sections can be: all imports and initial settings code, all data-processing code, all training code, all conversion code, etc. B) If your model is for a basic tutorial and it's small, use full paths to Keras APIs (tf.keras.layers.....) instead of from tf.keras.layers import *.
  3. For data generation, have 3 parts: 1) train_raw (loaded from tfds), where the data has 3 dimensions; 2) train_preprocessed (with all preprocessing steps applied), where the data still has 3 dimensions; 3) train_data (the final dataset prepared for training, with batching, shuffling and prefetching), where the data has 4 dimensions. Note: repeat all 3 for the validation data, BUT do not shuffle the validation (or test) dataset.
  4. The representative dataset should only have 4 dimensions for images. You initially used the batched training data with shape=(32, 244, 244, 3), and the representative dataset adds another batch dimension (tf.expand_dims(train_data_image, 0)), so the rank increases to 5: (1, 32, 244, 244, 3). This causes errors that are quite hard to debug (e.g. PAD op dimensions exceeded >=4). You instead want (1, 244, 244, 3), hence we use the train_preprocessed data (see point 3 above), where the images don't yet have a batch dimension and have shape (244, 244, 3), in the representative_dataset function.
  5. Representative dataset: do not use next(iter(train_ds..)). This turns the image and label into a sequential list of items and causes failures. Instead use for image, _ in train_ds_preprocessed: (see the sketch after this list).
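
Here is a minimal sketch of points 4 and 5 combined. It is not from the gist: train_ds_preprocessed and converter are assumed to exist, with train_ds_preprocessed being a tf.data.Dataset of (image, label) pairs whose images still have shape (244, 244, 3), i.e. no batch dimension yet.

import tensorflow as tf

def representative_dataset():
    # Take a small sample; each yielded image gets exactly one batch dimension,
    # going from (244, 244, 3) to (1, 244, 244, 3).
    for image, _ in train_ds_preprocessed.take(100):
        yield [tf.expand_dims(image, 0)]

converter.representative_dataset = representative_dataset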

I think in both TF 1.x and TF 2.x the basic approach was to train a non-quantized model until convergence and then fine-tune the trained float model with quantization-aware training on the training data (or a subset) for a few more epochs (possibly with a smaller learning rate). Anyway, I got almost exactly the same accuracy with UpSampling2D (ResizeBilinear) in TF 2 with QAT compared to the plain float model (segmentation). Maybe we should train and compare a fixed model with TF 1.x and TF 2.x and see if there is a difference in accuracy.

You are almost right about QAT: its protocol is to train a non-quantized model until convergence and then fine-tune the trained float model with quantization-aware training. This process lets the quantization bins fit the model's weights and activation values better. But TensorFlow applies the representative dataset for only one purpose: fitting the activation quantization better. If you use it with post-training quantization, this can be considered a form of dynamic quantization. I believe TF 2.x is surely better, as the training protocol is clearly defined and, of course, it lets the representative dataset adjust for the bias rather than using only a MovingAverage.
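
To make the distinction above concrete, here is a minimal sketch contrasting the two post-training modes; model and representative_dataset are assumed to exist (e.g. from the examples earlier in the thread).

import tensorflow as tf

# Dynamic range quantization: no representative dataset; weights are quantized
# ahead of time, activations are quantized dynamically at runtime.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
dynamic_range_tflite = converter.convert()

# Full integer quantization: a representative dataset calibrates the
# activation ranges ahead of time.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
full_integer_tflite = converter.convert()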

@anilsathyan7 @sayakpaul yes! Thanks for pointing that out. I’ve updated the example to also include model training 😃

@anilsathyan7 yes, you would actually want to train the model so that it can adjust to compensate for the information loss (induced by the precision loss).

@msokoloff1 @MeghnaNatraj I faced a similar issue in TF 2.4.0-rc0. I even tried the latest source for TF and TFMOT, but the issue persists. QAT with tf.keras produces quantize and dequantize layers, and we are unable to convert them to full integer quantization models, even after using post-training quantization on top of it.

Are there any other workarounds?

@dtlam26 Thanks for the resources. @Mattrix00 I also found this notebook that works for me: https://colab.research.google.com/drive/15itdlIyLmXISK6SDAzAFGUgjatfVr0Yq

@dtlam26

Yes, I know this works for post-training quantization, but my model is QAT, and it can't run INT8 inference on TF 2.x. For TF 1.x it is OK.

Can you please tell me how you are able to perform QAT in TF 1.x?

I have attached the source as an example. However, create_eval_graph will drop the last layer of your model from the graph. You have to add a dummy op to the graph, for example tf.maximum(output, 1e-27) for regression problems.
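
A minimal sketch of that workaround, assuming output is the final tensor of a TF 1.x model and the line is added just before tf.contrib.quantize.create_eval_graph() is called; the op name is only illustrative.

import tensorflow as tf

# Dummy op that keeps the final layer in the eval graph (illustrative name).
output = tf.maximum(output, 1e-27, name='quant_friendly_output')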

Yes, take() should work as well. Having a note in the documentation on handling large datasets while creating the representative dataset would help. The representative dataset generation can get non-trivial at times and here’s an example (which I am sure you are already aware of).