tensorflow: QAT conversion RuntimeError: Quantization not yet supported for op: 'DEQUANTIZE' issue with tf-nightly
UPDATE
You can now fully quantize QAT models trained in any TF 2.x version; however, the conversion itself is only available from TF 2.4.0-rc0 onwards (and will be available in the final TF 2.4 release as well).
You will not require any workaround, i.e., you don't have to use TF 1.x.
To verify that your TF version supports this, run the following code and check that it runs successfully:
import tensorflow as tf
# Compare (major, minor) so that TF 2.4 and any later release passes the check.
assert tuple(int(v) for v in tf.__version__.split(".")[:2]) >= (2, 4), 'Your TF version ({}) does not support full quantization of QAT models. Upgrade to a TF 2.4 version (2.4.0-rc0, 2.4.0-rc1, ..., 2.4.0) or above.'.format(tf.__version__)
ISSUE
System information
TensorFlow version: 2.4.0-dev20200728
Describe the current behavior
Error when converting a quantization-aware trained TensorFlow model to a fully integer-quantized TFLite model: RuntimeError: Quantization not yet supported for op: 'DEQUANTIZE'
Describe the expected behavior
The model converts successfully.
Standalone code to reproduce the issue
https://colab.research.google.com/gist/sayakpaul/8c8a1d7c94beca26d93b67d92a90d3f0/qat-bad-accuracy.ipynb
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 57 (13 by maintainers)
For QAT models, you don't need a representative dataset. Also, full integer quantization of QAT models (with float32 (default), uint8, or int8 input/output) is available from TF 2.4, as shown below:
Gist
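The linked gist is not reproduced in this thread, but a minimal sketch of that conversion path looks like the following, assuming qat_model is a Keras model already wrapped with tfmot.quantization.keras.quantize_model and fine-tuned:

import tensorflow as tf

# Full-integer conversion of a QAT model (TF 2.4+). No representative_dataset is
# required because the fake-quant ops already carry the quantization ranges.
converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Optional: switch the model's input/output from the default float32 to int8 (or uint8).
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()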
You can check out this Medium article and try to create a training graph and eval graph for QAT in TF 1.x: https://medium.com/analytics-vidhya/mobile-inference-b943dc99e29b This GitHub repository is also a good example: https://github.com/lusinlu/tensorflow_lite_guide
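For reference, a rough sketch of that TF 1.x flow (assuming a TF 1.x runtime such as 1.15 where tf.contrib is available; build_model is a placeholder for your own model-building function):

import tensorflow as tf

# Training graph: rewrite with fake-quant ops, then train as usual.
train_graph = tf.Graph()
with train_graph.as_default():
    logits = build_model(is_training=True)
    tf.contrib.quantize.create_training_graph(input_graph=train_graph, quant_delay=0)
    # ... define loss/optimizer, train, and save a checkpoint ...

# Eval graph: rebuild the model, rewrite it for evaluation, restore the trained
# checkpoint, then freeze and convert this graph with the TFLite converter.
eval_graph = tf.Graph()
with eval_graph.as_default():
    logits = build_model(is_training=False)
    tf.contrib.quantize.create_eval_graph(input_graph=eval_graph)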
Is there any update on this issue? I am facing the same error while trying to convert a QAT model to INT8.
Do you have any plan to solve this? I just encountered this issue… Here is my minimal reproducing code
Converting with tf.lite.OpsSet.TFLITE_BUILTINS_INT8 throws the error.
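The original snippet was not preserved in this thread, but a conversion along these lines (the model and layer choices here are purely illustrative) reproduces the error on the affected versions:

import tensorflow as tf
import tensorflow_model_optimization as tfmot

base_model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation="relu", input_shape=(32, 32, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])
qat_model = tfmot.quantization.keras.quantize_model(base_model)
# (Normally you would compile and fine-tune qat_model here.)

converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Enforcing integer-only ops is what triggers "Quantization not yet supported for op: 'DEQUANTIZE'".
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
tflite_model = converter.convert()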
@sayakpaul
I can reproduce the issue and will get back to you when I resolve this. https://colab.research.google.com/gist/MeghnaNatraj/8458ad508f5355769a980400d4d9d194/qat-bad-accuracy.ipynb
Possible issue: if you remove TFLITE_BUILTINS_INT8 (i.e., don't enforce INT8), it works fine. The issue is that the model has 2 consecutive quantize ops at the beginning and 2 consecutive dequantize ops at the end (not sure why), probably because of the way the tf.keras MobileNetV2 model is structured.
A couple of things to note (especially as you are involved in creating awesome tutorials! 👍). The colab gist above has all the final code with the following suggested changes (note: it also has some TODOs where I have simplified the code for faster execution):
- Install tensorflow_model_optimization and tensorflow-datasets, and uninstall tensorflow when you install tf-nightly.
- Use tf.keras.layers.... instead of from tf.keras.layers import *.
- Don't add an extra batch dimension with tf.expand_dims(train_data_image, 0) to data that is already batched; the rank then increases to 5, e.g. (1, 32, 244, 244, 3), which causes errors that are quite hard to debug (e.g. PAD op dimensions exceeded >= 4). You instead want (1, 244, 244, 3); hence we use the train_preprocessed data (check the 3rd point above), where the images don't yet have a batch dimension (shape (244, 244, 3)), for the representative_dataset function.
- Don't use next(iter(train_ds..)). This turns the image and label into a sequential list of items and causes failures. Instead use for image, _ in train_ds_preprocessed: (see the sketch after this list).
You are almost right about QAT: the protocol is to train a non-quantized model until convergence and then fine-tune the trained float model with quantization-aware training. This lets the quantization fit the model's weights and values better. TensorFlow, however, uses the representative dataset for only one purpose: fitting the activation quantization better. If you use it with post-training quantization, this can be considered dynamic quantization. I believe TF 2.x is surely better, as the training protocol is clearly defined and, of course, it lets the representative dataset adjust for the bias rather than using only MovingAverage.
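A short sketch of the representative_dataset those last two points describe, assuming train_ds_preprocessed yields unbatched (image, label) pairs with image shape (244, 244, 3):

import tensorflow as tf

def representative_dataset():
    # Iterate (image, label) pairs directly instead of using next(iter(...)).
    for image, _ in train_ds_preprocessed:
        # Add the batch dimension exactly once: (244, 244, 3) -> (1, 244, 244, 3).
        yield [tf.expand_dims(image, 0)]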
@anilsathyan7 @sayakpaul yes! Thanks for pointing that out. I’ve updated the example to also include model training 😃
@anilsathyan7 yes, you would want to actually train the model so that it can adjust to compensate for the information loss induced by the precision loss.
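As a sketch (names such as base_model, train_ds, and val_ds are assumptions), that fine-tuning step looks like:

import tensorflow as tf
import tensorflow_model_optimization as tfmot

qat_model = tfmot.quantization.keras.quantize_model(base_model)
qat_model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
# A few epochs let the weights adapt to the simulated quantization noise.
qat_model.fit(train_ds, validation_data=val_ds, epochs=3)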
@msokoloff1 @MeghnaNatraj I faced a similar issue in TF 2.4.0-rc0. I even tried the latest source for TF and TFMOT, but the issue persists. QAT with tf.keras produces quantize and dequantize layers, and we are unable to convert them to full integer quantization models, even after applying post-training quantization on top of it.
Are there any other workarounds?
@dtlam26 Thanks for the resources. @Mattrix00 I also found this notebook, which works for me: https://colab.research.google.com/drive/15itdlIyLmXISK6SDAzAFGUgjatfVr0Yq
I have attached the source as an example. However, create_eval_graph will drop the last layer of your model from the graph. You have to add a dummy op to the graph, for example tf.maximum(output, 1e-27) for regression problems.
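Continuing the TF 1.x eval-graph sketch earlier in the thread, the dummy op would go just before the rewrite (output is assumed to be the model's final tensor inside eval_graph):

# Dummy op so the last layer isn't dropped when the eval graph is rewritten.
output = tf.maximum(output, 1e-27, name="output")
tf.contrib.quantize.create_eval_graph(input_graph=eval_graph)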
Yes, take() should work as well. Having a note in the documentation on handling large datasets while creating the representative dataset would help. Representative dataset generation can get non-trivial at times, and here's an example (which I am sure you are already aware of).
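For instance, bounding the generator from the earlier sketch with take() (the count of 100 is arbitrary and train_ds_preprocessed is an assumed name):

def representative_dataset():
    # Only calibrate on the first 100 samples instead of the full dataset.
    for image, _ in train_ds_preprocessed.take(100):
        yield [tf.expand_dims(image, 0)]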