onnx2tf: [YOLOX][YOLOv8] INT8 wrong output

Issue Type

Others

onnx2tf version number

1.7.23

onnx version number

1.13.1

tensorflow version number

2.12.0rc1

Download URL for ONNX

ONNX YOLOX-nano model was generated using https://github.com/Megvii-BaseDetection/YOLOX/blob/main/tools/export_onnx.py

Parameter Replacement JSON

No parameter replacement

Description

  1. Research. Core project need
  2. The export succeeds, but the results of the quantized models are 0
  3. I tried all the possible flags for the onnx2tf conversion
  4. Core project need

I have managed to generate dynamic_range_quant, full_integer_quant, and integer_quant versions of YOLOX using onnx2tf. I also built a multi-backend class that supports inference with all of the exported models, so that each variant can be evaluated with exactly the same evaluation pipeline available in the YOLOX repo for a meaningful comparison (a minimal sketch of such a wrapper follows the table below). My results are as follows:

| Model | size | mAP^val 0.5:0.95 | mAP^val 0.5 |
|---|---|---|---|
| YOLOX-nano PyTorch (original model) | 416 | 0.256 | 0.411 |
| YOLOX-nano ONNX | 416 | 0.256 | 0.411 |
| YOLOX-nano TFLite FP16 | 416 | 0.256 | 0.411 |
| YOLOX-nano TFLite FP32 | 416 | 0.256 | 0.411 |
| YOLOX-nano TFLite full_integer_quant | 416 | 0 | 0 |
| YOLOX-nano TFLite dynamic_range_quant | 416 | 0 | 0 |
| YOLOX-nano TFLite integer_quant | 416 | 0 | 0 |
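For reference, a minimal sketch of such a multi-backend wrapper (the class and method names, the file-extension dispatch, and the NCHW-to-NHWC transpose are illustrative assumptions, not the exact code behind the table):

import numpy as np
import onnxruntime as ort
import tensorflow as tf

class Backend:
    # Run the same preprocessed NCHW float32 input through an ONNX or TFLite model.
    def __init__(self, model_path: str):
        if model_path.endswith('.onnx'):
            self.kind = 'onnx'
            self.session = ort.InferenceSession(model_path, providers=['CPUExecutionProvider'])
            self.input_name = self.session.get_inputs()[0].name
        elif model_path.endswith('.tflite'):
            self.kind = 'tflite'
            self.interpreter = tf.lite.Interpreter(model_path=model_path)
            self.interpreter.allocate_tensors()
            self.input_detail = self.interpreter.get_input_details()[0]
            self.output_detail = self.interpreter.get_output_details()[0]
        else:
            raise ValueError(f'Unsupported model format: {model_path}')

    def __call__(self, x: np.ndarray) -> np.ndarray:
        if self.kind == 'onnx':
            return self.session.run(None, {self.input_name: x})[0]
        # TFLite models exported by onnx2tf expect NHWC input, while ONNX expects NCHW.
        # Note: models with int8 input tensors would additionally need the input
        # quantization scale/zero_point applied; this sketch assumes float I/O.
        x_nhwc = np.transpose(x, (0, 2, 3, 1)).astype(self.input_detail['dtype'])
        self.interpreter.set_tensor(self.input_detail['index'], x_nhwc)
        self.interpreter.invoke()
        return self.interpreter.get_tensor(self.output_detail['index'])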

The outputs of the quantized models seem to be wrong, as the postprocessing step fails: the confidences are so low that none of the predictions pass the confidence filtering in the NMS step. Any idea what the problem could be? The float16 and float32 TFLite models work as usual, achieving the results in the table above. Has anybody tried onnx2tf with YOLOX and gotten the quantized models working?
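To make the failure mode concrete: YOLOX multiplies each prediction's objectness by its best class score and keeps only predictions above a confidence threshold before running NMS. With the quantized models every one of these products falls below the threshold, so nothing survives. A minimal sketch of that filtering step (the threshold value and tensor layout follow the standard YOLOX postprocess; the function name is mine):

import numpy as np

def filter_by_confidence(predictions: np.ndarray, conf_thr: float = 0.25) -> np.ndarray:
    # predictions: (N, 5 + num_classes) rows of [cx, cy, w, h, objectness, class scores...]
    obj = predictions[:, 4]
    best_cls = predictions[:, 5:].max(axis=1)
    scores = obj * best_cls        # confidence used by YOLOX before NMS
    keep = scores >= conf_thr      # with the INT8 models this mask ends up all False
    return predictions[keep]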

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 62 (59 by maintainers)

Most upvoted comments

I am closing this somewhat lengthy topic because I have found the cause of the problem.

I don't know if it will be useful, but two months ago I successfully converted ultralytics/yolov5 to .tflite INT8, and its export script uses the same TFLiteConverter. It performed well at inference.
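For context, the post-training integer quantization path that such exports go through looks roughly like this (a sketch, assuming a SavedModel directory produced by the converter and a hypothetical representative_images() calibration generator; not the exact code of either exporter):

import numpy as np
import tensorflow as tf

def representative_images():
    # Hypothetical calibration generator: in practice, yield ~100 real images
    # preprocessed with the same normalization used at inference time.
    for _ in range(100):
        yield [np.random.rand(1, 416, 416, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model('saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_images
# Force full-integer ops; dropping this line gives dynamic-range quantization instead.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

with open('model_integer_quant.tflite', 'wb') as f:
    f.write(converter.convert())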

A similar issue has been posted on the official TensorFlow issue tracker; you can find several by searching for “INT8.” It seems that even a very simple mobilenet_v2 gets corrupted. I actually tried it this morning and, sure enough, the model broke on mobilenet_v2.

Ref: https://github.com/tensorflow/tensorflow/issues/52357

  1. May I ask if this issue is solvable at all?

I guess it depends on whether the TensorFlow team can handle it.

  2. Should I select a completely different object detection architecture for INT8 quantization? One that has been proven to export, or is even engineered for INT8 export?

Other than MobileNet-SSD, which I tested and successfully ran inference with four years ago, there are no other models that I can think of at this time.

  1. https://github.com/PINTO0309/PINTO_model_zoo#sample1---object-detection-by-video-file
  2. https://github.com/PINTO0309/PINTO_model_zoo#sample2---object-detection-by-usb-camera
  3. https://coral.ai/models/

YOLOv8 uses a similar architecture, but I am not sure if INT8 can be used successfully.

https://github.com/PINTO0309/PINTO_model_zoo/tree/main/345_YOLOv8

  3. Given that it seems to be an issue with the quantization specification of TFLiteConverter, should I rather create an issue under the official TF repo?

I think so. It would be better to link to several issues that might be relevant and show the enormity of the problem. This problem appears to be quite critical. Also, I don’t think this problem occurs only in the PyTorch -> ONNX -> TFLite conversion flow. Because in onnx2tf I am just building Keras models in a normal sequential way. If you include the topic of converting models from PyTorch or ONNX, the TensorFlow team may ignore you.

  4. And finally, what is your view on quantization in general? Should one go with a model that has been proven to export, given that the uncertainty in the export process is so high (you never know whether the model will be quantizable)?

As discussed in this issue, it is always a good idea to convert the model once and make sure that the accuracy degradation is not significant or acceptable. Unless we know where the cause of the extremely large INT8 accuracy degradation lies, I don’t see how we can do anything but try at random.

It doesn’t work properly, so we have no choice. I do not want to use unofficial tools unnecessarily. I am simply suggesting the best and most practical method for the current operation at any given time.

I cannot thank you enough for all this

I just used this logic to check things step by step:

https://github.com/PINTO0309/onnx2tf/issues/244#issuecomment-1465230129

The --cotof option confirms that there is a discrepancy, but does not show the specific value. Also, the tool does not check the accuracy of the INT8 model.
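If the boolean from np.allclose is not enough, the actual magnitude of the discrepancy can be printed directly (assuming output_data_1 and output_data_2 are the two outputs to compare as numpy arrays, e.g. as produced by the comparison script later in this thread):

import numpy as np

abs_err = np.abs(output_data_1.astype(np.float32) - output_data_2.astype(np.float32))
print('max abs error :', abs_err.max())
print('mean abs error:', abs_err.mean())
print('worst position:', np.unravel_index(abs_err.argmax(), abs_err.shape))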

Yup. That worked. Thanks

When I run

onnx2tf \
-i yolox_nano.onnx \
-oiqt \
-cotof \
-cotoa 1e-4 \
-onimc /backbone/backbone/dark2/dark2.1/conv2/act/Mul_output_0

on my ONNX model I get an error:

INFO: validation_conditions: np.allclose(onnx_outputs, tf_outputs, rtol=0.0, atol=0.0001, equal_nan=True)
Traceback (most recent call last):
  File "/datadrive/mikel/yolox_tflite_export/env/bin/onnx2tf", line 8, in <module>
    sys.exit(main())
  File "/datadrive/mikel/yolox_tflite_export/env/lib/python3.8/site-packages/onnx2tf/onnx2tf.py", line 1867, in main
    model = convert(
  File "/datadrive/mikel/yolox_tflite_export/env/lib/python3.8/site-packages/onnx2tf/onnx2tf.py", line 1329, in convert
    check_results = onnx_tf_tensor_validation(
  File "/datadrive/mikel/yolox_tflite_export/env/lib/python3.8/site-packages/onnx2tf/utils/common_functions.py", line 3071, in onnx_tf_tensor_validation
    onnx_tensor_shape = onnx_tensor.shape
AttributeError: 'NoneType' object has no attribute 'shape'

The -onimc option allows you to cut the model at any position.

Got it!

TensorFlow is not perfect. Therefore, there must be a point where the output value breaks down significantly at the boundary of some operation.

I am only now steadily narrowing down the problem areas to identify them (a scripted sketch of this narrowing loop follows the list below).

I am not sure what the problem is, but at this point I know the following for certain:

  1. The entire INT8 model is not broken.
  2. There is a point somewhere in the middle of the model where the output breaks down significantly.
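The narrowing-down itself can be scripted: re-convert the model truncated at successive nodes and check where the float32 and INT8 outputs stop matching. A sketch (the cut points are the node names used in this thread; the comparison step is the same script shown further down):

import subprocess

# Candidate cut points; the node names below are the ones tried in this thread.
cut_points = [
    '/backbone/Concat_1_output_0',
    '/backbone/backbone/dark5/dark5.1/conv1/act/Mul_output_0',
    '/backbone/backbone/dark2/dark2.1/conv2/act/Mul_output_0',
    '/backbone/backbone/dark2/dark2.1/Concat_output_0',
    '/backbone/backbone/stem/conv/act/Mul_output_0',
]

for node in cut_points:
    # Re-convert the model truncated at `node`, generating float32 and INT8 tflite files.
    subprocess.run(
        ['onnx2tf', '-i', 'yolox_nano.onnx', '-oiqt', '-onimc', node],
        check=True,
    )
    print(f'converted up to: {node} -> now run the float32 vs integer_quant comparison')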

Completely corrupted output values.

onnx2tf \
-i yolox_nano.onnx \
-oiqt \
-cotof \
-cotoa 1e-4 \
-onimc  /backbone/Concat_1_output_0


(1, 52, 52, 128)
array([[[[-2.16917425e-01, -7.94562474e-02,  3.92092317e-01, ...,
          -2.71391660e-01,  1.83749601e-01, -2.74686277e-01],
         [-2.16917425e-01, -7.94562474e-02,  3.92092317e-01, ...,
          -2.21373931e-01,  1.56326100e-01, -2.70561993e-01],
         [-1.35322347e-01, -7.13786185e-02,  3.29531044e-01, ...,
          -1.89693764e-01,  1.87261999e-01, -2.74573624e-01],
(1, 52, 52, 128)
array([[[[-0.2060553, -0.2060553,  1.4423871, ..., -0.2060553,
           0.4121106, -0.2060553],
         [-0.2060553, -0.2060553,  1.4423871, ..., -0.2060553,
           0.4121106, -0.2060553],
         [-0.2060553, -0.2060553,  1.4423871, ..., -0.2060553,
           0.4121106, -0.2060553],
float32 == int8: False

It still looks fine.

onnx2tf \
-i yolox_nano.onnx \
-oiqt \
-cotof \
-cotoa 1e-4 \
-onimc  /backbone/backbone/dark5/dark5.1/conv1/act/Mul_output_0


(1, 13, 13, 128)
array([[[[ 1.7031100e+00,  2.2036643e+00,  2.0568738e+00, ...,
           2.2377689e+00,  2.0156825e+00,  1.1139498e+00],
         [ 1.7618496e+00,  1.9983290e+00,  1.5791827e+00, ...,
           2.5553849e+00,  1.4779290e+00,  1.3652538e+00],
         [ 1.6511190e+00,  1.9410123e+00,  1.5158001e+00, ...,
           2.2816539e+00,  1.6135874e+00,  1.3911940e+00],
(1, 13, 13, 128)
array([[[[ 1.6459503 ,  2.0685592 ,  1.9128612 , ...,  2.33547   ,
           1.9128612 ,  0.73400486],
         [ 1.5124949 ,  2.0685592 ,  1.3790395 , ...,  2.4689255 ,
           1.3790395 ,  1.245584  ],
         [ 1.3790395 ,  2.0685592 ,  1.5124949 , ...,  2.33547   ,
           1.5124949 ,  1.3790395 ],
float32 == int8: False

I see. Although the error here is larger than 1e-1, it seems that the fatal breakdown of the model happens a bit further back.

onnx2tf \
-i yolox_nano.onnx \
-oiqt \
-cotof \
-cotoa 1e-4 \
-onimc  /backbone/backbone/dark2/dark2.1/conv2/act/Mul_output_0


(1, 104, 104, 16)
array([[[[-0.27804813,  0.90013486,  3.0734112 , ...,  3.5922017 ,
           0.8996802 ,  2.5687473 ],
         [-0.07278807,  1.4254215 ,  0.09525204, ..., -0.2754114 ,
           0.8143257 ,  3.1839092 ],
         [-0.07278807,  1.4254215 ,  0.09525204, ..., -0.2754114 ,
           0.8143257 ,  3.1839092 ],
(1, 104, 104, 16)
array([[[[-0.26284206,  0.9418507 ,  3.1979117 , ...,  3.3293328 ,
           0.9418507 ,  2.5189033 ],
         [-0.08761402,  1.5332454 ,  0.4161666 , ..., -0.24093856,
           0.8104297 ,  3.0007803 ],
         [-0.08761402,  1.5332454 ,  0.4161666 , ..., -0.24093856,
           0.8104297 ,  3.0007803 ],
float32 == int8: False

The model seems to break before the first Concat. Things are getting interesting.

onnx2tf \
-i yolox_nano.onnx \
-oiqt \
-cotof \
-cotoa 1e-4 \
-onimc  /backbone/backbone/dark2/dark2.1/Concat_output_0


  • Results
    float32 == int8: False
    

I compared the code block that decodes the outputs into bboxes (which comes from here) with the corresponding operations in the Netron graph I posted here, and it looks correct to me.
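For reference, the decoding being compared is the standard YOLOX postprocess: grid-cell offsets are added to the raw xy outputs and scaled by the stride, and the wh outputs are exponentiated and scaled by the stride. A condensed sketch (variable names are mine; the logic follows the YOLOX demo postprocess):

import numpy as np

def decode_outputs(outputs: np.ndarray, img_size=(416, 416), strides=(8, 16, 32)) -> np.ndarray:
    # outputs: (1, N, 5 + num_classes) raw head output; returns boxes in input pixels.
    grids, expanded_strides = [], []
    for stride in strides:
        hsize, wsize = img_size[0] // stride, img_size[1] // stride
        xv, yv = np.meshgrid(np.arange(wsize), np.arange(hsize))
        grid = np.stack((xv, yv), 2).reshape(1, -1, 2)
        grids.append(grid)
        expanded_strides.append(np.full((1, grid.shape[1], 1), stride))
    grids = np.concatenate(grids, 1)
    expanded_strides = np.concatenate(expanded_strides, 1)
    decoded = outputs.copy()
    decoded[..., :2] = (decoded[..., :2] + grids) * expanded_strides    # center x, y
    decoded[..., 2:4] = np.exp(decoded[..., 2:4]) * expanded_strides    # width, height
    return decoded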

onnx2tf \
-i yolox_nano.onnx \
-oiqt \
-cotof \
-cotoa 1e-4 \
-onimc /backbone/backbone/stem/conv/act/Mul_output_0


import os
import time
import warnings
warnings.simplefilter(action='ignore', category=Warning)
warnings.simplefilter(action='ignore', category=FutureWarning)
warnings.simplefilter(action='ignore', category=DeprecationWarning)
warnings.simplefilter(action='ignore', category=RuntimeWarning)
import random
random.seed(0)
import numpy as np
np.random.seed(0)
# Run on CPU only and silence TensorFlow logging.
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
from tensorflow.lite.python import interpreter as iw
import tensorflow as tf
from pprint import pprint

# Float32 TFLite model: run inference on a fixed synthetic input.
interpreter = iw.Interpreter(
    model_path="yolox_nano_float32.tflite",
    # model_path="model_float32.tflite",
    num_threads=4,
)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
input_1_shape = input_details[0]['shape']
input_1_dtype = input_details[0]['dtype']

interpreter.set_tensor(
    input_details[0]['index'],
    tf.convert_to_tensor((np.ones(input_1_shape, np.float32) - 127.5) / 127.5, dtype=input_1_dtype),
)
start_time = time.time()
interpreter.invoke()
stop_time = time.time()
output_data_1 = interpreter.get_tensor(output_details[0]['index'])

GREEN = '\033[32m'
RESET = '\033[0m'
print(f'{GREEN}Float32 Successful inference!{RESET}: elapsed_time (x86 CPU) {(stop_time-start_time)*1000} ms')

###########################################################################################

# INT8 (integer_quant) TFLite model: run inference on the same input.
interpreter = iw.Interpreter(
    model_path="yolox_nano_integer_quant.tflite",
    # model_path="model_integer_quant.tflite",
    num_threads=4,
)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
input_1_shape = input_details[0]['shape']
input_1_dtype = input_details[0]['dtype']

interpreter.set_tensor(
    input_details[0]['index'],
    tf.convert_to_tensor((np.ones(input_1_shape, np.float32) - 127.5) / 127.5, dtype=input_1_dtype),
)
start_time = time.time()
interpreter.invoke()
stop_time = time.time()
output_data_2 = interpreter.get_tensor(output_details[0]['index'])

GREEN = '\033[32m'
RESET = '\033[0m'
print(f'{GREEN}INT8 Successful inference!{RESET}: elapsed_time (x86 CPU) {(stop_time-start_time)*1000} ms')

print(np.allclose(output_data_1, output_data_2, rtol=0.0, atol=1e-1))

print(output_data_1.shape)
pprint(output_data_1)
print(output_data_2.shape)
pprint(output_data_2)

print(f'float32 == int8: {np.allclose(output_data_1, output_data_2, rtol=0.0, atol=1e-1)}')
  • Results
    float32 == int8: True
    

Apparently, the INT8 model is not entirely broken.

Thanks again for your time @PINTO0309. I actually also tried downgrading TF even further, which led to the same results. Let me know if I can help out with something 😄

Later I will check the weights and quantization parameters of the model. I have no idea what the cause is at this point.
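One way to do that inspection, assuming the yolox_nano_integer_quant.tflite file used above, is to dump the per-tensor scales and zero points through the interpreter (a sketch; a wildly different scale on one input of a Concat would be the kind of red flag to look for):

import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='yolox_nano_integer_quant.tflite')
interpreter.allocate_tensors()

for detail in interpreter.get_tensor_details():
    qp = detail['quantization_parameters']
    if len(qp['scales']) == 0:
        continue  # tensor is not quantized
    print(detail['name'], 'scale:', qp['scales'][:1], 'zero_point:', qp['zero_points'][:1])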

The versions I tried are below; none of them worked: v2.12.0rc1, v2.11.0, v2.10.0, v2.9.0.

It’s been about 2 years since I’ve tested a quantization model, so I’m going to look back into the past a bit. I may have overlooked something.

Given that the Float16 and Float32 models work, I cannot understand why the quantized ones do not, since they are generated by the same export call.