onnx2tf: [YOLOX][YOLOv8] INT8 wrong output

Issue Type

Others

onnx2tf version number

1.7.23

onnx version number

1.13.1

tensorflow version number

2.12.0rc1

Download URL for ONNX

ONNX YOLOX-nano model was generated using https://github.com/Megvii-BaseDetection/YOLOX/blob/main/tools/export_onnx.py

Parameter Replacement JSON

No parameter replacement

Description

  1. Research. Core project need
  2. The export succeeds, but the results of the quantized models are 0
  3. I tried all the possible flags for the onnx2tf conversion
  4. Core project need

I have managed to generate dynamic_range_quant, full_integer_quant, and integer_quant versions of YOLOX using onnx2tf. I also built a multi-backend class that supports inference with all of the exported models, so that each variant can be evaluated with exactly the same evaluation pipeline available in the YOLOX repo for a meaningful comparison (a minimal sketch of such a wrapper follows the table below). My results are as follows:

| Model | size | mAP^val 0.5:0.95 | mAP^val 0.5 |
|---|---|---|---|
| YOLOX-nano PyTorch (original model) | 416 | 0.256 | 0.411 |
| YOLOX-nano ONNX | 416 | 0.256 | 0.411 |
| YOLOX-nano TFLite FP16 | 416 | 0.256 | 0.411 |
| YOLOX-nano TFLite FP32 | 416 | 0.256 | 0.411 |
| YOLOX-nano TFLite full_integer_quant | 416 | 0 | 0 |
| YOLOX-nano TFLite dynamic_range_quant | 416 | 0 | 0 |
| YOLOX-nano TFLite integer_quant | 416 | 0 | 0 |
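For reference, a minimal sketch of such a multi-backend wrapper (the class and method names, the file-extension dispatch, and the NCHW-to-NHWC transpose are illustrative assumptions, not the exact code behind the table):

import numpy as np
import onnxruntime as ort
import tensorflow as tf

class Backend:
    # Run the same preprocessed NCHW float32 input through an ONNX or TFLite model.
    def __init__(self, model_path: str):
        if model_path.endswith('.onnx'):
            self.kind = 'onnx'
            self.session = ort.InferenceSession(model_path, providers=['CPUExecutionProvider'])
            self.input_name = self.session.get_inputs()[0].name
        elif model_path.endswith('.tflite'):
            self.kind = 'tflite'
            self.interpreter = tf.lite.Interpreter(model_path=model_path)
            self.interpreter.allocate_tensors()
            self.input_detail = self.interpreter.get_input_details()[0]
            self.output_detail = self.interpreter.get_output_details()[0]
        else:
            raise ValueError(f'Unsupported model format: {model_path}')

    def __call__(self, x: np.ndarray) -> np.ndarray:
        if self.kind == 'onnx':
            return self.session.run(None, {self.input_name: x})[0]
        # TFLite models exported by onnx2tf expect NHWC input, while ONNX expects NCHW.
        # Note: models with int8 input tensors would additionally need the input
        # quantization scale/zero_point applied; this sketch assumes float I/O.
        x_nhwc = np.transpose(x, (0, 2, 3, 1)).astype(self.input_detail['dtype'])
        self.interpreter.set_tensor(self.input_detail['index'], x_nhwc)
        self.interpreter.invoke()
        return self.interpreter.get_tensor(self.output_detail['index'])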

The outputs of the quantized models seem to be wrong, as the postprocessing step fails: the confidences are so low that none of the predictions pass the confidence filtering in the NMS step. Any idea what the problem could be? The float16 and float32 TFLite models work as usual, achieving the results in the table above. Has anybody tried onnx2tf with YOLOX and gotten the quantized models working?
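To make the failure mode concrete: YOLOX multiplies each prediction's objectness by its best class score and keeps only predictions above a confidence threshold before running NMS. With the quantized models every one of these products falls below the threshold, so nothing survives. A minimal sketch of that filtering step (the threshold value and tensor layout follow the standard YOLOX postprocess; the function name is mine):

import numpy as np

def filter_by_confidence(predictions: np.ndarray, conf_thr: float = 0.25) -> np.ndarray:
    # predictions: (N, 5 + num_classes) rows of [cx, cy, w, h, objectness, class scores...]
    obj = predictions[:, 4]
    best_cls = predictions[:, 5:].max(axis=1)
    scores = obj * best_cls        # confidence used by YOLOX before NMS
    keep = scores >= conf_thr      # with the INT8 models this mask ends up all False
    return predictions[keep]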

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 62 (59 by maintainers)

Most upvoted comments

I am closing this somewhat lengthy topic because I have found the cause of the problem.

I don't know if it will be useful, but two months ago I successfully converted ultralytics/yolov5 to .tflite INT8, and its export script uses the same TFLiteConverter. It performed well at inference.
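For context, the post-training integer quantization path that such exports go through looks roughly like this (a sketch, assuming a SavedModel directory produced by the converter and a hypothetical representative_images() calibration generator; not the exact code of either exporter):

import numpy as np
import tensorflow as tf

def representative_images():
    # Hypothetical calibration generator: in practice, yield ~100 real images
    # preprocessed with the same normalization used at inference time.
    for _ in range(100):
        yield [np.random.rand(1, 416, 416, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model('saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_images
# Force full-integer ops; dropping this line gives dynamic-range quantization instead.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

with open('model_integer_quant.tflite', 'wb') as f:
    f.write(converter.convert())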

A similar issue has been posted on the official TensorFlow issue tracker; you can find several by searching for “INT8.” It seems that even a very simple mobilenet_v2 gets corrupted. I actually tried it this morning and, sure enough, the model broke on mobilenet_v2.

Ref: https://github.com/tensorflow/tensorflow/issues/52357

  1. May I ask if this issue is solvable at all?

I guess it depends on whether the TensorFlow team can handle it.

  2. Should I select a completely different object detection architecture for INT8 quantization? One that has been proven to export, or is even engineered for INT8 export?

Other than MobileNet-SSD, which I tested and successfully ran inference with four years ago, there are no other models that I can think of at this time.

  1. https://github.com/PINTO0309/PINTO_model_zoo#sample1---object-detection-by-video-file
  2. https://github.com/PINTO0309/PINTO_model_zoo#sample2---object-detection-by-usb-camera
  3. https://coral.ai/models/

YOLOv8 uses a similar architecture, but I am not sure if INT8 can be used successfully.

https://github.com/PINTO0309/PINTO_model_zoo/tree/main/345_YOLOv8

  3. Given that it seems to be an issue with the quantization specification of TFLiteConverter, should I rather create an issue under the official TF repo?

I think so. It would be better to link to several issues that might be relevant and show the enormity of the problem. This problem appears to be quite critical. Also, I don’t think this problem occurs only in the PyTorch -> ONNX -> TFLite conversion flow. Because in onnx2tf I am just building Keras models in a normal sequential way. If you include the topic of converting models from PyTorch or ONNX, the TensorFlow team may ignore you.

  4. And finally, what is your view on quantization in general? Should one go with a model that has been proven to export, given that the uncertainty in the export process is so high (you never know whether the model will be quantizable)?

As discussed in this issue, it is always a good idea to convert the model once and make sure that the accuracy degradation is not significant or acceptable. Unless we know where the cause of the extremely large INT8 accuracy degradation lies, I don’t see how we can do anything but try at random.

It doesn’t work properly, so we have no choice. I do not want to use unofficial tools unnecessarily. I am simply suggesting the best and most practical method for the current operation at any given time.

I cannot thank you enough for all this

I just used this logic to check things step by step:

https://github.com/PINTO0309/onnx2tf/issues/244#issuecomment-1465230129

The --cotof option confirms that there is a discrepancy, but does not show the specific value. Also, the tool does not check the accuracy of the INT8 model.
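If the boolean from np.allclose is not enough, the actual magnitude of the discrepancy can be printed directly (assuming output_data_1 and output_data_2 are the two outputs to compare as numpy arrays, e.g. as produced by the comparison script later in this thread):

import numpy as np

abs_err = np.abs(output_data_1.astype(np.float32) - output_data_2.astype(np.float32))
print('max abs error :', abs_err.max())
print('mean abs error:', abs_err.mean())
print('worst position:', np.unravel_index(abs_err.argmax(), abs_err.shape))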

Yup. That worked. Thanks

When I run

onnx2tf \
-i yolox_nano.onnx \
-oiqt \
-cotof \
-cotoa 1e-4 \
-onimc /backbone/backbone/dark2/dark2.1/conv2/act/Mul_output_0

on my ONNX model I get an error:

INFO: validation_conditions: np.allclose(onnx_outputs, tf_outputs, rtol=0.0, atol=0.0001, equal_nan=True)
Traceback (most recent call last):
  File "/datadrive/mikel/yolox_tflite_export/env/bin/onnx2tf", line 8, in <module>
    sys.exit(main())
  File "/datadrive/mikel/yolox_tflite_export/env/lib/python3.8/site-packages/onnx2tf/onnx2tf.py", line 1867, in main
    model = convert(
  File "/datadrive/mikel/yolox_tflite_export/env/lib/python3.8/site-packages/onnx2tf/onnx2tf.py", line 1329, in convert
    check_results = onnx_tf_tensor_validation(
  File "/datadrive/mikel/yolox_tflite_export/env/lib/python3.8/site-packages/onnx2tf/utils/common_functions.py", line 3071, in onnx_tf_tensor_validation
    onnx_tensor_shape = onnx_tensor.shape
AttributeError: 'NoneType' object has no attribute 'shape'

The -onimc option allows you to cut the model at any position.

Got it!

TensorFlow is not perfect. Therefore, there must be a point where the output value breaks down significantly at the boundary of some operation.

I am only now steadily narrowing down the problem areas to identify them (a scripted sketch of this narrowing loop follows the list below).

I am not sure what the problem is, but at this point I know the following for certain:

  1. The entire INT8 model is not broken.
  2. There is a point somewhere in the middle of the model where the output breaks down significantly.
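The narrowing-down itself can be scripted: re-convert the model truncated at successive nodes and check where the float32 and INT8 outputs stop matching. A sketch (the cut points are the node names used in this thread; the comparison step is the same script shown further down):

import subprocess

# Candidate cut points; the node names below are the ones tried in this thread.
cut_points = [
    '/backbone/Concat_1_output_0',
    '/backbone/backbone/dark5/dark5.1/conv1/act/Mul_output_0',
    '/backbone/backbone/dark2/dark2.1/conv2/act/Mul_output_0',
    '/backbone/backbone/dark2/dark2.1/Concat_output_0',
    '/backbone/backbone/stem/conv/act/Mul_output_0',
]

for node in cut_points:
    # Re-convert the model truncated at `node`, generating float32 and INT8 tflite files.
    subprocess.run(
        ['onnx2tf', '-i', 'yolox_nano.onnx', '-oiqt', '-onimc', node],
        check=True,
    )
    print(f'converted up to: {node} -> now run the float32 vs integer_quant comparison')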

Completely corrupted output values.

onnx2tf \
-i yolox_nano.onnx \
-oiqt \
-cotof \
-cotoa 1e-4 \
-onimc  /backbone/Concat_1_output_0


(1, 52, 52, 128)
array([[[[-2.16917425e-01, -7.94562474e-02,  3.92092317e-01, ...,
          -2.71391660e-01,  1.83749601e-01, -2.74686277e-01],
         [-2.16917425e-01, -7.94562474e-02,  3.92092317e-01, ...,
          -2.21373931e-01,  1.56326100e-01, -2.70561993e-01],
         [-1.35322347e-01, -7.13786185e-02,  3.29531044e-01, ...,
          -1.89693764e-01,  1.87261999e-01, -2.74573624e-01],
(1, 52, 52, 128)
array([[[[-0.2060553, -0.2060553,  1.4423871, ..., -0.2060553,
           0.4121106, -0.2060553],
         [-0.2060553, -0.2060553,  1.4423871, ..., -0.2060553,
           0.4121106, -0.2060553],
         [-0.2060553, -0.2060553,  1.4423871, ..., -0.2060553,
           0.4121106, -0.2060553],
float32 == int8: False

It still looks fine.

onnx2tf \
-i yolox_nano.onnx \
-oiqt \
-cotof \
-cotoa 1e-4 \
-onimc  /backbone/backbone/dark5/dark5.1/conv1/act/Mul_output_0


(1, 13, 13, 128)
array([[[[ 1.7031100e+00,  2.2036643e+00,  2.0568738e+00, ...,
           2.2377689e+00,  2.0156825e+00,  1.1139498e+00],
         [ 1.7618496e+00,  1.9983290e+00,  1.5791827e+00, ...,
           2.5553849e+00,  1.4779290e+00,  1.3652538e+00],
         [ 1.6511190e+00,  1.9410123e+00,  1.5158001e+00, ...,
           2.2816539e+00,  1.6135874e+00,  1.3911940e+00],
(1, 13, 13, 128)
array([[[[ 1.6459503 ,  2.0685592 ,  1.9128612 , ...,  2.33547   ,
           1.9128612 ,  0.73400486],
         [ 1.5124949 ,  2.0685592 ,  1.3790395 , ...,  2.4689255 ,
           1.3790395 ,  1.245584  ],
         [ 1.3790395 ,  2.0685592 ,  1.5124949 , ...,  2.33547   ,
           1.5124949 ,  1.3790395 ],
float32 == int8: False

I see. Although the error here is larger than 1e-1, it seems that the fatal breakdown of the model happens a bit further back.

onnx2tf \
-i yolox_nano.onnx \
-oiqt \
-cotof \
-cotoa 1e-4 \
-onimc  /backbone/backbone/dark2/dark2.1/conv2/act/Mul_output_0


(1, 104, 104, 16)
array([[[[-0.27804813,  0.90013486,  3.0734112 , ...,  3.5922017 ,
           0.8996802 ,  2.5687473 ],
         [-0.07278807,  1.4254215 ,  0.09525204, ..., -0.2754114 ,
           0.8143257 ,  3.1839092 ],
         [-0.07278807,  1.4254215 ,  0.09525204, ..., -0.2754114 ,
           0.8143257 ,  3.1839092 ],
(1, 104, 104, 16)
array([[[[-0.26284206,  0.9418507 ,  3.1979117 , ...,  3.3293328 ,
           0.9418507 ,  2.5189033 ],
         [-0.08761402,  1.5332454 ,  0.4161666 , ..., -0.24093856,
           0.8104297 ,  3.0007803 ],
         [-0.08761402,  1.5332454 ,  0.4161666 , ..., -0.24093856,
           0.8104297 ,  3.0007803 ],
float32 == int8: False

The model seems to break before the first Concat. Things are getting interesting.

onnx2tf \
-i yolox_nano.onnx \
-oiqt \
-cotof \
-cotoa 1e-4 \
-onimc  /backbone/backbone/dark2/dark2.1/Concat_output_0


  • Results
    float32 == int8: False
    

I compared the code block that decodes the outputs into bboxes (which comes from here) with the corresponding operations in the Netron graph I posted here, and it looks correct to me.
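For reference, the decoding being compared is the standard YOLOX postprocess: grid-cell offsets are added to the raw xy outputs and scaled by the stride, and the wh outputs are exponentiated and scaled by the stride. A condensed sketch (variable names are mine; the logic follows the YOLOX demo postprocess):

import numpy as np

def decode_outputs(outputs: np.ndarray, img_size=(416, 416), strides=(8, 16, 32)) -> np.ndarray:
    # outputs: (1, N, 5 + num_classes) raw head output; returns boxes in input pixels.
    grids, expanded_strides = [], []
    for stride in strides:
        hsize, wsize = img_size[0] // stride, img_size[1] // stride
        xv, yv = np.meshgrid(np.arange(wsize), np.arange(hsize))
        grid = np.stack((xv, yv), 2).reshape(1, -1, 2)
        grids.append(grid)
        expanded_strides.append(np.full((1, grid.shape[1], 1), stride))
    grids = np.concatenate(grids, 1)
    expanded_strides = np.concatenate(expanded_strides, 1)
    decoded = outputs.copy()
    decoded[..., :2] = (decoded[..., :2] + grids) * expanded_strides    # center x, y
    decoded[..., 2:4] = np.exp(decoded[..., 2:4]) * expanded_strides    # width, height
    return decoded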

onnx2tf \
-i yolox_nano.onnx \
-oiqt \
-cotof \
-cotoa 1e-4 \
-onimc /backbone/backbone/stem/conv/act/Mul_output_0


import os
import time
import warnings
warnings.simplefilter(action='ignore', category=Warning)
warnings.simplefilter(action='ignore', category=FutureWarning)
warnings.simplefilter(action='ignore', category=DeprecationWarning)
warnings.simplefilter(action='ignore', category=RuntimeWarning)
import random
random.seed(0)
import numpy as np
np.random.seed(0)
# Run on CPU only and silence TensorFlow logging.
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
from tensorflow.lite.python import interpreter as iw
import tensorflow as tf
from pprint import pprint

# Float32 TFLite model: run inference on a fixed synthetic input.
interpreter = iw.Interpreter(
    model_path="yolox_nano_float32.tflite",
    # model_path="model_float32.tflite",
    num_threads=4,
)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
input_1_shape = input_details[0]['shape']
input_1_dtype = input_details[0]['dtype']

interpreter.set_tensor(
    input_details[0]['index'],
    tf.convert_to_tensor((np.ones(input_1_shape, np.float32) - 127.5) / 127.5, dtype=input_1_dtype),
)
start_time = time.time()
interpreter.invoke()
stop_time = time.time()
output_data_1 = interpreter.get_tensor(output_details[0]['index'])

GREEN = '\033[32m'
RESET = '\033[0m'
print(f'{GREEN}Float32 Successful inference!{RESET}: elapsed_time (x86 CPU) {(stop_time-start_time)*1000} ms')

###########################################################################################

# INT8 (integer_quant) TFLite model: run inference on the same input.
interpreter = iw.Interpreter(
    model_path="yolox_nano_integer_quant.tflite",
    # model_path="model_integer_quant.tflite",
    num_threads=4,
)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
input_1_shape = input_details[0]['shape']
input_1_dtype = input_details[0]['dtype']

interpreter.set_tensor(
    input_details[0]['index'],
    tf.convert_to_tensor((np.ones(input_1_shape, np.float32) - 127.5) / 127.5, dtype=input_1_dtype),
)
start_time = time.time()
interpreter.invoke()
stop_time = time.time()
output_data_2 = interpreter.get_tensor(output_details[0]['index'])

GREEN = '\033[32m'
RESET = '\033[0m'
print(f'{GREEN}INT8 Successful inference!{RESET}: elapsed_time (x86 CPU) {(stop_time-start_time)*1000} ms')

print(np.allclose(output_data_1, output_data_2, rtol=0.0, atol=1e-1))

print(output_data_1.shape)
pprint(output_data_1)
print(output_data_2.shape)
pprint(output_data_2)

print(f'float32 == int8: {np.allclose(output_data_1, output_data_2, rtol=0.0, atol=1e-1)}')
  • Results
    float32 == int8: True
    

Apparently, the INT8 model is not entirely broken.

Thanks again for your time @PINTO0309. I actually also tried downgrading TF even further, which led to the same results. Let me know if I can help out with something 😄

Later I will check the weights and quantization parameters of the model. I have no idea what the cause is at this point.
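One way to do that inspection, assuming the yolox_nano_integer_quant.tflite file used above, is to dump the per-tensor scales and zero points through the interpreter (a sketch; a wildly different scale on one input of a Concat would be the kind of red flag to look for):

import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='yolox_nano_integer_quant.tflite')
interpreter.allocate_tensors()

for detail in interpreter.get_tensor_details():
    qp = detail['quantization_parameters']
    if len(qp['scales']) == 0:
        continue  # tensor is not quantized
    print(detail['name'], 'scale:', qp['scales'][:1], 'zero_point:', qp['zero_points'][:1])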

The versions I tried are below; none of them worked: v2.12.0rc1, v2.11.0, v2.10.0, v2.9.0.

It’s been about 2 years since I’ve tested a quantization model, so I’m going to look back into the past a bit. I may have overlooked something.

Given that the Float16 and Float32 models work, I cannot understand why the quantized ones do not, since they are generated by the same export call.