onnx2tf: [OSNet] int8 tflite model - catastrophic accuracy degradation
Issue Type
Others
OS
Linux
onnx2tf version number
1.15.8
onnx version number
1.13.1
onnxruntime version number
1.15.0
onnxsim (onnx_simplifier) version number
0.4.33
tensorflow version number
2.13.0
Download URL for ONNX
Parameter Replacement JSON
None
Description
Source Model Information
OSNet is a person re-identification (re-id) model that was trained with PyTorch and converted to ONNX with its pre-trained ImageNet weights.
onnx2tf
conversion command
onnx2tf \
-i ./../onnx_models/040_osnet_x1_0/onnx_fp_32_bs_1.onnx \
-o ./../tflite_models/040_osnet_x1_0/osnet_x1_0_bs_1/ \
-otfv1pb \
-osd \
-oiqt \
-qt per-tensor \
-cind "input.1" "./../calibration_data/onnx2tf_calib/calib_data_duke_500_bs_1_nhwc_fp32.npy" "[[[[0.485,0.456,0.406]]]]" "[[[[0.229,0.224,0.225]]]]"
The quantization process was calibrated using 100 samples from the DukeMTMC person-reid dataset; the samples were normalized to the range [0, 1] and preprocessed accordingly.
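For reference, this is roughly how such a NHWC float32 calibration `.npy` could be prepared. The 256x128 input resolution, the PIL-based loading, and the folder/file names below are assumptions for illustration, not taken from the issue; onnx2tf then applies the ImageNet mean/std passed via `-cind` on top of the [0, 1]-scaled data.

```python
# Hypothetical sketch of building the NHWC float32 calibration .npy.
# Assumptions (not from the issue): 256x128 (H x W) input size, PIL loading,
# and the folder/file names used here.
import glob
import numpy as np
from PIL import Image

INPUT_H, INPUT_W = 256, 128                                  # assumed OSNet input size
image_paths = sorted(glob.glob("./duke_calib_images/*.jpg"))  # hypothetical folder

samples = []
for path in image_paths:
    img = Image.open(path).convert("RGB").resize((INPUT_W, INPUT_H))
    # Normalize to [0, 1]; mean/std normalization is handled by onnx2tf via -cind.
    samples.append(np.asarray(img, dtype=np.float32) / 255.0)

calib = np.stack(samples, axis=0)                            # shape: [N, H, W, C]
np.save("calib_data_duke_nhwc_fp32.npy", calib)
print(calib.shape, calib.dtype)
```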
Issue Description
I checked the accuracy of the converted float32 tflite model and it was practically identical to the source model. However, when I checked the accuracy of the int8 model, I encountered a catastrophic accuracy drop (more than 95%).
I read Section 7 of the README, which clearly states that this could be a matter of the model structure. Is there any way to fix this problem?
Resources
You can find the following resources in the attached zip file:
- osnet_x1_0_fp_32_bs_1.onnx: the source ONNX model.
- osnet_x1_0_imagenet_fp32_bs_1_float32.tflite: the output fp32 tflite model.
- osnet_x1_0_imagenet_fp32_bs_1_integer_quant.tflite: the output int8 tflite model.
- accuracy_check.py: a Python script that takes the fp32/int8 tflite models and an input image, runs both models on the image, and measures the cosine similarity of the output embeddings (this will simplify the accuracy check on your end; a minimal sketch of the same comparison is shown after this list).
- 0001_c6_f0030809.jpg: an input image sample from the DukeMTMC dataset.
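As a rough idea of what such a check looks like, here is a minimal sketch of the fp32-vs-int8 embedding comparison. The 256x128 input size, the [0, 1] scaling with ImageNet mean/std, and the int8 input/output handling are assumptions, not the actual accuracy_check.py code.

```python
# Hypothetical sketch of the fp32-vs-int8 cosine-similarity check described above.
import numpy as np
import tensorflow as tf
from PIL import Image

MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(path, h=256, w=128):
    img = Image.open(path).convert("RGB").resize((w, h))
    x = np.asarray(img, dtype=np.float32) / 255.0
    x = (x - MEAN) / STD
    return x[None, ...]                                   # NHWC, batch of 1

def run_tflite(model_path, x):
    interp = tf.lite.Interpreter(model_path=model_path)
    interp.allocate_tensors()
    inp = interp.get_input_details()[0]
    out = interp.get_output_details()[0]
    if inp["dtype"] == np.int8:                            # quantize input if needed
        scale, zero = inp["quantization"]
        x = np.clip(np.round(x / scale + zero), -128, 127)
    interp.set_tensor(inp["index"], x.astype(inp["dtype"]))
    interp.invoke()
    y = interp.get_tensor(out["index"]).astype(np.float32)
    if out["dtype"] == np.int8:                            # dequantize int8 outputs
        scale, zero = out["quantization"]
        y = (y - zero) * scale
    return y.flatten()

x = preprocess("0001_c6_f0030809.jpg")
emb_fp32 = run_tflite("osnet_x1_0_imagenet_fp32_bs_1_float32.tflite", x)
emb_int8 = run_tflite("osnet_x1_0_imagenet_fp32_bs_1_integer_quant.tflite", x)
cos = np.dot(emb_fp32, emb_int8) / (np.linalg.norm(emb_fp32) * np.linalg.norm(emb_int8) + 1e-12)
print(f"cosine similarity: {cos:.4f}")
```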
About this issue
- Original URL
- State: closed
- Created a year ago
- Reactions: 1
- Comments: 16 (11 by maintainers)
Commits related to this issue
- Implemented a workaround to deal with the problem that padding with the minimum value causes the output error of `MaxPool2D` to be maximized only when quantizing with INT8 quantization. #444 — committed to PINTO0309/onnx2tf by PINTO0309 a year ago
- Merge pull request #446 from PINTO0309/fix_maxpool_int8 Implemented a workaround to deal with the problem that padding with the minimum value causes the output error of `MaxPool2D` to be maximized on... — committed to PINTO0309/onnx2tf by PINTO0309 a year ago
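The failure mode this workaround targets can be illustrated with a rough numpy sketch. The padding-with-float32-minimum behavior and the simple min/max calibration below are assumptions made for illustration, not the actual onnx2tf or TFLite quantization code.

```python
# Rough illustration (assumed pre-fix behavior): padding the MaxPool2D input with
# the float32 minimum drags the calibrated tensor range down to about -3.4e38, so
# the int8 scale explodes and every real activation collapses into a single bin.
import numpy as np

# A small fake feature map with ReLU-like activations in [0, 6].
activations = np.random.uniform(0.0, 6.0, size=(1, 64, 32, 8)).astype(np.float32)

# Assumed pre-fix behavior: spatial padding for MaxPool2D uses the float32 minimum.
padded = np.pad(activations, ((0, 0), (1, 1), (1, 1), (0, 0)),
                constant_values=np.finfo(np.float32).min)

# Asymmetric int8 affine quantization calibrated on the padded tensor's min/max.
t_min, t_max = float(padded.min()), float(padded.max())
scale = (t_max - t_min) / 255.0                      # ~1.3e36: absurdly coarse
zero_point = int(round(-128 - t_min / scale))

quantized = np.clip(np.round(padded / scale) + zero_point, -128, 127)
dequantized = (quantized.astype(np.float32) - zero_point) * scale

# Every real activation lands in one bin and dequantizes back to exactly 0, which
# matches the all-zero int8 outputs after MaxPool reported later in this thread.
real = dequantized[:, 1:-1, 1:-1, :]
print("scale:", scale, "| unique dequantized activations:", np.unique(real))
```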
I’ll gladly discuss other potential issues with you again @PINTO0309
Thank you very much, and since we can get a valid quantized OSNet now, I will close this issue.
I seem to have posted my comment at about the same time as yours.
Your information has given me an understanding of the structure of the model. Thank you.
However, since onnx2tf faithfully transforms the ONNX model structure, I do not think that the accuracy degradation of the quantized model is a problem with onnx2tf itself.
I am not sure where the quantization problem lies. In the past, when I experienced significant accuracy degradation in YOLO's SiLU, I identified the problem area through diligent research, searched for papers on accuracy degradation in INT8 quantization, and as a result identified the problem in SiLU (Swish), ReLU6, and Concat.
I have been working on quantization for about 5 years, and I remember that OSNet suffers a significant degradation in accuracy. However, I have never done a more in-depth investigation.
Your solution to this problem is going to be a great contribution to the community.
Btw, if the bug in the -onimc option is fixed, it will be possible to split the model and see changes in the output, as shown in the figure below.

Thanks a lot for the provided fix!!
I started my investigations at the very beginning of the model, and things are getting interesting!
I'm trying to spot the position where the significant accuracy drop begins. That's why I extended the provided accuracy_check.py script into a new script, subgraph_acc_check.py, which flattens the outputs of the subgraphs of both the float32 and int8 tflite models and measures the Euclidean distance between the flattened features (cosine similarity can be an issue if you get a feature vector of norm 0, which is also why this script does not compute a normalized Euclidean distance): subgraph_acc_check.zip
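For clarity, here is a minimal sketch of that flattened-feature comparison. The metric shown is an assumption about the approach described above, not the exact subgraph_acc_check.py code.

```python
# Minimal sketch of comparing flattened subgraph outputs with Euclidean distance.
import numpy as np

def euclidean_distance(feat_fp32, feat_int8_dequant):
    a = np.asarray(feat_fp32, dtype=np.float32).flatten()
    b = np.asarray(feat_int8_dequant, dtype=np.float32).flatten()
    return float(np.linalg.norm(a - b))

# Plain Euclidean distance stays well defined even when one feature map is all
# zeros, which is exactly where cosine similarity (a.b / |a||b|) breaks down.
```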
I split the model at two intermediate outputs:

- /conv1/relu/Relu_output_0 (before the residual blocks, right after ReLU):
  onnx2tf -i osnet_x1_0_fp_32_bs_1.onnx -o ./fix_acc_issue/tmp_0 -onimc /conv1/relu/Relu_output_0 -oiqt -qt per-tensor
- /maxpool/MaxPool_output_0 (before the residual blocks, right after MaxPool2D):
  onnx2tf -i osnet_x1_0_fp_32_bs_1.onnx -o ./fix_acc_issue/tmp_1 -onimc /maxpool/MaxPool_output_0 -oiqt -qt per-tensor
Interestingly, I have found that the outputs of the int8 tflite model right after /maxpool/MaxPool_output_0 are all zeros!!

Fixes: https://github.com/PINTO0309/onnx2tf/releases/tag/1.15.9
The regression test by CI takes about 2 hours, so the latest version will be released in about 2 hours.
Thanks very much for this information.
Hope we can contribute and elaborate on solving this issue. And the -onimc option fix will indeed simplify our investigations, so thanks a lot in advance!

There was a bug in the behavior of the -onimc option that is being corrected. It will be improved in v1.15.9.

Thanks for the amazing quick reply @PINTO0309
I can briefly describe the network structure as follows:

- Lite 3x3 convolutional block: essentially the same idea as the famous MobileNet depthwise convolution (see the sketch after this comment).

If you're interested in more details, you can find more information about the network topology in this paper, and you can have a look at the network implementation in this script.

However, I do agree with splitting the model into smaller subgraphs to see where the problem starts, but this would be time-consuming and I won't manage to do it quickly.
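For readers unfamiliar with OSNet, here is a minimal PyTorch-style sketch of what such a Lite 3x3 block typically looks like (a 1x1 pointwise convolution followed by a 3x3 depthwise convolution with BN and ReLU). The exact layer order, naming, and shapes here are assumptions based on the referenced paper and implementation, not code from this issue.

```python
# Minimal sketch of an OSNet-style "Lite 3x3" block (assumed structure:
# pointwise 1x1 conv -> depthwise 3x3 conv -> BN -> ReLU).
import torch
import torch.nn as nn

class LiteConv3x3(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.depthwise = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                                   padding=1, groups=out_channels, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.depthwise(self.pointwise(x))))

# Example: a hypothetical 64 -> 256 block on a downsampled person-reid feature map.
x = torch.randn(1, 64, 64, 32)
print(LiteConv3x3(64, 256)(x).shape)   # torch.Size([1, 256, 64, 32])
```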