onnx2tf: [OSNet] int8 tflite model - catastrophic accuracy degradation

Issue Type

Others

OS

Linux

onnx2tf version number

1.15.8

onnx version number

1.13.1

onnxruntime version number

1.15.0

onnxsim (onnx_simplifier) version number

0.4.33

tensorflow version number

2.13.0

Download URL for ONNX

osnet_quant_issue.zip

Parameter Replacement JSON

None

Description

Source Model Information

OSNet is a person re-identification (re-ID) model that was trained in PyTorch and converted to ONNX with the pre-trained ImageNet weights.

onnx2tf conversion command

onnx2tf \
-i ./../onnx_models/040_osnet_x1_0/onnx_fp_32_bs_1.onnx \
-o ./../tflite_models/040_osnet_x1_0/osnet_x1_0_bs_1/ \
-otfv1pb \
-osd \
-oiqt \
-qt per-tensor \
-cind "input.1" "./../calibration_data/onnx2tf_calib/calib_data_duke_500_bs_1_nhwc_fp32.npy" "[[[[0.485,0.456,0.406]]]]" "[[[[0.229,0.224,0.225]]]]"

The quantization process was calibrated using 100 samples from the DukeMTMC person re-ID dataset; the samples were scaled to the [0, 1] range and preprocessed accordingly.
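For reference, the calibration tensor passed via -cind can be assembled roughly as below. This is a minimal numpy sketch with random data standing in for the real DukeMTMC crops; the sample count, image shape, and output filename are all hypothetical:

```python
import numpy as np

# Sketch of building the calibration .npy for onnx2tf's -cind option.
# Random data stands in for real DukeMTMC crops; the shape (8, 256, 128, 3)
# and the filename "calib_data.npy" are hypothetical placeholders.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(8, 256, 128, 3), dtype=np.uint8)

# Scale to [0, 1]; onnx2tf applies the ImageNet mean/std given on the
# command line, so the saved tensor itself is only unit-range normalized.
calib = images.astype(np.float32) / 255.0  # NHWC, float32

np.save("calib_data.npy", calib)
print(calib.shape, calib.dtype)  # → (8, 256, 128, 3) float32
```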

Issue Description

I checked the accuracy of the converted float32 tflite model, and it was essentially the same as the source model. However, when I checked the accuracy of the int8 model, I encountered a catastrophic accuracy drop (more than 95%).

I read Section 7 of the README, which clearly states that this can be a matter of model structure. Is there any way to fix this problem?

Resources

You can find the following resources in the attached zip file:

  1. osnet_x1_0_fp_32_bs_1.onnx: the source ONNX model.
  2. osnet_x1_0_imagenet_fp32_bs_1_float32.tflite: the output fp32 tflite model.
  3. osnet_x1_0_imagenet_fp32_bs_1_integer_quant.tflite: the output int8 tflite model.
  4. accuracy_check.py: a Python script that takes the fp32/int8 tflite models and an input image, runs both models on the image, and measures the cosine similarity of the output embeddings (this simplifies the accuracy check on your end).
  5. 0001_c6_f0030809.jpg: an input image sample from DukeMTMC.
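The comparison at the heart of accuracy_check.py can be sketched as follows; a minimal numpy version of the cosine-similarity check, with a toy vector standing in for real model embeddings:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two (flattened) embedding vectors."""
    a = np.ravel(a).astype(np.float64)
    b = np.ravel(b).astype(np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# A toy embedding in place of real fp32/int8 model outputs:
# identical embeddings score 1.0, orthogonal embeddings score 0.0.
emb = np.array([0.2, -0.5, 0.7, 0.1])
print(cosine_similarity(emb, emb))  # → 1.0 (up to floating-point error)
```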

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 16 (11 by maintainers)

Most upvoted comments

I’ll gladly discuss other potential issues with you again @PINTO0309

Thank you very much, and since we can get a valid quantized OSNet now, I will close this issue.

I seem to have posted my comment at about the same time as yours.

Your information has given me an understanding of the structure of the model. Thank you.

However, onnx2tf faithfully reproduces the model structure of the ONNX graph, so I do not think the post-quantization accuracy degradation is a problem with onnx2tf itself.

I am not sure where the quantization problem lies. In the past, when I experienced significant accuracy degradation in YOLO’s SiLU, I identified the problem area through diligent research, searched for papers on accuracy degradation in INT8 quantization, and as a result, identified the problem in SiLU (Swish), ReLU6, and Concat.

I have been working on quantization for about 5 years, but I remember that OSNet has a significant degradation in accuracy. However, I have never done a more in-depth investigation.

Your solution to this problem is going to be a great contribution to the community.

Btw, if the bug in the -onimc option is fixed, it will be possible to split the model and see changes in the output, as shown in the figure below.

Thanks a lot for the provided fix!

I started my investigations at the very beginning of the model, and things are getting interesting!

I'm trying to pinpoint where the significant accuracy drop begins. To that end, I replaced the provided accuracy_check.py with a new script, subgraph_acc_check.py, which flattens the outputs of the corresponding subgraphs of the float32 and int8 tflite models and measures the Euclidean distance between the flattened features. (Cosine similarity can be problematic when a feature vector has norm 0, which is also why this code does not compute a normalized Euclidean distance.)

subgraph_acc_check.zip
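The metric used above can be sketched as follows; a minimal numpy version, with illustrative shapes. Unlike cosine similarity, the Euclidean distance stays well defined even when one output collapses to all zeros:

```python
import numpy as np

def euclidean_distance(f32_out, int8_out):
    """Euclidean distance between flattened subgraph outputs.

    Unlike cosine similarity, this stays defined when one of the
    outputs collapses to an all-zero vector (norm 0)."""
    a = np.ravel(f32_out).astype(np.float64)
    b = np.ravel(int8_out).astype(np.float64)
    return float(np.linalg.norm(a - b))

# Illustrative shapes: a degenerate all-zero "int8" output still
# yields a finite, meaningful distance.
f32 = np.ones((1, 64, 32, 64), dtype=np.float32)   # 131072 elements
zeros = np.zeros_like(f32)
print(euclidean_distance(f32, zeros))  # sqrt(131072) ≈ 362.04
```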

  • First cut was at /conv1/relu/Relu_output_0 (before the residual blocks, right after the ReLU):

    onnx2tf -i osnet_x1_0_fp_32_bs_1.onnx -o ./fix_acc_issue/tmp_0 -onimc /conv1/relu/Relu_output_0 -oiqt -qt per-tensor

    tmp_0 results:

    Float32 model outputs flattened shape: (524288,)
    Int8 model outputs flattened shape: (524288,)
    Euclidean Distance: 6.558413505554199  # Acceptable gap

  • Second cut was at /maxpool/MaxPool_output_0 (before the residual blocks, right after the MaxPool2D):

    onnx2tf -i osnet_x1_0_fp_32_bs_1.onnx -o ./fix_acc_issue/tmp_1 -onimc /maxpool/MaxPool_output_0 -oiqt -qt per-tensor

    tmp_1 results:

    Float32 model outputs flattened shape: (131072,)
    Int8 model outputs flattened shape: (131072,)
    Euclidean Distance: 152.93930053710938  # Significant gap

Interestingly, I found that the outputs of the int8 tflite model right after /maxpool/MaxPool_output_0 are all zeros!
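This symptom is easy to reason about from the int8 representation: if every quantized value equals the tensor's zero point, dequantization yields exactly zero. A minimal numpy sketch (the scale and zero point here are hypothetical; in a real check they would come from the TFLite interpreter's output quantization parameters):

```python
import numpy as np

def dequantize(int8_vals, scale, zero_point):
    """Standard affine dequantization: real = (q - zero_point) * scale."""
    return (int8_vals.astype(np.float32) - zero_point) * scale

# If every quantized value equals the zero point, the dequantized
# tensor is identically zero -- the symptom observed after MaxPool.
# scale=0.05 and zero_point=-128 are hypothetical example values.
vals = np.full((131072,), -128, dtype=np.int8)
deq = dequantize(vals, scale=0.05, zero_point=-128)
print(bool(np.all(deq == 0.0)))  # → True
```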

Fixes: https://github.com/PINTO0309/onnx2tf/releases/tag/1.15.9

  • example
    onnx2tf \
    -i osnet_x1_0_fp_32_bs_1.onnx \
    -onimc /conv2/conv2.0/Relu_output_0 \
    -oiqt \
    -qt per-tensor
    

The regression test by CI takes about 2 hours, so the latest version will be released in about 2 hours.

Thanks very much for this information.

I hope we can contribute to solving this issue. The -onimc option fix will indeed simplify our investigations, so thanks a lot in advance!

There was a bug in the behavior of the -onimc option that is being corrected. It will be improved in v1.15.9.

Thanks for the amazing quick reply @PINTO0309

I can briefly describe the network structure as follows:

  • The underpinning building block of this network is the Lite 3x3 convolutional block; it is essentially the same depthwise separable convolution used in MobileNet.

lite3x3

  • The authors equipped the familiar ResNet residual block with this Lite 3x3 block and adopted a multi-stream topology: each residual block contains 4 sub-streams, each with its own receptive field (determined by how many consecutive Lite 3x3 blocks it contains). Each sub-stream learns features at a homogeneous scale, and the learned features are then fused by an aggregation unit (global average pooling -> 1x1 conv -> ReLU -> 1x1 conv).

os_residual

  • Regarding activations, there is nothing exotic: only three types appear in this network (ReLU, sigmoid, and the linear/identity activation).

If you're interested, more details on the network topology can be found in this paper, and the network implementation is available in this script.
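The aggregation unit described above can be sketched in numpy as follows. This is a minimal sketch with hypothetical weights and shapes; in the real network a sigmoid (per the activation list above) is applied to the resulting gate before it is multiplied back onto each stream:

```python
import numpy as np

rng = np.random.default_rng(0)

def aggregation_gate(x, w1, w2):
    """Numpy sketch of the aggregation unit:
    global average pooling -> 1x1 conv -> ReLU -> 1x1 conv.

    x is an NHWC feature map; on pooled (N, C) features, a 1x1 conv
    reduces to a matrix multiply over the channel axis. The weights
    w1/w2 here are hypothetical random placeholders."""
    pooled = x.mean(axis=(1, 2))            # global average pooling -> (N, C)
    hidden = np.maximum(pooled @ w1, 0.0)   # 1x1 conv (channel reduction) + ReLU
    return hidden @ w2                      # 1x1 conv (channel expansion) -> (N, C)

x = rng.standard_normal((1, 16, 8, 64)).astype(np.float32)   # NHWC feature map
w1 = rng.standard_normal((64, 16)).astype(np.float32)        # reduce 64 -> 16 channels
w2 = rng.standard_normal((16, 64)).astype(np.float32)        # expand 16 -> 64 channels
gate = aggregation_gate(x, w1, w2)
print(gate.shape)  # → (1, 64)
```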

However, I do agree with splitting the model into smaller subgraphs to see where the problem starts; this would be time-consuming, though, and I won't manage to do it quickly.