onnx2tf: Yolov7-tiny to TensorflowLite conversion results in a dynamic output model incompatible with TfLite Java API
Issue Type
Others
onnx2tf version number
1.5.36
onnx version number
1.12.0
tensorflow version number
2.10.1
Download URL for ONNX
pip install onnx==1.12.0
Parameter Replacement JSON
none
Description
Hi, your library is awesome!
I converted YOLOv7-tiny from PyTorch to TFLite using:
onnx2tf -i yolov7-tiny.onnx -o models-NHWC-final/ -osd -oh5 -cotof
I am trying to use it on an Android device. The model works when tested on a PC; however, the TensorFlow Lite Java API for Android does not support dynamic output models according to their documentation: https://www.tensorflow.org/lite/guide/inference. Meanwhile, the resulting YOLO tflite model has a dynamic number of outputs (the number of outputs changes with the number of detections).
On the other hand, if I follow the conversion path PyTorch -> ONNX -> TensorFlow, I do get a YOLOv7 model with a fixed output size, so I suspect it is possible to achieve this with onnx2tf as well, while also doing the NCHW to NHWC conversion in the process.
Is there a way to have onnx2tf output a fixed/static output .tflite model for yolov7-tiny?
Thank you
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 34 (18 by maintainers)
You’re amazing, you fixed it! Maybe in the future the GPU Delegate will be better supported as well.
Thank you very much!
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/delegates/gpu/gl/kernels
The TensorFlow GPU delegate supports only a limited set of operations. I guess it is almost impossible to implement NMS using those. It looks like even `tf.range` is not supported, which makes it almost impossible to implement the necessary `top-k` or `sort` step for NMS.
There are two options for now.
- To decompose `NonMaxSuppression` into other operations, it is possible to use `tf.image.non_max_suppression_padded`. That implementation is built from several sub-operations, unlike `non_max_suppression_v4` as shown above.
- `non_max_suppression_v4` has a `pad_to_max_output_size` option. For now, `NonMaxSuppression.py` passes `False` and uses a slice to remove the excess indices. After adding an option for static NMS output, it is possible to let the user determine the maximum box number.
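For reference, a minimal sketch of the padded-NMS idea from the second option above; the `MAX_BOXES` value, the thresholds, and the function name are illustrative assumptions, not onnx2tf code:

```python
import tensorflow as tf

MAX_BOXES = 100  # illustrative fixed upper bound on the number of detections


def static_nms(boxes, scores, iou_threshold=0.45, score_threshold=0.25):
    """NMS with a statically shaped result.

    boxes:  [N, 4] float tensor (y1, x1, y2, x2)
    scores: [N] float tensor
    Returns indices padded to length MAX_BOXES plus the count of valid entries.
    """
    selected, num_valid = tf.image.non_max_suppression_padded(
        boxes,
        scores,
        max_output_size=MAX_BOXES,
        iou_threshold=iou_threshold,
        score_threshold=score_threshold,
        pad_to_max_output_size=True,  # keeps the output shape fixed at MAX_BOXES
    )
    return selected, num_valid
```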
If the GPU support is not there, let's have it running on the CPU and see if it is comparable with YOLOv4 in terms of computational performance. And YOLOv4 did pretty well.
The paper mentions that there is a significant computational reduction in YOLOv7 compared to v4.
Your implementation details are awesome.
About the API: I am using the Java version of tensorflow-lite-gpu. Here is my TensorFlow dependency list:
implementation "org.tensorflow:tensorflow-lite:${tflite_version}" implementation "org.tensorflow:tensorflow-lite-gpu:${tflite_version}" implementation "org.tensorflow:tensorflow-lite-gpu-api:${tflite_version}"
Here is a good guide to TFLite for Android: https://www.tensorflow.org/lite/guide/inference
In the YOLOv7 case our output tensor has the shape [num_boxes, 7]. The result for each box is an array of size 7 that contains:
[batch_number, boxLeftLimitX, boxTopLimitY, boxRightLimitX, boxBottomLimitY, ClassID, ClassScore]
The model outputs the top "num_boxes" results ordered by "ClassScore".
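For illustration, a minimal sketch (Python TFLite runtime, not the Android Java API) of reading that [num_boxes, 7] output; the model file name and the zero-filled input are assumptions:

```python
import numpy as np
import tensorflow as tf

# Assumed file name under the output directory from the onnx2tf command above.
interpreter = tf.lite.Interpreter(model_path="models-NHWC-final/yolov7-tiny_float32.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# NHWC float input, e.g. (1, 416, 416, 3); the real size depends on the export.
dummy_input = np.zeros(input_details[0]["shape"], dtype=np.float32)
interpreter.set_tensor(input_details[0]["index"], dummy_input)
interpreter.invoke()

detections = interpreter.get_tensor(output_details[0]["index"])  # shape: [num_boxes, 7]
for batch, x1, y1, x2, y2, class_id, score in detections:
    print(f"class={int(class_id)} score={score:.2f} box=({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")
```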
Thank you for your effort!
OK. then I will look into how to add the feature as a special option.
Well, if all these models have a dynamic output, they cannot be used as .tflite models on an Android platform, so the conversion to .tflite would not be very useful without a static output.
Knowing that there are models whose output is very difficult to make static, what if this conversion option were offered for the many models that do not have a lot of processing behind the NMS block, but not necessarily for the other models?
Basically, we would have a parameter that allows us to fix the output size for the majority of simpler models (with regard to their NMS stage) so they can be used as .tflite models on Android. To let users know what to expect, a warning would be displayed when this parameter is used, telling them that not all models can reasonably have their output fixed. This way most of the utility is embedded in your library while avoiding the problems of misuse.
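For illustration, a minimal sketch of the kind of post-processing such a parameter could add: pad or truncate the dynamic [num_boxes, 7] detection tensor to a user-chosen maximum. The `MAX_BOXES` value and the function name are assumptions, not part of onnx2tf:

```python
import tensorflow as tf

MAX_BOXES = 100  # user-chosen maximum number of detections


def to_static_output(detections, max_boxes=MAX_BOXES):
    """Pad or truncate a dynamic [num_boxes, 7] tensor to a fixed [max_boxes, 7] shape."""
    detections = detections[:max_boxes]              # drop anything beyond the limit
    pad_rows = max_boxes - tf.shape(detections)[0]   # rows of zeros needed to reach the limit
    return tf.pad(detections, [[0, pad_rows], [0, 0]])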