onnx-tensorrt: Unsupported ONNX data type: UINT8 (2)

Following the tutorial in the notebook https://github.com/onnx/tensorflow-onnx/blob/master/tutorials/ConvertingSSDMobilenetToONNX.ipynb, I am trying to take MobileNetV2 and V3 frozen models from TensorFlow (frozen_inference_graph.pb or saved_model.pb) and convert them to ONNX and then to TensorRT files. Under the NGC dockers 20.01-tf1-py3 and 19.05-py3 I am using both this project and tensorflow-onnx. I always get different issues; the furthest I got was under 20.01-tf1-py3 with both onnx-tensorrt and tensorflow-onnx on their master branches, installed from source. I was able to create the .onnx file, but when I try to create the .trt file I get the following.

onnx2trt /media/bnascimento/project/frozen_inference_graph.onnx -o /media/bnascimento/project/frozen_inference_graph.trt
----------------------------------------------------------------
Input filename:   /media/bnascimento/project/frozen_inference_graph.onnx
ONNX IR version:  0.0.6
Opset version:    10
Producer name:    tf2onnx
Producer version: 1.6.0
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
Parsing model
Unsupported ONNX data type: UINT8 (2)
ERROR: image_tensor:0:190 In function importInput:
[8] Assertion failed: convertDtype(onnxDtype.elem_type(), &trtDtype)

I suspect this has to do with the input tensor for the image, but I don't know how to avoid this issue. Has anyone run into something similar before?

Cheers Bruno

About this issue

  • State: open
  • Created 4 years ago
  • Reactions: 15
  • Comments: 83

Most upvoted comments

I had the same error with my code. I found a tool that can solve the problem, and here is the way I used it.

  1. Install ONNX Graphsurgeon API
$ sudo apt-get install python3-pip libprotobuf-dev protobuf-compiler
$ git clone https://github.com/NVIDIA/TensorRT.git
$ cd TensorRT/tools/onnx-graphsurgeon/
$ make install
  2. Modify your model
import onnx_graphsurgeon as gs
import onnx
import numpy as np

graph = gs.import_onnx(onnx.load("model.onnx"))
for inp in graph.inputs:
    inp.dtype = np.float32

onnx.save(gs.export_onnx(graph), "updated_model.onnx")
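After saving, a quick check (a minimal sketch, assuming the onnx Python package is available) confirms that the graph inputs now carry elem_type 1 (FLOAT) instead of 2 (UINT8), which is what the TensorRT parser expects:

import onnx

# Print each graph input's element type; in ONNX, TensorProto.FLOAT == 1 and
# TensorProto.UINT8 == 2, so every input should now report 1.
model = onnx.load("updated_model.onnx")
for inp in model.graph.input:
    print(inp.name, inp.type.tensor_type.elem_type)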

Here are the steps I took, but they ended up failing anyway.

Step 1: fix UINT8 error

Here is a script that generates a new frozen inference graph with float inputs from one with int inputs:

Suppose it’s called fix_uint8.py. Its usage is: python fix_uint8.py frozen_inference_graph.pb fixed_inference_graph.pb

import tensorflow as tf
import graphsurgeon as gs
import sys

graph = gs.DynamicGraph(sys.argv[1])
image_tensor = graph.find_nodes_by_name('image_tensor')

print('Found Input: ', image_tensor)

cast_node = graph.find_nodes_by_name('Cast')[0]  # Replace 'Cast' with 'ToFloat' if using tensorflow < 1.15
print('Old field', cast_node.attr['SrcT'])

cast_node.attr['SrcT'].type = 1  # Change the expected input type to float
print('New field', cast_node.attr['SrcT'])

input_node = gs.create_plugin_node(name='InputNode', op='Placeholder', shape=(-1, -1, -1, 3), dtype=tf.float32)
namespace_plugin_map = {'image_tensor': input_node}
graph.collapse_namespaces(namespace_plugin_map)
graph.write(sys.argv[2])
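
Before converting, a quick sanity check (a sketch, assuming TensorFlow 1.x) can confirm that the rewritten graph still parses and that the new float placeholder InputNode is present:

import tensorflow as tf

# Load the rewritten frozen graph and list its placeholder nodes;
# 'InputNode' should appear and 'image_tensor' should be gone.
graph_def = tf.GraphDef()
with tf.gfile.GFile('fixed_inference_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

print([n.name for n in graph_def.node if n.op == 'Placeholder'])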

Step 2: generate ONNX file from fixed .pb file

Let’s say I fixed a file and called it mobilenet_v2_0.35_128.pb. I then call tf2onnx on this file:

python -m tf2onnx.convert --input mobilenet_v2_0.35_128.pb --inputs InputNode:0 --output mobilenet_v2_0.35_128.onnx --opset 11 --outputs detection_boxes:0,detection_scores:0,detection_multiclass_scores:0,detection_classes:0,num_detections:0,raw_detection_boxes:0,raw_detection_scores:0

2020-08-31 05:32:04,426 - INFO - Using tensorflow=1.15.0, onnx=1.7.0, tf2onnx=1.6.3/d4abc8
2020-08-31 05:32:04,426 - INFO - Using opset <onnx, 11>
2020-08-31 05:32:10,228 - INFO - Optimizing ONNX model
2020-08-31 05:32:28,812 - INFO - After optimization: BatchNormalization -53 (60->7), Cast -34 (131->97), Const -578 (916->338), Gather +6 (29->35), Identity -129 (130->1), Less -2 (10->8), Mul -2 (37->35), Reshape -15 (45->30), Shape -8 (33->25), Slice -7 (56->49), Squeeze -22 (73->51), Transpose -272 (291->19), Unsqueeze -63 (102->39)
2020-08-31 05:32:28,896 - INFO -
2020-08-31 05:32:28,896 - INFO - Successfully converted TensorFlow model mobilenet_v2_0.35_128.pb to ONNX
2020-08-31 05:32:28,925 - INFO - ONNX model is saved at mobilenet_v2_0.35_128.onnx

Step 3: generate TensorRT “engine” from ONNX file

Lastly, I call onnx2trt:

onnx2trt mobilenet_v2_0.35_128.onnx -o mobilenet_v2_0.35_128_engine.trt
----------------------------------------------------------------
Input filename:   mobilenet_v2_0.35_128.onnx
ONNX IR version:  0.0.6
Opset version:    11
Producer name:    tf2onnx
Producer version: 1.6.3
Domain:
Model version:    0
Doc string:
----------------------------------------------------------------
Parsing model
[2020-08-31 08:27:24 WARNING] [TRT]/home/user/Code/onnx-tensorrt/onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[2020-08-31 08:27:24 WARNING] [TRT]/home/user/Code/onnx-tensorrt/onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[2020-08-31 08:27:24 WARNING] [TRT]/home/user/Code/onnx-tensorrt/onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[2020-08-31 08:27:24   ERROR] INVALID_ARGUMENT: getPluginCreator could not find plugin NonMaxSuppression version 1
While parsing node number 306 [Loop -> "unused_loop_output___73"]:
ERROR: /home/user/Code/onnx-tensorrt/builtin_op_importers.cpp:3713 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"

I’ve trained my network using TF 1.15, but I get this error even when I execute these steps with either TF 2.3 or 1.15.

@cognitiveRobot I ditched TensorRT and the Jetson and did inference in an Intel NUC, directly in the CPU.


Late to this thread, but it looks like there’s a few issues:

  1. UINT8 support - we do not natively support this datatype in TensorRT. It looks like in the attached models above the input is cast to a different type right away, meaning that the potential WAR of just casting the input type to float may be the correct one in this case.
  2. NMS - we currently do not support the ONNX definition of this operator in TensorRT. We are working on getting an official implementation in.
  3. CumSum - this operator has been added and is available on the master branch of the onnx-tensorrt repo. Try building on the latest commit and importing your model again.

Hey guys, I had this same problem too; maybe this script can help, as it helped me:

import onnx

def change_input_datatype(model, typeNdx):
    # values for typeNdx:
    # 1 = float32, 2 = uint8, 3 = int8, 4 = uint16, 5 = int16, 6 = int32, 7 = int64
    inputs = model.graph.input
    for input in inputs:
        input.type.tensor_type.elem_type = typeNdx
        dtype = input.type.tensor_type.elem_type

def change_input_batchsize(model, batchSize):
    inputs = model.graph.input
    for input in inputs:
        dim1 = input.type.tensor_type.shape.dim[0]
        dim1.dim_value = batchSize
        # print("input: ", input)  # uncomment to see input layer details

def change_output_batchsize(model, batchSize):
    outputs = model.graph.output
    for output in outputs:
        dim1 = output.type.tensor_type.shape.dim[0]
        dim1.dim_value = batchSize
        # print("output: ", output)  # uncomment to see output layer details

onnx_model = onnx.load(<path to your original onnx model file>)

change_input_datatype(onnx_model, 1)
change_input_batchsize(onnx_model, 1)
change_output_batchsize(onnx_model, 1)

onnx.save(onnx_model, <path to your edited onnx model file>)

Here we can change the data type of the input tensor. Resource: https://forums.developer.nvidia.com/t/unsupported-onnx-data-type-uint8-2/75044/16?u=karanprojectx

First, I have to say that I haven't had such a janky experience with software in years. Working with this ONNX and TensorRT ecosystem is a complete nightmare.

Second, I was able to solve the UINT8 problem by using the code from this NVIDIA Developers forum post: https://forums.developer.nvidia.com/t/problem-converting-onnx-model-to-tensorrt-engine-for-ssd-mobilenet-v2/139337/16

This fixes the original frozen_inference_graph.pb file, which then needs to be converted to ONNX and then to TensorRT.

Well, I have run more tests… it looks like you're right about the volume of data sent to the GPU being the same.

I'm still not sure where the actual conversion takes place, but calling inputs[0].host[:allocate_place] = input.flatten(order="C") with a uint8 dtype either writes the data into part of the preallocated buffer or indeed casts the type under the hood; either way, the amount of data sent to the GPU is the same in both cases, with a little increase in the uint8 case due to the improved CPU performance you pointed out before.

I have measured PCIe<=>GPU Tx/Rx speeds with nvidia-smi dmon -s t, and there is an actual 4x difference when I use onnxruntime-gpu with a model that has uint8 inputs versus TensorRT with or without the proposed fix.

Though now more questions arise in my usage scenario: the actual performance gain per GPU in my case is about 5-10%, which might be explained by your previous tests. But that shouldn't lead to an overall performance increase of up to 2x in multi-GPU scenarios, especially since with this fix CPU usage actually grew along with performance.

+1 for natively supporting UINT8. It’s really bizarre that the format used in almost all image source data is not supported.

@cognitiveRobot oh boy oh boy, do I have answers for you. I trained all the MobileNetV2 and V3 models from this page with a width multiplier of 1 or less to detect a single class (soccer balls). I then collected the mean inference time for a single frame over a 30-second video, both on a Tesla V100 GPU and on an Intel i5-4210U. You can see the results below.

The i5 is between 1.3 and 1.5 times slower than the V100, but you have to be aware that this depends a lot on the implementation. The TF Object Detection API is pretty fast for inference on CPUs. On the other hand, the official YOLOv4 has an inference time of 50 ms on the V100 and a whopping 5 seconds on our feeble CPU.

[Image: mean inference time per MobileNet model on the V100 vs. the i5-4210U]

As for the inference time when processing images of different sizes:

  • 1920x1080: ~85 ms
  • 1280x720: ~70 ms
  • 640x480: ~59 ms
  • 480x360: ~56 ms

Just bear in mind that the MobileNets already scale down images before processing them, so it may be a good idea for you to configure your camera/input feed to have low resolutions too. It should matter little for the network.

Very similar problem with the CumSum operator on a PyTorch RoBERTa implementation, exported at ONNX opset 11:

import onnx
import onnxruntime
import onnx_tensorrt.backend as backend
model = onnx.load('/workspace/models/onnx-my-32.model')
engine = backend.prepare(model)
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 1115620964
[libprotobuf WARNING /workspace/TensorRT/build/third_party.protobuf/src/third_party.protobuf/src/google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING /workspace/TensorRT/build/third_party.protobuf/src/third_party.protobuf/src/google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 1115620964
[TensorRT] WARNING: /workspace/TensorRT/parsers/onnx/onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting
to cast down to INT32.
[TensorRT] ERROR: INVALID_ARGUMENT: getPluginCreator could not find plugin CumSum version 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/workspace/TensorRT/parsers/onnx/onnx_tensorrt/backend.py", line 254, in prepare
    return TensorRTBackendRep(model, device, **kwargs)
  File "/workspace/TensorRT/parsers/onnx/onnx_tensorrt/backend.py", line 92, in __init__
    raise RuntimeError(msg)

The error: [TensorRT] ERROR: INVALID_ARGUMENT: getPluginCreator could not find plugin CumSum version 1

Reassigning elem_type as @absolution747 pointed out does not solve this; it only removes the INT64 warning.

I met the same problem. Here is my solution: when RoBERTa's forward method is called with position_ids set to None, HuggingFace calls the create_position_ids_from_input_ids method in the RoBERTa modeling code to generate position_ids, and inside that function they use torch.cumsum. To work around TensorRT not supporting the CumSum operator, you need to generate position_ids yourself and pass them in, so the CumSum never ends up in the exported graph.
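
A minimal sketch of that workaround (illustrative names, not the exact HuggingFace internals; it assumes a RoBERTa-style padding index of 1): compute position_ids outside the model and pass them both to torch.onnx.export and at inference time, so no CumSum node gets traced into the graph.

import torch

def make_position_ids(input_ids, padding_idx=1):
    # Same idea as create_position_ids_from_input_ids, but executed outside the
    # traced model: positions count up over non-padding tokens only.
    mask = input_ids.ne(padding_idx).int()
    return (torch.cumsum(mask, dim=1) * mask).long() + padding_idx

# Hypothetical export call; the argument order must match your model's forward signature.
# torch.onnx.export(model,
#                   (input_ids, attention_mask, make_position_ids(input_ids)),
#                   "roberta.onnx", opset_version=11,
#                   input_names=["input_ids", "attention_mask", "position_ids"])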

It would be nice if NVIDIA made this easier. Many people are using the TF Object Detection API, and I'm trying to run it on a Jetson.

cc @deadeyegoodwin