TensorRT: TensorRT engine conversion from ONNX fails
Description
I have an ONNX model exported from a Faster R-CNN model built with the Detectron2 framework (https://github.com/facebookresearch/detectron2). The ONNX model itself works well. However, when I try to convert it to a TensorRT engine with trtexec, I get the error below. Could anyone tell me how to get rid of this error? Thanks in advance.
$ /usr/src/tensorrt/bin/trtexec --onnx=model.onnx --saveEngine=model.engine
&&&& RUNNING TensorRT.trtexec [TensorRT v8601] # /usr/src/tensorrt/bin/trtexec --onnx=model.onnx --saveEngine=model.engine
[09/13/2023-22:24:32] [I] === Model Options ===
[09/13/2023-22:24:32] [I] Format: ONNX
[09/13/2023-22:24:32] [I] Model: model.onnx
[09/13/2023-22:24:32] [I] Output:
[09/13/2023-22:24:32] [I] === Build Options ===
[09/13/2023-22:24:32] [I] Max batch: explicit batch
[09/13/2023-22:24:32] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[09/13/2023-22:24:32] [I] minTiming: 1
[09/13/2023-22:24:32] [I] avgTiming: 8
[09/13/2023-22:24:32] [I] Precision: FP32
[09/13/2023-22:24:32] [I] LayerPrecisions:
[09/13/2023-22:24:32] [I] Layer Device Types:
[09/13/2023-22:24:32] [I] Calibration:
[09/13/2023-22:24:32] [I] Refit: Disabled
[09/13/2023-22:24:32] [I] Version Compatible: Disabled
[09/13/2023-22:24:32] [I] TensorRT runtime: full
[09/13/2023-22:24:32] [I] Lean DLL Path:
[09/13/2023-22:24:32] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[09/13/2023-22:24:32] [I] Exclude Lean Runtime: Disabled
[09/13/2023-22:24:32] [I] Sparsity: Disabled
[09/13/2023-22:24:32] [I] Safe mode: Disabled
[09/13/2023-22:24:32] [I] Build DLA standalone loadable: Disabled
[09/13/2023-22:24:32] [I] Allow GPU fallback for DLA: Disabled
[09/13/2023-22:24:32] [I] DirectIO mode: Disabled
[09/13/2023-22:24:32] [I] Restricted mode: Disabled
[09/13/2023-22:24:32] [I] Skip inference: Disabled
[09/13/2023-22:24:32] [I] Save engine: model.engine
[09/13/2023-22:24:32] [I] Load engine:
[09/13/2023-22:24:32] [I] Profiling verbosity: 0
[09/13/2023-22:24:32] [I] Tactic sources: Using default tactic sources
[09/13/2023-22:24:32] [I] timingCacheMode: local
[09/13/2023-22:24:32] [I] timingCacheFile:
[09/13/2023-22:24:32] [I] Heuristic: Disabled
[09/13/2023-22:24:32] [I] Preview Features: Use default preview flags.
[09/13/2023-22:24:32] [I] MaxAuxStreams: -1
[09/13/2023-22:24:32] [I] BuilderOptimizationLevel: -1
[09/13/2023-22:24:32] [I] Input(s)s format: fp32:CHW
[09/13/2023-22:24:32] [I] Output(s)s format: fp32:CHW
[09/13/2023-22:24:32] [I] Input build shapes: model
[09/13/2023-22:24:32] [I] Input calibration shapes: model
[09/13/2023-22:24:32] [I] === System Options ===
[09/13/2023-22:24:32] [I] Device: 0
[09/13/2023-22:24:32] [I] DLACore:
[09/13/2023-22:24:32] [I] Plugins:
[09/13/2023-22:24:32] [I] setPluginsToSerialize:
[09/13/2023-22:24:32] [I] dynamicPlugins:
[09/13/2023-22:24:32] [I] ignoreParsedPluginLibs: 0
[09/13/2023-22:24:32] [I]
[09/13/2023-22:24:32] [I] === Inference Options ===
[09/13/2023-22:24:32] [I] Batch: Explicit
[09/13/2023-22:24:32] [I] Input inference shapes: model
[09/13/2023-22:24:32] [I] Iterations: 10
[09/13/2023-22:24:32] [I] Duration: 3s (+ 200ms warm up)
[09/13/2023-22:24:32] [I] Sleep time: 0ms
[09/13/2023-22:24:32] [I] Idle time: 0ms
[09/13/2023-22:24:32] [I] Inference Streams: 1
[09/13/2023-22:24:32] [I] ExposeDMA: Disabled
[09/13/2023-22:24:32] [I] Data transfers: Enabled
[09/13/2023-22:24:32] [I] Spin-wait: Disabled
[09/13/2023-22:24:32] [I] Multithreading: Disabled
[09/13/2023-22:24:32] [I] CUDA Graph: Disabled
[09/13/2023-22:24:32] [I] Separate profiling: Disabled
[09/13/2023-22:24:32] [I] Time Deserialize: Disabled
[09/13/2023-22:24:32] [I] Time Refit: Disabled
[09/13/2023-22:24:32] [I] NVTX verbosity: 0
[09/13/2023-22:24:32] [I] Persistent Cache Ratio: 0
[09/13/2023-22:24:32] [I] Inputs:
[09/13/2023-22:24:32] [I] === Reporting Options ===
[09/13/2023-22:24:32] [I] Verbose: Disabled
[09/13/2023-22:24:32] [I] Averages: 10 inferences
[09/13/2023-22:24:32] [I] Percentiles: 90,95,99
[09/13/2023-22:24:32] [I] Dump refittable layers:Disabled
[09/13/2023-22:24:32] [I] Dump output: Disabled
[09/13/2023-22:24:32] [I] Profile: Disabled
[09/13/2023-22:24:32] [I] Export timing to JSON file:
[09/13/2023-22:24:32] [I] Export output to JSON file:
[09/13/2023-22:24:32] [I] Export profile to JSON file:
[09/13/2023-22:24:32] [I]
[09/13/2023-22:24:32] [I] === Device Information ===
[09/13/2023-22:24:32] [I] Selected Device: NVIDIA GeForce GTX 1050 Ti
[09/13/2023-22:24:32] [I] Compute Capability: 6.1
[09/13/2023-22:24:32] [I] SMs: 6
[09/13/2023-22:24:32] [I] Device Global Memory: 4037 MiB
[09/13/2023-22:24:32] [I] Shared Memory per SM: 96 KiB
[09/13/2023-22:24:32] [I] Memory Bus Width: 128 bits (ECC disabled)
[09/13/2023-22:24:32] [I] Application Compute Clock Rate: 1.43 GHz
[09/13/2023-22:24:32] [I] Application Memory Clock Rate: 3.504 GHz
[09/13/2023-22:24:32] [I]
[09/13/2023-22:24:32] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[09/13/2023-22:24:32] [I]
[09/13/2023-22:24:32] [I] TensorRT version: 8.6.1
[09/13/2023-22:24:32] [I] Loading standard plugins
[09/13/2023-22:24:33] [I] [TRT] [MemUsageChange] Init CUDA: CPU +131, GPU +0, now: CPU 136, GPU 526 (MiB)
[09/13/2023-22:24:38] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +226, GPU +38, now: CPU 439, GPU 556 (MiB)
[09/13/2023-22:24:38] [W] [TRT] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
[09/13/2023-22:24:38] [I] Start parsing network model.
[09/13/2023-22:24:38] [I] [TRT] ----------------------------------------------------------------
[09/13/2023-22:24:38] [I] [TRT] Input filename: model.onnx
[09/13/2023-22:24:38] [I] [TRT] ONNX IR version: 0.0.6
[09/13/2023-22:24:38] [I] [TRT] Opset version: 11
[09/13/2023-22:24:38] [I] [TRT] Producer name: pytorch
[09/13/2023-22:24:38] [I] [TRT] Producer version: 1.13.0
[09/13/2023-22:24:38] [I] [TRT] Domain:
[09/13/2023-22:24:38] [I] [TRT] Model version: 0
[09/13/2023-22:24:38] [I] [TRT] Doc string:
[09/13/2023-22:24:38] [I] [TRT] ----------------------------------------------------------------
[09/13/2023-22:24:38] [W] [TRT] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[09/13/2023-22:24:38] [W] [TRT] onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
[09/13/2023-22:24:38] [E] Error[4]: /roi_heads/box_pooler/level_poolers.0/If_OutputLayer: IIfConditionalOutputLayer inputs must have the same shape. Shapes are [-1] and [-1,1].
[09/13/2023-22:24:38] [E] [TRT] ModelImporter.cpp:771: While parsing node number 3518 [If -> "/roi_heads/box_pooler/level_poolers.0/If_output_0"]:
[09/13/2023-22:24:38] [E] [TRT] ModelImporter.cpp:772: --- Begin node ---
[09/13/2023-22:24:38] [E] [TRT] ModelImporter.cpp:773: input: "/roi_heads/box_pooler/level_poolers.0/Equal_output_0"
output: "/roi_heads/box_pooler/level_poolers.0/If_output_0"
name: "/roi_heads/box_pooler/level_poolers.0/If"
op_type: "If"
attribute {
  name: "then_branch"
  g {
    node {
      input: "/roi_heads/box_pooler/level_poolers.0/Gather_output_0"
      output: "/roi_heads/box_pooler/level_poolers.0/Squeeze_output_0"
      name: "/roi_heads/box_pooler/level_poolers.0/Squeeze"
      op_type: "Squeeze"
      attribute {
        name: "axes"
        ints: 1
        type: INTS
      }
    }
    name: "torch_jit3"
    output {
      name: "/roi_heads/box_pooler/level_poolers.0/Squeeze_output_0"
      type {
        tensor_type {
          elem_type: 1
          shape {
            dim {
              dim_param: "Squeeze/roi_heads/box_pooler/level_poolers.0/Squeeze_output_0_dim_0"
            }
          }
        }
      }
    }
  }
  type: GRAPH
}
attribute {
  name: "else_branch"
  g {
    node {
      input: "/roi_heads/box_pooler/level_poolers.0/Gather_output_0"
      output: "/roi_heads/box_pooler/level_poolers.0/Identity_output_0"
      name: "/roi_heads/box_pooler/level_poolers.0/Identity"
      op_type: "Identity"
    }
    name: "torch_jit4"
    output {
      name: "/roi_heads/box_pooler/level_poolers.0/Identity_output_0"
      type {
        tensor_type {
          elem_type: 1
          shape {
            dim {
              dim_param: "Squeeze/roi_heads/box_pooler/level_poolers.0/Squeeze_output_0_dim_0"
            }
            dim {
              dim_value: 1
            }
          }
        }
      }
    }
  }
  type: GRAPH
}
[09/13/2023-22:24:38] [E] [TRT] ModelImporter.cpp:774: --- End node ---
[09/13/2023-22:24:38] [E] [TRT] ModelImporter.cpp:777: ERROR: ModelImporter.cpp:195 In function parseGraph: [6] Invalid Node - /roi_heads/box_pooler/level_poolers.0/If /roi_heads/box_pooler/level_poolers.0/If_OutputLayer: IIfConditionalOutputLayer inputs must have the same shape. Shapes are [-1] and [-1,1].
[09/13/2023-22:24:38] [E] Failed to parse onnx file
[09/13/2023-22:24:38] [I] Finished parsing network model. Parse time: 0.334953
[09/13/2023-22:24:38] [E] Parsing model failed
[09/13/2023-22:24:38] [E] Failed to create engine from model or file.
[09/13/2023-22:24:38] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8601] # /usr/src/tensorrt/bin/trtexec --onnx=model.onnx --saveEngine=model.engine
Environment
TensorRT Version: 8.6.1
NVIDIA GPU: NVIDIA GeForce GTX 1050 Ti
NVIDIA Driver Version: 515.43.04
CUDA Version: 11.7
CUDNN Version:
Operating System: Ubuntu 20.04
Python Version (if applicable): 3.8.10
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 1.13.0+cu117
Baremetal or Container (if so, version):
Relevant Files
Model link:
Steps To Reproduce
Commands or scripts:
Have you tried the latest release?:
Can this model run on other frameworks? For example, run the ONNX model with ONNX Runtime (polygraphy run <model.onnx> --onnxrt):
About this issue
- Original URL
- State: closed
- Created 10 months ago
- Comments: 16
@RajUpadhyay, I finally solved my issue. Big thanks for helping me out with the Faster R-CNN model you posted and the link to the ONNX converter. The EfficientNMS_TRT plugin did the trick. The task was not straightforward, though, since my model's input is not the same as yours and I do not have the mask head. But your posted Faster R-CNN model and the converted ONNX file helped me understand and modify TensorRT's create_onnx.py sample script to create an NMS node for the EfficientNMS_TRT plugin and replace the outputs of my ONNX model. Once converted, I could generate the engine file without any issues.
Thanks again so much for your help! All the best to you.
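For anyone attempting the same fix, the general approach described above (appending an EfficientNMS_TRT node and rewiring the graph outputs) can be sketched with onnx-graphsurgeon. This is only an illustrative sketch, not the actual modified create_onnx.py: the tensor names "boxes" and "scores", the thresholds, and the box count are placeholders that must be adapted to your own graph.

# Sketch: append an EfficientNMS_TRT node to an ONNX detection graph.
# "boxes"/"scores" are hypothetical names for the raw box-head outputs.
import onnx
import onnx_graphsurgeon as gs
import numpy as np

graph = gs.import_onnx(onnx.load("model.onnx"))
tensors = graph.tensors()
boxes = tensors["boxes"]    # expected shape [batch, num_boxes, 4]
scores = tensors["scores"]  # expected shape [batch, num_boxes, num_classes]

# Output tensors produced by the plugin.
num_dets = gs.Variable("num_detections", dtype=np.int32, shape=["batch", 1])
det_boxes = gs.Variable("detection_boxes", dtype=np.float32, shape=["batch", 100, 4])
det_scores = gs.Variable("detection_scores", dtype=np.float32, shape=["batch", 100])
det_classes = gs.Variable("detection_classes", dtype=np.int32, shape=["batch", 100])

graph.layer(
    op="EfficientNMS_TRT",
    name="nms",
    inputs=[boxes, scores],
    outputs=[num_dets, det_boxes, det_scores, det_classes],
    attrs={
        "plugin_version": "1",
        "background_class": -1,   # no background class to skip
        "max_output_boxes": 100,
        "score_threshold": 0.25,
        "iou_threshold": 0.5,
        "score_activation": 0,    # scores are assumed to be probabilities already
        "box_coding": 0,          # boxes assumed to be corner coordinates
    },
)

# Replace the model outputs with the NMS outputs, then drop dead nodes.
graph.outputs = [num_dets, det_boxes, det_scores, det_classes]
graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "model_nms.onnx")

The modified model can then be passed to trtexec as before; the plugin itself is part of TensorRT's standard plugin set (loaded in the log above as "Loading standard plugins").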
I actually used Mask R-CNN; I only converted Faster R-CNN because, architecture-wise, it is very similar to Mask R-CNN. I followed the documentation as is, i.e. the ONNX exported from Detectron2 must have an input size of (1344, 1344) and so on.
Since I already shared the converted ONNX of the Faster R-CNN, here is the original model.onnx for Faster R-CNN:
https://drive.google.com/file/d/1clpBvHzbcG82crUHJ2UZEu_6IJVWMchw/view?usp=sharing
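For reference, a quick way to confirm what the exported Detectron2 model expects (for example the fixed 1344x1344 input mentioned above) and to see which output tensors would need replacing is to dump the graph's inputs and outputs with the onnx Python package; the file name below is a placeholder:

import onnx

model = onnx.load("model.onnx")

def describe(value_info):
    # Collect either the fixed dimension or its symbolic name.
    dims = [d.dim_value if d.dim_value else d.dim_param
            for d in value_info.type.tensor_type.shape.dim]
    return f"{value_info.name}: {dims}"

print("Inputs:")
for inp in model.graph.input:
    print("  " + describe(inp))
print("Outputs:")
for out in model.graph.output:
    print("  " + describe(out))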
Maybe try constant folding first, and if that doesn't work you will need to modify the original code or the ONNX model.
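A minimal example of that first step, assuming the file is named model.onnx (Polygraphy's surgeon tool folds constants; onnx-simplifier is an alternative):

# Constant folding with Polygraphy
polygraphy surgeon sanitize model.onnx --fold-constants -o model_folded.onnx

# Or simplify with onnx-simplifier
python -m onnxsim model.onnx model_simplified.onnx

If the folded model still fails to parse, the If branches around the ROI pooler would have to be edited (or the exporting code changed) so that both branches produce tensors of the same rank, since the error reports shapes [-1] and [-1,1].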