TensorRT 8.6.1.6 fails when building an engine from ONNX with dynamic shapes on RTX 3070
Description
I’m trying to build an engine from an .onnx file with dynamic shapes. I’m setting the dimensions of the profile as follows:
nvinfer1::IOptimizationProfile *profile = builder->createOptimizationProfile();
profile->setDimensions("input", nvinfer1::OptProfileSelector::kMIN, nvinfer1::Dims4(1, 3, 512, 512));
profile->setDimensions("input", nvinfer1::OptProfileSelector::kOPT, nvinfer1::Dims4(8, 3, 512, 512));
profile->setDimensions("input", nvinfer1::OptProfileSelector::kMAX, nvinfer1::Dims4(8, 3, 704, 704));
config->addOptimizationProfile(profile);
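For completeness, this is roughly how that profile is wired into the rest of my build code (a simplified sketch, not my exact code: the Logger class, the error handling, and the model.onnx / output.trt paths are placeholders):

#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <fstream>
#include <iostream>

// Minimal logger; the real code uses its own logging class.
class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char *msg) noexcept override
    {
        if (severity <= Severity::kWARNING)
            std::cout << msg << std::endl;
    }
};

int main()
{
    Logger logger;
    auto builder = nvinfer1::createInferBuilder(logger);
    auto network = builder->createNetworkV2(
        1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));

    // Parse the ONNX model ("model.onnx" stands in for the real path).
    auto parser = nvonnxparser::createParser(*network, logger);
    parser->parseFromFile("model.onnx", static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));

    auto config = builder->createBuilderConfig();
    nvinfer1::IOptimizationProfile *profile = builder->createOptimizationProfile();
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kMIN, nvinfer1::Dims4(1, 3, 512, 512));
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kOPT, nvinfer1::Dims4(8, 3, 512, 512));
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kMAX, nvinfer1::Dims4(8, 3, 704, 704));
    config->addOptimizationProfile(profile);

    // The internal error below is reported from this call.
    nvinfer1::IHostMemory *serialized = builder->buildSerializedNetwork(*network, *config);
    if (!serialized)
    {
        std::cerr << "engine build failed" << std::endl;
        return 1;
    }

    std::ofstream out("output.trt", std::ios::binary);
    out.write(static_cast<const char *>(serialized->data()), serialized->size());
    return 0;
}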
And I get the error:
2: [graphOptimizer.cpp::nvinfer1::builder::ConvDeconvEltwiseSumFusion<class nvinfer1::builder::ConvolutionNode>::matchDetails::4465] Error Code 2: Internal Error (Assertion residual->extent == output->extent failed. )
The same code worked with a previous TensorRT version: 8.4.1.5.
If I instead set:
profile->setDimensions("input", nvinfer1::OptProfileSelector::kMAX, nvinfer1::Dims4(8, 3, 512, 512));
so that kMIN = kOPT = kMAX, then it works with the newer TensorRT version. It also works if I run trtexec (although there is a segfault at the end):
./trtexec --onnx=D:\329d40cca01d46169e352e30e09c9ce7.onnx --minShapes=input:1x3x512x512 --optShapes=input:1x3x512x512 --maxShapes=input:1x3x704x704 --explicitBatch --saveEngine=output.trt
...
[06/05/2023-13:40:43] [I] [TRT] Total Host Persistent Memory: 331888
[06/05/2023-13:40:43] [I] [TRT] Total Device Persistent Memory: 4045312
[06/05/2023-13:40:43] [I] [TRT] Total Scratch Memory: 107796480
[06/05/2023-13:40:43] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 15 MiB, GPU 6363 MiB
[06/05/2023-13:40:44] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 561.349ms to assign 24 blocks to 392 nodes requiring 232075264 bytes.
[06/05/2023-13:40:44] [I] [TRT] Total Activation Memory: 232075264
[06/05/2023-13:40:44] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +1, GPU +421, now: CPU 1, GPU 421 (MiB)
[06/05/2023-13:40:44] [W] [TRT] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
[06/05/2023-13:40:44] [W] [TRT] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
[06/05/2023-13:40:44] [I] Engine built in 129.782 sec.
[06/05/2023-13:40:45] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 27553, GPU 2176 (MiB)
Segmentation fault
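In case it helps narrow things down, this is the kind of minimal check I would use to confirm that the output.trt written by trtexec can at least be deserialized and accepts a dynamic input shape (sketch only; the Logger class and the 4x3x576x576 shape are arbitrary placeholders inside the profile range):

#include <NvInfer.h>
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

// Same minimal logger as in the build sketch above.
class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char *msg) noexcept override
    {
        if (severity <= Severity::kWARNING)
            std::cout << msg << std::endl;
    }
};

int main()
{
    // Read the engine file produced by trtexec (--saveEngine=output.trt).
    std::ifstream in("output.trt", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());

    Logger logger;
    auto runtime = nvinfer1::createInferRuntime(logger);
    auto engine = runtime->deserializeCudaEngine(blob.data(), blob.size());
    if (!engine)
    {
        std::cerr << "deserialization failed" << std::endl;
        return 1;
    }

    // Pick an input shape between kMIN and kMAX of the profile.
    auto context = engine->createExecutionContext();
    bool ok = context->setInputShape("input", nvinfer1::Dims4(4, 3, 576, 576));
    std::cout << "setInputShape: " << (ok ? "ok" : "failed") << std::endl;
    return 0;
}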
Environment
TensorRT Version: 8.6.1.6
NVIDIA GPU: RTX 3070
NVIDIA Driver Version: 516.94
CUDA Version: 11.6
CUDNN Version: 8.9.0.131
Operating System: Windows 10
Any idea what could be happening?
Hopefully I have provided enough information; if not, don't hesitate to ask for more.
About this issue
- State: closed
- Created a year ago
- Comments: 16
Can you reproduce it with trtexec? It would be great if you could share a repro with us. Thanks!