TensorRT 8.6.1.6 fails when building an engine from ONNX with dynamic shapes on RTX 3070
Description
I’m trying to build an engine from an .onnx file with dynamic shapes. I’m setting the dimensions of the profile as follows:
nvinfer1::IOptimizationProfile *profile = builder->createOptimizationProfile();
profile->setDimensions("input", nvinfer1::OptProfileSelector::kMIN, nvinfer1::Dims4(1, 3, 512, 512));
profile->setDimensions("input", nvinfer1::OptProfileSelector::kOPT, nvinfer1::Dims4(8, 3, 512, 512));
profile->setDimensions("input", nvinfer1::OptProfileSelector::kMAX, nvinfer1::Dims4(8, 3, 704, 704));
config->addOptimizationProfile(profile);
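For completeness, this is roughly how that profile is wired into the rest of my build code (a simplified sketch, not my exact code: the Logger class, the error handling, and the model.onnx / output.trt paths are placeholders):

#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <fstream>
#include <iostream>

// Minimal logger; the real code uses its own logging class.
class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char *msg) noexcept override
    {
        if (severity <= Severity::kWARNING)
            std::cout << msg << std::endl;
    }
};

int main()
{
    Logger logger;
    auto builder = nvinfer1::createInferBuilder(logger);
    auto network = builder->createNetworkV2(
        1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));

    // Parse the ONNX model ("model.onnx" stands in for the real path).
    auto parser = nvonnxparser::createParser(*network, logger);
    parser->parseFromFile("model.onnx", static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));

    auto config = builder->createBuilderConfig();
    nvinfer1::IOptimizationProfile *profile = builder->createOptimizationProfile();
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kMIN, nvinfer1::Dims4(1, 3, 512, 512));
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kOPT, nvinfer1::Dims4(8, 3, 512, 512));
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kMAX, nvinfer1::Dims4(8, 3, 704, 704));
    config->addOptimizationProfile(profile);

    // The internal error below is reported from this call.
    nvinfer1::IHostMemory *serialized = builder->buildSerializedNetwork(*network, *config);
    if (!serialized)
    {
        std::cerr << "engine build failed" << std::endl;
        return 1;
    }

    std::ofstream out("output.trt", std::ios::binary);
    out.write(static_cast<const char *>(serialized->data()), serialized->size());
    return 0;
}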
And I get the error:
2: [graphOptimizer.cpp::nvinfer1::builder::ConvDeconvEltwiseSumFusion<class nvinfer1::builder::ConvolutionNode>::matchDetails::4465] Error Code 2: Internal Error (Assertion residual->extent == output->extent failed. )
The same code worked with a previous TensorRT version: 8.4.1.5.
If I instead set:
profile->setDimensions("input", nvinfer1::OptProfileSelector::kMAX, nvinfer1::Dims4(8, 3, 512, 512));
so that kMIN = kOPT = kMAX, then it works with the newer TensorRT version. It also works if I run trtexec (although there is a segfault at the end):
./trtexec --onnx=D:\329d40cca01d46169e352e30e09c9ce7.onnx --minShapes=input:1x3x512x512 --optShapes=input:1x3x512x512 --maxShapes=input:1x3x704x704 --explicitBatch --saveEngine=output.trt
...
[06/05/2023-13:40:43] [I] [TRT] Total Host Persistent Memory: 331888
[06/05/2023-13:40:43] [I] [TRT] Total Device Persistent Memory: 4045312
[06/05/2023-13:40:43] [I] [TRT] Total Scratch Memory: 107796480
[06/05/2023-13:40:43] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 15 MiB, GPU 6363 MiB
[06/05/2023-13:40:44] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 561.349ms to assign 24 blocks to 392 nodes requiring 232075264 bytes.
[06/05/2023-13:40:44] [I] [TRT] Total Activation Memory: 232075264
[06/05/2023-13:40:44] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +1, GPU +421, now: CPU 1, GPU 421 (MiB)
[06/05/2023-13:40:44] [W] [TRT] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
[06/05/2023-13:40:44] [W] [TRT] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
[06/05/2023-13:40:44] [I] Engine built in 129.782 sec.
[06/05/2023-13:40:45] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 27553, GPU 2176 (MiB)
Segmentation fault
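In case it helps narrow things down, this is the kind of minimal check I would use to confirm that the output.trt written by trtexec can at least be deserialized and accepts a dynamic input shape (sketch only; the Logger class and the 4x3x576x576 shape are arbitrary placeholders inside the profile range):

#include <NvInfer.h>
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

// Same minimal logger as in the build sketch above.
class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char *msg) noexcept override
    {
        if (severity <= Severity::kWARNING)
            std::cout << msg << std::endl;
    }
};

int main()
{
    // Read the engine file produced by trtexec (--saveEngine=output.trt).
    std::ifstream in("output.trt", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());

    Logger logger;
    auto runtime = nvinfer1::createInferRuntime(logger);
    auto engine = runtime->deserializeCudaEngine(blob.data(), blob.size());
    if (!engine)
    {
        std::cerr << "deserialization failed" << std::endl;
        return 1;
    }

    // Pick an input shape between kMIN and kMAX of the profile.
    auto context = engine->createExecutionContext();
    bool ok = context->setInputShape("input", nvinfer1::Dims4(4, 3, 576, 576));
    std::cout << "setInputShape: " << (ok ? "ok" : "failed") << std::endl;
    return 0;
}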
Environment
TensorRT Version: 8.6.1.6
NVIDIA GPU: RTX 3070
NVIDIA Driver Version: 516.94
CUDA Version: 11.6
CUDNN Version: 8.9.0.131
Operating System: Windows 10
Any idea what could be happening?
Hopefully I have provided enough information; if not, don't hesitate to ask for more.
About this issue
- State: closed
- Created a year ago
- Comments: 16
Can you reproduce it with trtexec? It would be great if you could share a repro with us. Thanks!