TensorRT: Performance regression on 1080 Ti

Description

Going from the 20.11 to the 20.12 container introduces a performance regression on a common 3D convolution model.

Environment

TensorRT Version: 7.2.1 -> 7.2.2
NVIDIA GPU: 1080 Ti
NVIDIA Driver Version: 460
CUDA Version: 11.1.0 -> 11.1.1
CUDNN Version: 8.0.4 -> 8.0.5
Operating System: Ubuntu 20
Python Version (if applicable): 3.6 -> 3.8
PyTorch Version (if applicable): 1.8.1

Steps To Reproduce

To reproduce, first export a 3D model to ONNX format with this script:

import torch
import torchvision

dummy_input = torch.randn(4, 3, 35, 224, 224).float().cuda()
model = torchvision.models.video.r2plus1d_18().cuda().eval()

with torch.no_grad():
    torch.onnx.export(
        model,
        dummy_input,
        "resnet.onnx",
        verbose=True,
    )
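
Before building the engine, it can help to confirm that the export is well-formed. A minimal sketch, assuming the onnx Python package is installed (it is not mentioned in the original report):

import onnx

# Load the exported model and run the structural checker
model = onnx.load("resnet.onnx")
onnx.checker.check_model(model)
print("resnet.onnx passes the ONNX checker")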

Then build an engine from it with trtexec inside the TensorRT Docker container:

/usr/src/tensorrt/bin/trtexec --onnx=resnet.onnx --best --workspace=5000 --saveEngine=resnet.trt --inputIOFormats=fp32:chw --outputIOFormats=fp32:chw
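
If the build succeeds, the serialized engine can be deserialized and its bindings inspected from Python. A minimal sketch, assuming the tensorrt Python bindings that ship with the container and the engine saved as resnet.trt:

import tensorrt as trt

LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize the engine written by trtexec
with open("resnet.trt", "rb") as f, trt.Runtime(LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

# Print every binding's name, shape, dtype and direction
for i in range(engine.num_bindings):
    print(engine.get_binding_name(i),
          engine.get_binding_shape(i),
          engine.get_binding_dtype(i),
          "input" if engine.binding_is_input(i) else "output")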

My speed test gives the following results (throughput in videos/s):

20.11
1080 Ti: 7.52

20.12
1080 Ti: 6.80

This ~10% regression persists in later versions.
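
For reference, a minimal sketch of how such a throughput measurement can be made against the saved engine, assuming the tensorrt and pycuda packages and the batch size of 4 used in the export script (this is not the exact benchmark from the report):

import time
import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

BATCH = 4  # videos per inference, matching the export script above
LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize the engine built by trtexec
with open("resnet.trt", "rb") as f, trt.Runtime(LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate pinned host buffers and device buffers for every binding
bindings, host_bufs, dev_bufs = [], [], []
for i in range(engine.num_bindings):
    dtype = trt.nptype(engine.get_binding_dtype(i))
    size = trt.volume(engine.get_binding_shape(i))
    host = cuda.pagelocked_empty(size, dtype)
    dev = cuda.mem_alloc(host.nbytes)
    host_bufs.append(host)
    dev_bufs.append(dev)
    bindings.append(int(dev))

stream = cuda.Stream()

def infer_once():
    # Copy input to the GPU, run the engine, copy the output back
    cuda.memcpy_htod_async(dev_bufs[0], host_bufs[0], stream)
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    cuda.memcpy_dtoh_async(host_bufs[-1], dev_bufs[-1], stream)
    stream.synchronize()

# Warm up, then time a fixed number of batches
for _ in range(10):
    infer_once()
runs = 50
start = time.time()
for _ in range(runs):
    infer_once()
elapsed = time.time() - start
print(f"{runs * BATCH / elapsed:.2f} videos/s")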

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 16

Most upvoted comments

I think I was right. I tested 21.06, which ships cuDNN 8.2.1, and the problem seems solved:

21.06: 535.982 ms
20.11: 563.797 ms

(so there's even a small boost)
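
For what it's worth, those two mean latencies imply roughly a 5% improvement. A quick check of the arithmetic, using only the numbers quoted above:

# Mean latencies reported above, in milliseconds
t_20_11 = 563.797
t_21_06 = 535.982

# Relative improvement of 21.06 over 20.11
speedup = (t_20_11 - t_21_06) / t_20_11
print(f"21.06 is about {speedup:.1%} faster")  # prints ~4.9%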

Thus closing this.