TensorRT: Performance regression on 1080 Ti
Description
Going from the 20.11 container to 20.12 introduces a performance regression on a common 3D convolution model.
Environment
TensorRT Version: 7.2.1 -> 7.2.2
NVIDIA GPU: 1080 Ti
NVIDIA Driver Version: 460
CUDA Version: 11.1.0 -> 11.1.1
CUDNN Version: 8.0.4 -> 8.0.5
Operating System: Ubuntu 20
Python Version (if applicable): 3.6 -> 3.8
PyTorch Version (if applicable): 1.8.1
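As a sanity check (not part of the original report), these versions can be read back from inside each container; a minimal sketch, assuming PyTorch and the TensorRT Python bindings are installed in the environment being inspected:

import tensorrt as trt
import torch

# Print the library builds actually present in the running environment.
print("CUDA:    ", torch.version.cuda)               # e.g. 11.1
print("cuDNN:   ", torch.backends.cudnn.version())   # e.g. 8005 for cuDNN 8.0.5
print("TensorRT:", trt.__version__)                  # e.g. 7.2.2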
Steps To Reproduce
To reproduce, export a 3D model to ONNX format with this script:
import torch
import torchvision

# Dummy input: batch of 4 clips, 3 channels, 35 frames, 224x224 each.
dummy_input = torch.randn(4, 3, 35, 224, 224).float().cuda()
model = torchvision.models.video.r2plus1d_18().cuda().eval()

with torch.no_grad():
    torch.onnx.export(
        model,
        dummy_input,
        "resnet.onnx",
        verbose=True,
    )
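Optionally (this step is not in the original report), the exported file can be sanity-checked before building the engine; a minimal sketch, assuming the onnx package is installed:

import onnx

# Verify the exported graph is well-formed and list its declared inputs.
onnx_model = onnx.load("resnet.onnx")
onnx.checker.check_model(onnx_model)
print([inp.name for inp in onnx_model.graph.input])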
Then optimize it with trtexec inside the TensorRT Docker container:
/usr/src/tensorrt/bin/trtexec --onnx=resnet.onnx --best --workspace=5000 --saveEngine=resnet.trt --inputIOFormats=fp32:chw --outputIOFormats=fp32:chw
My speed test shows these results (throughput in videos / s; a sketch of a comparable harness is included after the numbers):
20.11 (1080 Ti): 7.52
20.12 (1080 Ti): 6.80
This ~10% regression persists in later container versions.
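The reporter's speed-test script is not shown; the following is only a sketch of how such a videos/s number could be measured against the saved engine, assuming the TensorRT Python bindings and pycuda are available in the container and that the engine was built with the static input shape above:

import time

import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

BATCH = 4  # the engine was built from a (4, 3, 35, 224, 224) input, i.e. 4 videos per batch

logger = trt.Logger(trt.Logger.WARNING)
with open("resnet.trt", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate a device buffer for every binding (input and output) of the engine.
bindings = []
for i in range(engine.num_bindings):
    dtype = trt.nptype(engine.get_binding_dtype(i))
    n_bytes = trt.volume(engine.get_binding_shape(i)) * np.dtype(dtype).itemsize
    bindings.append(int(cuda.mem_alloc(n_bytes)))

# Warm up, then time synchronous inference over a fixed number of batches.
for _ in range(10):
    context.execute_v2(bindings)

n_iters = 50
start = time.perf_counter()
for _ in range(n_iters):
    context.execute_v2(bindings)
elapsed = time.perf_counter() - start

print(f"{n_iters * BATCH / elapsed:.2f} videos / s")

Since execute_v2 runs inference synchronously, simple wall-clock timing around it is enough for a coarse throughput comparison between containers.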
I think I was right. I tested 21.06, which ships cuDNN 8.2.1, and the problem seems solved:
21.06: 535.982 ms
20.11: 563.797 ms
(so there is even a small speedup over 20.11)
Closing this.