TensorRT: QAT model is slower than PTQ model

Description

The YOLOv8m TensorRT QAT (quantization-aware training) engine runs slower than the PTQ (post-training quantization) engine.

Environment

TensorRT Version: 8.4.1.5
NVIDIA GPU: RTX 2080
NVIDIA Driver Version:

CUDA Version: 11.1
CUDNN Version:

Operating System:

Python Version (if applicable):

Tensorflow Version (if applicable):

PyTorch Version (if applicable):

Baremetal or Container (if so, version):

Relevant Files

Model link:

Steps To Reproduce

Commands or scripts:

Have you tried the latest release?:

Can this model run on other frameworks? For example, run the ONNX model with ONNX Runtime (polygraphy run <model.onnx> --onnxrt):

PTQ infer time: (screenshot attachment)

QAT infer time: (screenshot attachment)
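Since the timing screenshots are not recoverable here, a fair comparison between the two engines should warm up first and report a median over many iterations, so that lazy initialization and kernel autotuning do not skew the result. Below is a minimal, hedged sketch of such a harness; the infer callable is a stand-in for whatever actually runs the TensorRT engine (names and parameters are illustrative, not from the original report):

```python
import time
import statistics

def benchmark(infer, n_warmup=10, n_iters=100):
    """Return the median latency of infer() in milliseconds.

    infer: a zero-argument callable that performs one inference.
    Warm-up iterations are excluded from the measurement.
    """
    # Warm up to exclude one-time setup costs from the timing
    for _ in range(n_warmup):
        infer()
    times_ms = []
    for _ in range(n_iters):
        t0 = time.perf_counter()
        infer()
        times_ms.append((time.perf_counter() - t0) * 1000.0)
    # Median is more robust to outliers than mean for latency
    return statistics.median(times_ms)
```

With a real engine, infer would wrap the execution-context call (e.g. a context.execute_v2 invocation); comparing benchmark(ptq_infer) against benchmark(qat_infer) on the same input gives a like-for-like number.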

About this issue

  • State: closed
  • Created a year ago
  • Comments: 20

Most upvoted comments

It is quite common for a TensorRT QAT model to be slower than the corresponding PTQ model. If the Q/DQ (quantize/dequantize) nodes are not placed correctly in the graph, TensorRT cannot match its layer-fusion patterns, so the extra quantize/dequantize operations run unfused and add latency instead of removing it.
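For intuition about what those Q/DQ nodes compute, here is a minimal NumPy sketch of symmetric per-tensor int8 fake quantization (the operation a Q/DQ pair represents in a QAT graph). This is an illustration of the math only, not TensorRT's implementation; the function names and the scale value are made up for the example:

```python
import numpy as np

def quantize(x, scale):
    # Q node: real values -> int8 with a symmetric per-tensor scale
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def dequantize(q, scale):
    # DQ node: int8 -> real values
    return q.astype(np.float32) * scale

x = np.array([0.5, -1.2, 3.7], dtype=np.float32)
scale = np.float32(0.05)
# A Q/DQ pair round-trips the tensor with at most scale/2 error
# (for values inside the representable range)
x_fake_quant = dequantize(quantize(x, scale), scale)
```

When TensorRT fuses well, a pattern such as DQ -> Conv -> Q collapses into a single int8 convolution kernel; a Q/DQ pair left in an unexpected position stays as standalone elementwise ops, which is one plausible reason a QAT engine ends up slower than a PTQ one.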