TensorRT: TensorRT QAT model is slower than PTQ model !!!
Description
The YOLOv8m TensorRT QAT model is slower than the PTQ model.
Environment
TensorRT Version: 8.4.1.5
NVIDIA GPU: RTX2080
NVIDIA Driver Version:
CUDA Version: CUDA11.1
CUDNN Version:
Operating System:
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Model link:
Steps To Reproduce
Commands or scripts:
Have you tried the latest release?:
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):
PTQ infer time:
QAT infer time:
About this issue
- State: closed
- Created a year ago
- Comments: 20
It is quite common for a TensorRT QAT model to be slower than the PTQ model. The Q/DQ nodes may not be placed correctly, which leads to poor layer fusion.
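One way to check the Q/DQ placement mentioned above is to look for weighted layers whose inputs do not come from a DequantizeLinear node. The sketch below is only an illustration, not something from the original thread: the `Node` tuple, the `QUANTIZABLE_OPS` set, the `find_unquantized_inputs` helper and the tiny example graph are all hypothetical, and the rule that every Conv/Gemm/MatMul activation and weight input should be fed by a DQ node is a simplifying assumption.

```python
from typing import List, NamedTuple, Tuple


class Node(NamedTuple):
    """Hypothetical, minimal stand-in for one node of an exported graph."""
    name: str
    op_type: str
    inputs: List[str]
    outputs: List[str]


# Assumption: these are the weighted ops we expect to run in INT8.
QUANTIZABLE_OPS = {"Conv", "Gemm", "MatMul"}


def find_unquantized_inputs(nodes: List[Node]) -> List[Tuple[str, str]]:
    """Return (node name, tensor name) pairs where a weighted op reads an
    activation or weight that is not produced by a DequantizeLinear node."""
    # Map every tensor name to the op type of the node that produces it.
    producer = {out: n.op_type for n in nodes for out in n.outputs}
    problems = []
    for n in nodes:
        if n.op_type not in QUANTIZABLE_OPS:
            continue
        # Check only the first two inputs (activation, weight); the bias
        # is skipped in this sketch.
        for tensor in n.inputs[:2]:
            if producer.get(tensor) != "DequantizeLinear":
                problems.append((n.name, tensor))
    return problems


# Made-up two-layer graph: conv0 is fully wrapped in Q/DQ, while conv1
# reads the Relu output directly, with no Q/DQ pair in between.
example_nodes = [
    Node("q_in", "QuantizeLinear", ["images", "s_in", "z_in"], ["images_q"]),
    Node("dq_in", "DequantizeLinear", ["images_q", "s_in", "z_in"], ["images_dq"]),
    Node("q_w0", "QuantizeLinear", ["w0", "s_w0", "z_w0"], ["w0_q"]),
    Node("dq_w0", "DequantizeLinear", ["w0_q", "s_w0", "z_w0"], ["w0_dq"]),
    Node("conv0", "Conv", ["images_dq", "w0_dq", "b0"], ["c0"]),
    Node("relu0", "Relu", ["c0"], ["r0"]),
    Node("q_w1", "QuantizeLinear", ["w1", "s_w1", "z_w1"], ["w1_q"]),
    Node("dq_w1", "DequantizeLinear", ["w1_q", "s_w1", "z_w1"], ["w1_dq"]),
    Node("conv1", "Conv", ["r0", "w1_dq"], ["c1"]),
]

if __name__ == "__main__":
    for node_name, tensor in find_unquantized_inputs(example_nodes):
        print(f"{node_name}: input '{tensor}' is not fed by DequantizeLinear")
```

For a real model, the same list could probably be built from the nodes of the exported ONNX file (for example from `onnx.load(path).graph.node`, using each node's `name`, `op_type`, `input` and `output` fields). Any layers it flags are candidates for a missing Q/DQ pair, not a confirmed cause of the slowdown.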