TensorRT: INT8 inference in TensorRT 8.0 gives wrong results
I want to convert a PyTorch model to TensorRT for INT8 inference, so I go PyTorch model -> ONNX model -> TRT engine, and with TensorRT 7.2.2.3 this succeeds.
With the old version I set FP16 and INT8 mode as
builder.fp16_mode=True
builder.int8_mode=True
In INT8 mode I feed test data for calibration, and in the end I build an FP32 engine, an FP16 engine, and an INT8 engine, and I get the correct accuracy in all three modes.
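For reference, here is a minimal sketch of that TensorRT 7.x flow (attribute-style flags set directly on the builder); the file name model.onnx and the calibrator object calib are placeholders, not taken from my actual code.

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

# TensorRT 7.x style: parse the ONNX model, then set precision flags and the
# calibrator directly on the builder ("model.onnx" and "calib" are placeholders).
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)
with open("model.onnx", "rb") as f:
    parser.parse(f.read())

builder.max_workspace_size = 1 << 30
builder.fp16_mode = True
builder.int8_mode = True
builder.int8_calibrator = calib  # entropy calibrator fed with test data
engine = builder.build_cuda_engine(network)
```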
Now I want to deploy a QAT model with TensorRT, so I update PyTorch to 1.8.0, TensorRT to 8.0, CUDA to 10.2.89, and cuDNN to 8.2.0.
First I try INT8 inference in TensorRT as above, but the old way of setting FP16 and INT8 mode can no longer be used, so I use
config.set_flag(trt.BuilderFlag.FP16)
config.set_flag(trt.BuilderFlag.INT8)
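A minimal sketch of the TensorRT 8 equivalent, assuming the same builder, network, and calib objects as in the sketch above: the precision flags move from the builder onto the builder config.

```python
# TensorRT 8 style: precision flags live on the builder config, not the builder.
config = builder.create_builder_config()
config.max_workspace_size = 1 << 30
config.set_flag(trt.BuilderFlag.FP16)
config.set_flag(trt.BuilderFlag.INT8)
config.int8_calibrator = calib  # required in INT8 mode (see below)
engine = builder.build_engine(network, config)
```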
I cannot find a PyTorch INT8 inference sample, so I follow the int8_caffe_mnist sample (https://github.com/NVIDIA/TensorRT/blob/master/samples/python/int8_caffe_mnist) and calibrate on my test data with the IInt8EntropyCalibrator2 method.
config.int8_calibrator = calib
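In case it helps to reproduce the setup, here is a minimal sketch of such a calibrator, modeled on the int8_caffe_mnist sample; the class name MyEntropyCalibrator and the calib_data array (preprocessed NCHW float32 test images) are assumptions for illustration, not my exact code.

```python
import os

import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt


class MyEntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds batches of preprocessed test data to the TensorRT builder."""

    def __init__(self, calib_data, batch_size=32, cache_file="calib.cache"):
        super().__init__()
        self.data = np.ascontiguousarray(calib_data.astype(np.float32))
        self.batch_size = batch_size
        self.cache_file = cache_file
        self.index = 0
        # Device buffer large enough to hold one calibration batch.
        self.device_input = cuda.mem_alloc(self.data[0].nbytes * batch_size)

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        if self.index + self.batch_size > self.data.shape[0]:
            return None  # tells the builder the calibration data is exhausted
        batch = np.ascontiguousarray(
            self.data[self.index:self.index + self.batch_size])
        cuda.memcpy_htod(self.device_input, batch)
        self.index += self.batch_size
        return [int(self.device_input)]

    def read_calibration_cache(self):
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)


# Usage: calib = MyEntropyCalibrator(test_images); config.int8_calibrator = calib
```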
I build the TRT engine successfully and get the correct accuracy in FP32 and FP16 mode, but wrong accuracy in INT8 mode: 0.01 accuracy for 100-class classification, i.e. random guessing. I check the engine output and find the output matrix values are almost all around 0.002.
Then I use the pytorch_quantization toolkit to do PTQ:
collect_stats(model, data_loader, num_batches=2)
compute_amax(model, method="percentile", percentile=99.99)
Then I export the model to ONNX, parse it, and build an INT8 engine for inference; I again get 0.01 accuracy.
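For context, collect_stats and compute_amax are helper functions from the pytorch_quantization PTQ tutorial rather than library calls; a minimal sketch of them, assuming a CUDA model and a standard (image, label) data loader, is below.

```python
import torch
from pytorch_quantization import calib
from pytorch_quantization import nn as quant_nn


def collect_stats(model, data_loader, num_batches):
    # Disable quantization and enable calibration so activation histograms are collected.
    for module in model.modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            if module._calibrator is not None:
                module.disable_quant()
                module.enable_calib()
            else:
                module.disable()
    with torch.no_grad():
        for i, (image, _) in enumerate(data_loader):
            model(image.cuda())
            if i >= num_batches:
                break
    # Re-enable quantization and disable calibration afterwards.
    for module in model.modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            if module._calibrator is not None:
                module.enable_quant()
                module.disable_calib()
            else:
                module.enable()


def compute_amax(model, **kwargs):
    # Load the collected amax (dynamic range) values into every quantizer.
    for module in model.modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            if module._calibrator is not None:
                if isinstance(module._calibrator, calib.MaxCalibrator):
                    module.load_calib_amax()
                else:
                    module.load_calib_amax(**kwargs)
    model.cuda()
```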
The only difference between the INT8 and FP16 builds is the config.set_flag(trt.BuilderFlag.INT8) and config.int8_calibrator settings used in INT8 mode.
Why do I get the correct accuracy in FP16 mode but 0.01 accuracy in INT8 mode?
I don't know what the problem is: the calibration method? the calibration batches? @ttyio I would appreciate your reply, thanks a lot!
Environment:
- TensorRT Version: 8.0.0.3
- NVIDIA GPU: Tesla P40
- NVIDIA Driver Version:
- CUDA Version: 10.2.89
- CUDNN Version: 8.2.0
- Operating System: Ubuntu 19.10
About this issue
- State: closed
- Created 3 years ago
- Comments: 38
@Mahsa1994 Could you share the exported ONNX file? Thanks
@Ricardosuzaku @aojue1109 Does this issue still exist with latest TRT release? If it does, we will debug it. Thanks