TensorRT: ONNX networks can't use INT8 calibration and batching
Description
This is due to mutually incompatible changes in the TRT7 release:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-release-notes/tensorrt-7.html
> **ONNX parser with dynamic shapes support**: The ONNX parser supports full-dimensions mode only. Your network definition must be created with the `explicitBatch` flag set.
versus
> **Known Issues**: The INT8 calibration does not work with dynamic shapes. To work around this issue, ensure there are two passes in the code: using a fixed shape input to build the engine in the first pass allows TensorRT to generate the calibration cache.
This means the ONNX network must be exported at a fixed batch size in order to get INT8 calibration working, but then it is no longer possible to specify the batch size at inference time. I also verified that manually fixing up the inputs with `setDimensions(…-1…)` does not work (roughly the fix-up sketched below): you hit the assertion `mg.nodes[mg.regionIndices[outputRegion]].size == mg.nodes[mg.regionIndices[inputRegion]].size` while building.
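For reference, here is a minimal sketch of what such a manual fix-up can look like via the TensorRT Python API (the filename and shapes are placeholders; setting the shape directly on the parsed input is the Python analogue of `setDimensions` with a -1 batch dimension, and as noted above it still dies on the builder assertion):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# The TRT7 ONNX parser requires explicit-batch (full-dimensions) mode.
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)

with open("model_fixed_batch.onnx", "rb") as f:  # placeholder filename
    parser.parse(f.read())

# Try to make the fixed batch dimension dynamic after parsing,
# e.g. (16, 3, 224, 224) -> (-1, 3, 224, 224).
inp = network.get_input(0)
inp.shape = [-1] + list(inp.shape)[1:]
```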
One would think there might be a workaround of sorts: export two different networks, one with a fixed batch size and a second one with `dynamic_axes`, and then use the calibration cache from one for the other. However, even here there are severe pitfalls: a calibration cache that is generated for, say, `batch_size=1` won't necessarily work for larger batch sizes, presumably because they generate a different convolution strategy that causes different accuracy issues. Edit: this might've been another issue.
Lastly, the calibrator itself appears to be using implicit batch sizes, and breaks on batch size > 1 as follows:
```
TRT: Starting Calibration with batch size 16.
Calibrated 16 images.
TRT: Explicit batch network detected and batch size specified, use execute without batch size instead.
TRT: C:\source\builder\cudnnCalibrator.cpp (707) - Cuda Error in nvinfer1::builder::Histogram::add: 700 (an illegal memory access was encountered)
TRT: FAILED_ALLOCATION: Unknown exception
TRT: C:\source\builder\cudnnCalibrator.cpp (703) - Cuda Error in nvinfer1::builder::Histogram::add: 700 (an illegal memory access was encountered)
TRT: FAILED_ALLOCATION: Unknown exception
TRT: C:\source\rtSafe\cuda\caskConvolutionRunner.cpp (233) - Cuda Error in nvinfer1::rt::task::CaskConvolutionRunner::allocateContextResources: 700 (an illegal memory access was encountered)
TRT: FAILED_EXECUTION: Unknown exception
TRT: Calibrated batch 0 in 2.62865 seconds.
Cuda failure: 700
```
With `batch_size == 1`, it also hits assertions:
```
TRT: Explicit batch network detected and batch size specified, use execute without batch size instead.
TRT: Assertion failed: d.nbDims >= 1
C:\source\rtSafe\safeHelpers.cpp:419
Aborting…
```
The combination of all these failures means that you can't really use ONNX networks in INT8 mode: in particular, the recommended "using a fixed shape input to build the engine in the first pass" approach hits all kinds of internal assertions, as shown above.
Environment
**TensorRT Version**: 7.0.0.11
**GPU Type**: RTX 2080
**Nvidia Driver Version**: 441.22
**CUDA Version**: 10.2
**CUDNN Version**: 7.6.0.5
**Operating System + Version**: Windows 10
**Python Version (if applicable)**: 3.6
**TensorFlow Version (if applicable)**:
**PyTorch Version (if applicable)**: 1.3 stable
**Baremetal or Container (if container which image + tag)**: bare
Relevant Files
Steps To Reproduce
Hi @gcp,
Sorry for the delay, I’m on holiday and was hoping to do this in my free time but it’s still been a busy holiday 😅
I made a little sample workflow to demonstrate how I believe this works.
1. Export the trained model to two ONNX models (one fixed batch, one dynamic batch)
I tweaked the Alexnet demo from here: https://pytorch.org/docs/stable/onnx.html (a minimal export sketch follows this list).
2. Do INT8 calibration on fixed shape model and save calibration cache
This is based on code from the 20.01 branch here: https://github.com/rmccorm4/tensorrt-utils/blob/20.01/classification/imagenet/onnx_to_tensorrt.py
3. Use saved calibration cache on dynamic shape model to create int8 dynamic engine
4. Smoke test inference on TRT engine
I’m hitting an OOM error on this part and don’t have time to debug right now, but hopefully this helps.
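For step 1, here is a minimal sketch of the dual export, assuming the torchvision AlexNet from the linked demo (the output filenames are made up):

```python
import torch
import torchvision

model = torchvision.models.alexnet(pretrained=True).eval()
dummy = torch.randn(1, 3, 224, 224)

# Fixed-batch export: the batch size is baked into the graph,
# which is what INT8 calibration needs (step 2).
torch.onnx.export(model, dummy, "alexnet_fixed.onnx",
                  input_names=["input"], output_names=["output"])

# Dynamic-batch export: the batch dimension becomes symbolic,
# which is what the final engine needs for runtime batching (step 3).
torch.onnx.export(model, dummy, "alexnet_dynamic.onnx",
                  input_names=["input"], output_names=["output"],
                  dynamic_axes={"input": {0: "batch_size"},
                                "output": {0: "batch_size"}})
```

The idea is that the calibration cache written in step 2 against `alexnet_fixed.onnx` is then handed to the builder in step 3 when building from `alexnet_dynamic.onnx`.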
I have no idea about (1); I didn't write onnx2trt.
As for (2), I'm not sure how to explain it better than I already did. If you enable INT8 mode, TensorRT will call the `getBatch()` function in the calibrator you specified, and when that finishes, it will call the `writeCalibrationCache` function with a length and a pointer to a buffer. What you do with that data is up to you, but you probably want to write it to disk.

On the next launch, TensorRT will call `readCalibrationCache` in your calibrator. You have to pass it a buffer with the data you previously got from `writeCalibrationCache` and set the `length` argument to the size of the buffer. (Technically, `readCalibrationCache` will have been called the first time too, but you won't have had anything to return from it yet, so you'd have returned `nullptr`.)

I think I'm just repeating what I wrote here: https://github.com/NVIDIA/TensorRT/issues/289#issuecomment-571214138
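To make the callback flow concrete, here is a minimal calibrator sketch using the TensorRT Python bindings and PyCUDA (in Python the same callbacks exchange byte buffers instead of a pointer plus length; the cache path and batch source below are placeholders):

```python
import os
import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

class Calibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, batches, batch_size, cache_file="calibration.cache"):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.batches = iter(batches)  # yields float32 arrays of shape (N, C, H, W)
        self.batch_size = batch_size
        self.cache_file = cache_file
        self.device_mem = None

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        # TensorRT calls this repeatedly during calibration; None means "done".
        try:
            data = np.ascontiguousarray(next(self.batches), dtype=np.float32)
        except StopIteration:
            return None
        if self.device_mem is None:
            self.device_mem = cuda.mem_alloc(data.nbytes)
        cuda.memcpy_htod(self.device_mem, data)
        return [int(self.device_mem)]

    def read_calibration_cache(self):
        # Called before calibration; returning a saved cache skips getBatch entirely.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()
        return None  # nothing cached yet, so calibration will run

    def write_calibration_cache(self, cache):
        # Called by TensorRT once calibration finishes; persist the blob for reuse.
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```

You attach it with `config.set_flag(trt.BuilderFlag.INT8)` and `config.int8_calibrator = Calibrator(...)`; on the second (dynamic-shape) build, `read_calibration_cache` finds the file written during the fixed-shape pass and no calibration batches are requested.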
The point I was trying to get across (and clearly failed to do) is that you don’t make a call to get the calibration cache from TensorRT. TensorRT will call your code when it has finished calibrating.
`readCalibrationCache` and `writeCalibrationCache` are callbacks. Your code provides them; TensorRT will call them when appropriate.

Hi @rmccorm4, thanks for your reply. I get that the version of PyTorch matters for the ONNX export. However, when I ran the `onnx_to_tensorrt.py` code I still got an error, but I solved it by changing `return 1` at line 119 of `ImagenetCalibrator.py`. I think this explicit batch bug should be fixed because it can be confusing.