TensorRT: ONNX networks can't use INT8 calibration and batching

Description

This is due to mutually incompatible changes in the TRT7 release:

https://docs.nvidia.com/deeplearning/sdk/tensorrt-release-notes/tensorrt-7.html

ONNX parser with dynamic shapes support: The ONNX parser supports full-dimensions mode only. Your network definition must be created with the explicitBatch flag set.

versus

Known Issues: The INT8 calibration does not work with dynamic shapes. To work around this issue, ensure there are two passes in the code: using a fixed shape input to build the engine in the first pass allows TensorRT to generate the calibration cache.

This means the ONNX network must be exported at a fixed batch size in order to get INT8 calibration working, but then it is no longer possible to specify the batch size at runtime. I also verified that manually fixing up the inputs with setDimensions(…-1…) does not work: you hit the assertion mg.nodes[mg.regionIndices[outputRegion]].size == mg.nodes[mg.regionIndices[inputRegion]].size while building.
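For reference, this is roughly what that attempt looks like with the Python API. This is a sketch only: the file name and shapes are placeholders, and assigning to ITensor.shape is the Python counterpart of setDimensions.

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
# The ONNX parser requires a network created in explicit-batch (full-dims) mode.
explicit_batch = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(explicit_batch)
parser = trt.OnnxParser(network, TRT_LOGGER)

with open("model_fixed.onnx", "rb") as f:  # placeholder: ONNX exported at a fixed batch size
    parser.parse(f.read())

inp = network.get_input(0)
print("parsed input shape:", inp.shape)    # e.g. (10, 3, 224, 224)
inp.shape = (-1, 3, 224, 224)              # force the batch dimension to be dynamic

# Building an INT8 engine from this patched network is where the
# mg.nodes[...] size assertion above fires.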

One would think there might be a workaround of sorts: export two different networks, one with a fixed batch size and a second one with dynamic_axes, and then use the calibration cache from one for the other. However, even here there are severe pitfalls: a calibration cache generated for, say, batch_size=1 won’t necessarily work for larger batch sizes, presumably because they select a different convolution strategy with different accuracy issues. Edit: This might’ve been another issue.

Lastly, the calibrator itself appears to be using implicit batch sizes, and breaks on batch size > 1 as follows:

TRT: Starting Calibration with batch size 16. Calibrated 16 images.
TRT: Explicit batch network detected and batch size specified, use execute without batch size instead.
TRT: C:\source\builder\cudnnCalibrator.cpp (707) - Cuda Error in nvinfer1::builder::Histogram::add: 700 (an illegal memory access was encountered)
TRT: FAILED_ALLOCATION: Unknown exception
TRT: C:\source\builder\cudnnCalibrator.cpp (703) - Cuda Error in nvinfer1::builder::Histogram::add: 700 (an illegal memory access was encountered)
TRT: FAILED_ALLOCATION: Unknown exception
TRT: C:\source\rtSafe\cuda\caskConvolutionRunner.cpp (233) - Cuda Error in nvinfer1::rt::task::CaskConvolutionRunner::allocateContextResources: 700 (an illegal memory access was encountered)
TRT: FAILED_EXECUTION: Unknown exception
TRT: Calibrated batch 0 in 2.62865 seconds.
Cuda failure: 700

With batch_size == 1, it also hits assertions:

TRT: Explicit batch network detected and batch size specified, use execute without batch size instead.
TRT: Assertion failed: d.nbDims >= 1
C:\source\rtSafe\safeHelpers.cpp:419
Aborting…

The combination of all these failures means that you can’t really use ONNX networks in INT8 mode; at the very least, the “Using a fixed shape input to build the engine in the first pass” recommendation hits all kinds of internal assertions, as you can see above.

Environment

TensorRT Version: 7.0.0.11
GPU Type: RTX 2080
Nvidia Driver Version: 441.22
CUDA Version: 10.2
CUDNN Version: 7.6.0.5
Operating System + Version: Windows 10
Python Version (if applicable): 3.6
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.3 stable
Baremetal or Container (if container which image + tag): bare

Relevant Files

Steps To Reproduce

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 6
  • Comments: 38

Most upvoted comments

Hi @gcp,

Sorry for the delay, I’m on holiday and was hoping to do this in my free time but it’s still been a busy holiday 😅

I made a little sample workflow to demonstrate how I believe this works.

1. Export trained model to 2 ONNX models (one fixed batch, one dynamic batch)

I tweaked the Alexnet demo from here: https://pytorch.org/docs/stable/onnx.html

import torch
import torchvision

dummy_input = torch.randn(10, 3, 224, 224, device='cuda')
model = torchvision.models.alexnet(pretrained=True).cuda()

input_names = [ "actual_input_1" ] #+ [ "learned_%d" % i for i in range(16) ]
output_names = [ "output1" ]

# Fixed Shape
torch.onnx.export(model, dummy_input, "alexnet_fixed.onnx", verbose=True, opset_version=11,
                  input_names=input_names, output_names=output_names)

# Dynamic Shape
dynamic_axes = dict(zip(input_names, [{0:'batch_size'} for i in range(len(input_names))]))
print(dynamic_axes)
torch.onnx.export(model, dummy_input, "alexnet_dynamic.onnx", verbose=True, opset_version=11,
                  input_names=input_names, output_names=output_names,
                  dynamic_axes=dynamic_axes)

2. Do INT8 calibration on fixed shape model and save calibration cache

This is based on code from here on 20.01 branch: https://github.com/rmccorm4/tensorrt-utils/blob/20.01/classification/imagenet/onnx_to_tensorrt.py

# Fixed batch model
$ python onnx_to_tensorrt.py --fp16 --int8 \
    --calibration-cache=alexnet.cache \
    --calibration-data=/imagenet/val \
    --preprocess_func=preprocess_imagenet \
    --explicit-batch \
    --onnx=../../../alexnet_fixed.onnx

2019-12-30 00:20:15 - __main__ - INFO - TRT_LOGGER Verbosity: Severity.ERROR
2019-12-30 00:20:15 - __main__ - INFO - Using FP16 build flag
2019-12-30 00:20:15 - __main__ - INFO - Using INT8 build flag
2019-12-30 00:20:15 - utils - INFO - Collecting calibration files from: /imagenet/val
2019-12-30 00:20:21 - utils - INFO - Number of Calibration Files found: 50000
2019-12-30 00:20:21 - utils - WARNING - Capping number of calibration images to max_calibration_size: 512
2019-12-30 00:20:22 - __main__ - DEBUG - network.get_input(0).shape = (10, 3, 224, 224)
2019-12-30 00:20:22 - __main__ - DEBUG - network.get_input(0).name = actual_input_1
2019-12-30 00:20:22 - __main__ - INFO - Explicit batch size is fixed (10), creating one optimization profile...
2019-12-30 00:20:22 - __main__ - INFO - Optimization profile: Min(10, 3, 224, 224), Opt(10, 3, 224, 224), Max(10, 3, 224, 224)
2019-12-30 00:20:22 - __main__ - INFO - Building Engine...
2019-12-30 00:20:25 - ImagenetCalibrator - INFO - Calibration images pre-processed: 32/512
2019-12-30 00:20:26 - processing - DEBUG - Received grayscale image. Reshaped to (3, 224, 224)
2019-12-30 00:20:27 - ImagenetCalibrator - INFO - Calibration images pre-processed: 64/512
2019-12-30 00:20:28 - ImagenetCalibrator - INFO - Calibration images pre-processed: 96/512
2019-12-30 00:20:30 - ImagenetCalibrator - INFO - Calibration images pre-processed: 128/512
2019-12-30 00:20:30 - processing - DEBUG - Received grayscale image. Reshaped to (3, 224, 224)
2019-12-30 00:20:31 - ImagenetCalibrator - INFO - Calibration images pre-processed: 160/512
2019-12-30 00:20:32 - ImagenetCalibrator - INFO - Calibration images pre-processed: 192/512
2019-12-30 00:20:32 - processing - DEBUG - Received grayscale image. Reshaped to (3, 224, 224)
2019-12-30 00:20:32 - processing - DEBUG - Received grayscale image. Reshaped to (3, 224, 224)
2019-12-30 00:20:32 - processing - DEBUG - Received grayscale image. Reshaped to (3, 224, 224)
2019-12-30 00:20:33 - ImagenetCalibrator - INFO - Calibration images pre-processed: 224/512
2019-12-30 00:20:33 - processing - DEBUG - Received grayscale image. Reshaped to (3, 224, 224)
2019-12-30 00:20:34 - ImagenetCalibrator - INFO - Calibration images pre-processed: 256/512
2019-12-30 00:20:36 - ImagenetCalibrator - INFO - Calibration images pre-processed: 288/512
2019-12-30 00:20:37 - ImagenetCalibrator - INFO - Calibration images pre-processed: 320/512
2019-12-30 00:20:38 - processing - DEBUG - Received grayscale image. Reshaped to (3, 224, 224)
2019-12-30 00:20:38 - ImagenetCalibrator - INFO - Calibration images pre-processed: 352/512
2019-12-30 00:20:40 - ImagenetCalibrator - INFO - Calibration images pre-processed: 384/512
2019-12-30 00:20:41 - ImagenetCalibrator - INFO - Calibration images pre-processed: 416/512
2019-12-30 00:20:41 - processing - DEBUG - Received grayscale image. Reshaped to (3, 224, 224)
2019-12-30 00:20:42 - processing - DEBUG - Received grayscale image. Reshaped to (3, 224, 224)
2019-12-30 00:20:42 - ImagenetCalibrator - INFO - Calibration images pre-processed: 448/512
2019-12-30 00:20:44 - ImagenetCalibrator - INFO - Calibration images pre-processed: 480/512
2019-12-30 00:20:45 - ImagenetCalibrator - INFO - Calibration images pre-processed: 512/512
2019-12-30 00:20:45 - ImagenetCalibrator - INFO - Caching calibration data for future use: alexnet.cache  <-------- # Calibration cache saved here
2019-12-30 00:20:55 - __main__ - INFO - Writing engine to model.engine

3. Use saved calibration cache on dynamic shape model to create int8 dynamic engine

# Dynamic batch model
$ python onnx_to_tensorrt.py --fp16 --int8 \
    --calibration-cache=alexnet.cache \
    --calibration-data=/imagenet/val \
    --preprocess_func=preprocess_imagenet \
    --explicit-batch \
    --onnx=../../../alexnet_dynamic.onnx

2019-12-30 00:27:57 - __main__ - INFO - TRT_LOGGER Verbosity: Severity.ERROR
2019-12-30 00:27:58 - __main__ - INFO - Using FP16 build flag
2019-12-30 00:27:58 - __main__ - INFO - Using INT8 build flag
2019-12-30 00:27:58 - __main__ - INFO - Skipping calibration files, using calibration cache: alexnet.cache
2019-12-30 00:27:58 - __main__ - DEBUG - network.get_input(0).shape = (-1, 3, 224, 224)
2019-12-30 00:27:58 - __main__ - DEBUG - network.get_input(0).name = actual_input_1
2019-12-30 00:27:58 - __main__ - INFO - Explicit batch size is dynamic (-1), creating several optimization profiles...
2019-12-30 00:27:58 - __main__ - INFO - Optimization profile: Min(1, 3, 224, 224), Opt(1, 3, 224, 224), Max(1, 3, 224, 224)
2019-12-30 00:27:58 - __main__ - INFO - Optimization profile: Min(2, 3, 224, 224), Opt(2, 3, 224, 224), Max(2, 3, 224, 224)
2019-12-30 00:27:58 - __main__ - INFO - Optimization profile: Min(4, 3, 224, 224), Opt(4, 3, 224, 224), Max(4, 3, 224, 224)
2019-12-30 00:27:58 - __main__ - INFO - Optimization profile: Min(8, 3, 224, 224), Opt(8, 3, 224, 224), Max(8, 3, 224, 224)
2019-12-30 00:27:58 - __main__ - INFO - Optimization profile: Min(16, 3, 224, 224), Opt(16, 3, 224, 224), Max(16, 3, 224, 224)
2019-12-30 00:27:58 - __main__ - INFO - Optimization profile: Min(32, 3, 224, 224), Opt(32, 3, 224, 224), Max(32, 3, 224, 224)
2019-12-30 00:27:58 - __main__ - INFO - Building Engine...
2019-12-30 00:27:59 - ImagenetCalibrator - INFO - Using calibration cache to save time: alexnet.cache
2019-12-30 00:27:59 - ImagenetCalibrator - INFO - Using calibration cache to save time: alexnet.cache
2019-12-30 00:28:41 - __main__ - INFO - Writing engine to model.engine

4. Smoke test inference on TRT engine

I’m hitting an OOM error on this part and don’t have time to debug right now, but hopefully this helps.
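For reference, here is a rough sketch of the smoke test I had in mind, assuming the model.engine written in step 3 and its first optimization profile (batch size 1). I haven't been able to actually run this yet because of the OOM, so treat it as a starting point rather than verified code.

import numpy as np
import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

with open("model.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

context = engine.create_execution_context()
context.active_optimization_profile = 0           # profile built for batch size 1
context.set_binding_shape(0, (1, 3, 224, 224))    # resolve the dynamic batch dimension

inp = np.random.rand(1, 3, 224, 224).astype(np.float32)
out = np.empty(tuple(context.get_binding_shape(1)),
               dtype=trt.nptype(engine.get_binding_dtype(1)))

d_inp = cuda.mem_alloc(inp.nbytes)
d_out = cuda.mem_alloc(out.nbytes)
cuda.memcpy_htod(d_inp, np.ascontiguousarray(inp))

# execute_v2 is the "execute without batch size" variant the TRT warning refers to.
context.execute_v2(bindings=[int(d_inp), int(d_out)])
cuda.memcpy_dtoh(out, d_out)
print("output shape:", out.shape, "argmax:", int(out.argmax()))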

I have no idea about (1); I didn’t write onnx2trt.

As for (2), I’m not sure how to explain it better than I already did. If you enable INT8 mode, TensorRT will call the getBatch() function in the calibrator you specified, and when that finishes, it will call the writeCalibrationCache function with a length and a pointer to a buffer. What you do with that data is up to you but you probably want to write it to disk.

Next launch, TensorRT will call readCalibrationCache in your calibrator. You have to return a buffer with the data you previously got from writeCalibrationCache and set the length argument to the size of that buffer.

(Technically readCalibrationCache will have been called the first time too, but you won’t have had anything to return from it yet, so you’d have returned nullptr).

I think I’m just repeating what I wrote here: https://github.com/NVIDIA/TensorRT/issues/289#issuecomment-571214138

The point I was trying to get across (and clearly failed to do) is that you don’t make a call to get the calibration cache from TensorRT. TensorRT calls your code when it has finished calibrating. readCalibrationCache and writeCalibrationCache are callbacks: your code provides them, and TensorRT calls them when appropriate.
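To make that concrete, here is a minimal sketch of what such a calibrator can look like with the Python API (trt.IInt8EntropyCalibrator2). The batch iterator and cache file name are placeholders.

import os
import numpy as np
import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

class CacheCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, batches, batch_size, cache_file="calibration.cache"):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.batches = batches            # iterator yielding float32 arrays of shape (N, C, H, W)
        self.batch_size = batch_size
        self.cache_file = cache_file
        self.device_input = None

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        # TensorRT calls this repeatedly while calibrating.
        batch = next(self.batches, None)
        if batch is None:
            return None                   # no more data, calibration finishes
        if self.device_input is None:
            self.device_input = cuda.mem_alloc(batch.nbytes)
        cuda.memcpy_htod(self.device_input, np.ascontiguousarray(batch))
        return [int(self.device_input)]

    def read_calibration_cache(self):
        # Called first by TensorRT; on the very first run there is nothing to
        # return yet, which is the None / nullptr case mentioned above.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()
        return None

    def write_calibration_cache(self, cache):
        # Called by TensorRT once calibration has finished; persist the buffer.
        with open(self.cache_file, "wb") as f:
            f.write(cache)

The calibrator gets attached to the builder config via config.int8_calibrator; on a subsequent build, read_calibration_cache returns the saved buffer and TensorRT skips calling get_batch.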

Hi @rmccorm4, thanks for your reply. I get that the version of PyTorch matters for the ONNX export. However, when I ran the onnx_to_tensorrt.py code I still got the following error:

[TensorRT] ERROR: engine.cpp (529) - Cuda Error in commonEmitTensor: 1 (invalid argument)
[TensorRT] ERROR: FAILED_ALLOCATION: std::exception
[TensorRT] ERROR: ../rtSafe/cuda/caskConvolutionRunner.cpp (334) - Cuda Error in execute: 1 (invalid argument)
[TensorRT] ERROR: FAILED_EXECUTION: std::exception

But I solved this when I changed line 119 of ImagenetCalibrator.py to return 1.

I think this explicit batch bug should be fixed because it can be confusing.