TensorRT: Significant Floating-Point Errors in Container Versions 23.03 to 23.08 (TensorRT 8.6.x) Affecting Specific Models on All GPUs, Including T4 and A100

Description

Reference: https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html

Up to container version 23.02 (TensorRT 8.5.x), there were no issues running our company’s models. However, from version 23.03 up to 23.08 (TensorRT 8.6.x), we have consistently encountered large floating-point errors in specific models.

Specifics:

When the TensorRT engine is built with batch_size=1, the error does not occur. The issue manifests consistently when the engine is built with batch_size=2 or higher, rendering the model unusable. The engines are built from ONNX with --fp16, and we have verified that the issue is not related to the ONNX opset.
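For illustration, a minimal sketch of an FP16 engine build matching the description above, using the TensorRT Python API. The model path is hypothetical, and the input names and shapes ("source", "wav_lens") are placeholders borrowed from the Polygraphy command quoted later in this issue, not values from the actual model.

import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# Parse the ONNX model (hypothetical path).
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # --fp16 build, where the accuracy drop appears

# Dynamic batch profile: batch_size >= 2 reproduces the issue, batch_size == 1 does not.
# Arguments are (name, min, opt, max).
profile = builder.create_optimization_profile()
profile.set_shape("source", (1, 160000), (2, 160000), (2, 160000))
profile.set_shape("wav_lens", (1, 1), (2, 1), (2, 1))
config.add_optimization_profile(profile)

engine_bytes = builder.build_serialized_network(network, config)
with open("model_fp16_bs2.plan", "wb") as f:
    f.write(engine_bytes)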

Given that the error compromises model integrity, immediate attention is requested.

Environment

TensorRT Version: all 8.6.x versions; NGC containers 23.04–23.08.

NVIDIA GPU: T4, A100

NVIDIA Driver Version: 535.104.05

CUDA Version: 12.2

CUDNN Version: x

Operating System:

Container (if so, version): NGC containers 23.03 through 23.08. https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tensorrt

Relevant Files

Can’t include any model files.

Steps To Reproduce

Build a TensorRT engine from an ONNX model with more than 200M parameters, using as large a batch size as possible.

Commands or scripts:

Have you tried the latest release?:

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):
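As a cross-check against ONNX Runtime, here is a rough sketch using the Polygraphy Python API (the programmatic equivalent of the polygraphy run command above). The model path, tensor names, shapes, and dtypes are placeholders, and exact argument names may differ slightly between Polygraphy versions.

import numpy as np
from polygraphy.backend.onnxrt import OnnxrtRunner, SessionFromOnnx
from polygraphy.backend.trt import CreateConfig, EngineFromNetwork, NetworkFromOnnxPath, Profile, TrtRunner
from polygraphy.common import TensorMetadata
from polygraphy.comparator import Comparator, CompareFunc, DataLoader

# Optimization profile, needed when the ONNX model has a dynamic batch dimension.
profile = Profile()
profile.add("source", min=(1, 160000), opt=(2, 160000), max=(2, 160000))
profile.add("wav_lens", min=(1, 1), opt=(2, 1), max=(2, 1))

build_engine = EngineFromNetwork(
    NetworkFromOnnxPath("model.onnx"),
    config=CreateConfig(fp16=True, profiles=[profile]),
)

runners = [
    TrtRunner(build_engine),
    OnnxrtRunner(SessionFromOnnx("model.onnx")),
]

# Feed batch-2 random inputs to both runners (equivalent to --input-shapes on the CLI).
data_loader = DataLoader(input_metadata=TensorMetadata()
    .add("source", dtype=np.float32, shape=(2, 160000))
    .add("wav_lens", dtype=np.float32, shape=(2, 1)))

run_results = Comparator.run(runners, data_loader=data_loader)
passed = bool(Comparator.compare_accuracy(
    run_results, compare_func=CompareFunc.simple(atol=1e-5, rtol=1e-5)))
print("PASSED" if passed else "FAILED")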

About this issue

  • Original URL
  • State: closed
  • Created 10 months ago
  • Comments: 19

Most upvoted comments

LayerNormalization is a suspicious layer for the precision drop

Yes, LayerNorm is prone to overflow under FP16, so falling it back to FP32 is a good solution. You should be able to see the warning in the TRT log.

Sometimes the diff just accumulates and is unavoidable. Could you please try falling some layers back to FP32? This can be done by trial and error until you find a good balance between performance and accuracy.
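For example, a rough sketch of forcing LayerNorm-style layers back to FP32 with the TensorRT Python API; it would slot into an engine-building script like the one earlier in this issue. The name-matching heuristic is an assumption about how the ONNX node names look, not something from the original report.

import tensorrt as trt

def force_layernorm_to_fp32(network, config):
    # Make the builder honor the per-layer precision requests below.
    config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

    norm_type = getattr(trt.LayerType, "NORMALIZATION", None)  # present in TRT >= 8.6
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        # Heuristic: match the fused normalization layer type, or node names
        # containing "LayerNorm" (depends on how the ONNX graph was named).
        if layer.type == norm_type or "LayerNorm" in layer.name:
            layer.precision = trt.float32
            for j in range(layer.num_outputs):
                layer.set_output_type(j, trt.float32)

The same pattern can be applied to other layers in the try-and-test loop described above, rebuilding the engine and re-checking accuracy after each change.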

Checked the ONNX model with Polygraphy; it looks like the outputs match exactly:

[I] Accuracy Comparison | trt-runner-N0-10/22/23-03:14:59 vs. onnxrt-runner-N0-10/22/23-03:14:59
[I]     Comparing Output: 'x' (dtype=int64, shape=(2, 999)) with 'x' (dtype=int64, shape=(2, 999))
[I]         Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
/home/scratch.zeroz_sw/miniconda3/lib/python3.9/site-packages/polygraphy/comparator/compare.py:308: RuntimeWarning: invalid value encountered in true_divide
  reldiff = absdiff / np.abs(cast_up_out1)
[I]         trt-runner-N0-10/22/23-03:14:59: x | Stats: mean=0, std-dev=0, var=0, median=0, min=0 at (0, 0), max=0 at (0, 0), avg-magnitude=0
[I]         onnxrt-runner-N0-10/22/23-03:14:59: x | Stats: mean=0, std-dev=0, var=0, median=0, min=0 at (0, 0), max=0 at (0, 0), avg-magnitude=0
[I]         Error Metrics: x
[I]             Minimum Required Tolerance: elemwise error | [abs=0] OR [rel=nan] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=0, std-dev=0, var=0, median=0, min=0 at (0, 0), max=0 at (0, 0), avg-magnitude=0
[I]             Relative Difference | Stats: mean=nan, std-dev=nan, var=nan, median=nan, min=nan at (0, 0), max=nan at (0, 0), avg-magnitude=nan
[I]         PASSED | Output: 'x' | Difference is within tolerance (rel=1e-05, abs=1e-05)
[I]     PASSED | All outputs matched | Outputs: ['x']
[I] Accuracy Summary | trt-runner-N0-10/22/23-03:14:59 vs. onnxrt-runner-N0-10/22/23-03:14:59 | Passed: 1/1 iterations | Pass Rate: 100.0%
[I] PASSED | Runtime: 248.438s | Command: /home/scratch.zeroz_sw/miniconda3/bin/polygraphy run triton_model_dir/conformer_20/1/model_20.sim.onnx --trt --fp16 --onnxrt --input-shapes source:[2,160000] wav_lens:[2,1]

Checked the repro you provided. Is it possible for you to provide the ONNX model? I want to confirm whether this issue comes from TRT or from Triton; in the latter case you would have to seek help from the Triton developers.

Send the private link here and I’ll request access 😃