TensorRT: Setting layer precision still doesn't take effect in TensorRT 8.6.1

Description

As I reported in my earlier issue ("Skipping tactic 0x0000000000000000 due to Myelin error" degrade performance), setting layer precision could fail in TensorRT 8.4.3 due to ConstShuffleFusion.

Recently I tried TensorRT 8.6.1, but setting layer precision still seems to fail due to ConstShuffleFusion. For example, as shown in the graph [screenshot], the Max op takes a const input named “phase0_tf/predict_node/y:0”, whose value appears to be an fp16 subnormal, so I used the set_precision API to explicitly set the layer (“phase0_tf/predict_node/y:0”) to fp32.
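For reference, a minimal sketch of how the constraint is applied (TensorRT Python API; `network` and `config` are assumed to come from an existing build script, and only the layer name is taken from the actual model):

```python
import tensorrt as trt

# Pin one constant layer to FP32 in an otherwise FP16 build.
TARGET = "phase0_tf/predict_node/y:0"

def pin_layer_to_fp32(network, config, layer_name=TARGET):
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        if layer.name == layer_name:
            layer.precision = trt.float32  # request FP32 for this layer
    config.set_flag(trt.BuilderFlag.FP16)
    # Without this flag, per-layer precision settings are treated as hints only.
    config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
```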

The verbose logs are as follows: [screenshot]

When the fp16 subnormal is not set to fp32, the logs are as follows; the layer “phase0_tf/predict_node/y:0 + (Unnamed Layer* 522) [Shuffle]” runs in fp16 precision: [screenshots]

However, when the fp16 subnormal is set to fp32, the layer “phase0_tf/predict_node/y:0 + (Unnamed Layer* 522) [Shuffle]” is still fp16: [screenshot]

By the way, ConstShuffleFusion produces two kinds of layers, as shown in the screenshots: [screenshots]

I am confused about the difference between them. Is that the reason set_precision fails for the layer “phase0_tf/predict_node/y:0”?

Looking forward to your reply. Thanks a lot!

Environment

TensorRT Version: 8.6.1

NVIDIA GPU: T4

NVIDIA Driver Version: 510

CUDA Version: 12.0

CUDNN Version:

Operating System: Ubuntu 20.04

Python Version (if applicable):

Tensorflow Version (if applicable): 1.4

PyTorch Version (if applicable):

Baremetal or Container (if so, version):

Relevant Files

Model link:

Steps To Reproduce

Commands or scripts:

Have you tried the latest release?:

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):


Most upvoted comments

Several things I would try:

  1. Set the precision of the Concat op before the Max op to FP32, and also set the Concat’s output dtype to FP32 (using set_output_type()).
  2. If that doesn’t work, add a Cast op that casts the Concat’s output to FP32 before feeding it into Max (see the sketch after this list).
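A minimal sketch of both suggestions, assuming the TensorRT Python API, that `concat_layer` and `max_layer` have already been looked up by name (the lookup is omitted), and that `add_cast` is available (recent 8.x releases):

```python
import tensorrt as trt

# Suggestion 1: pin the Concat itself to FP32, including its output dtype.
def force_concat_fp32(concat_layer):
    concat_layer.precision = trt.float32
    concat_layer.set_output_type(0, trt.float32)  # dtype of output index 0

# Suggestion 2: insert an explicit Cast between Concat and Max so that Max
# always consumes an FP32 tensor, regardless of what fusion does upstream.
def insert_cast_before_max(network, concat_layer, max_layer, input_index=0):
    cast = network.add_cast(concat_layer.get_output(0), trt.float32)
    max_layer.set_input(input_index, cast.get_output(0))  # rewire Max's input
```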

The right part of the image shows a Myelin subgraph; is it possible that Myelin has already set the precision to FP32 but just didn't print it in the log?

If the ForeignNode optimization is triggered, we do not have detailed dtype information. We would need to use Nsight Systems (nsys) to look at it (or use --dumpLayerInfo --profilingVerbosity=detailed with the latest TRT internal build).
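Outside an internal build, one alternative worth trying is the public EngineInspector API (assuming TensorRT >= 8.2); it may still hide Myelin-internal dtypes, but it dumps whatever per-layer precision information the engine retains. A sketch, assuming `builder`, `network`, and `config` from an existing build script:

```python
import tensorrt as trt

# Keep detailed layer info in the engine, then dump it as JSON.
config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED

serialized = builder.build_serialized_network(network, config)
runtime = trt.Runtime(trt.Logger(trt.Logger.WARNING))
engine = runtime.deserialize_cuda_engine(serialized)

inspector = engine.create_engine_inspector()
print(inspector.get_engine_information(trt.LayerInformationFormat.JSON))
```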

I think the first thing we should do is reproduce the accuracy difference between pure FP32 and FP32+FP16.
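To quantify that difference, a minimal sketch, assuming `out_fp32` and `out_fp16` are NumPy output arrays from the two engine builds run on the same input:

```python
import numpy as np

def compare_outputs(out_fp32, out_fp16, rtol=1e-3, atol=1e-3):
    """Report the accuracy gap between pure-FP32 and FP32+FP16 builds."""
    diff = np.abs(out_fp32.astype(np.float64) - out_fp16.astype(np.float64))
    print(f"max abs diff: {diff.max():.6g}, mean abs diff: {diff.mean():.6g}")
    return np.allclose(out_fp32, out_fp16, rtol=rtol, atol=atol)
```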