TensorRT: Set layer precision still doesn't take effect in TensorRT 8.6.1.
Description
As I reported earlier in "Skipping tactic 0x0000000000000000 due to Myelin error" degrades performance, setting layer precision could fail in TensorRT 8.4.3 due to ConstShuffleFusion.
Recently I tried TensorRT 8.6.1, but setting layer precision still seems to fail because of ConstShuffleFusion.
For example, as shown in the graph, the Max op takes a const input named “phase0_tf/predict_node/y:0”, and its value appears to be an fp16 subnormal, so I used the set_precision API to set the layer (“phase0_tf/predict_node/y:0”) to fp32 explicitly.
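To check whether a constant really lands in the fp16 subnormal range, a small NumPy helper can be used. This is an illustrative sketch (the function name and logic are not from the issue): a value is subnormal in fp16 when it is nonzero but its magnitude is below the smallest normal fp16 number, 2**-14.

```python
import numpy as np

def is_fp16_subnormal(x):
    """Return True where x would become subnormal when cast to fp16,
    i.e. nonzero magnitude below the smallest normal fp16 value (2**-14)."""
    x = np.asarray(x, dtype=np.float32)
    tiny = np.float32(np.finfo(np.float16).tiny)  # 2**-14 ~= 6.1e-05
    return (x != 0) & (np.abs(x) < tiny)

print(is_fp16_subnormal([1e-6, 1.0, 0.0]))  # [ True False False]
```

Values flagged by such a check lose most of their precision in fp16, which is why pinning the producing layer to fp32 matters here.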
The verbose logs are as follows:
When the fp16 subnormal const is not set to fp32, the logs are as follows; the layer “phase0_tf/predict_node/y:0 + (Unnamed Layer* 522) [Shuffle]” runs in fp16 precision:
However, when the fp16 subnormal const is set to fp32, the layer “phase0_tf/predict_node/y:0 + (Unnamed Layer* 522) [Shuffle]” still runs in fp16.
By the way, ConstShuffleFusion produces two kinds of layers, such as
I am confused about the differences between them. Could that be the reason set_precision fails for the layer “phase0_tf/predict_node/y:0”?
Looking forward to your reply. Thanks a lot!
Environment
TensorRT Version: 8.6.1
NVIDIA GPU: T4
NVIDIA Driver Version: 510
CUDA Version: 12.0
CUDNN Version:
Operating System: Ubuntu 20.04
Python Version (if applicable):
Tensorflow Version (if applicable): 1.4
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Model link:
Steps To Reproduce
Commands or scripts:
Have you tried the latest release?:
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):
About this issue
- Original URL
- State: open
- Created 10 months ago
- Comments: 20
Several things I would try:
- Set the layer's output type (set_output_type()) to FP32.
- If the ForeignNode optimization is triggered, we do not have information about the detailed dtype info. We will need to use Nsys to look at it (or use --dumpLayerInfo --profilingVerbosity=detailed with the latest TRT internal build).
- I think the first thing we should do is to repro the accuracy difference between pure-FP32 and FP32+FP16.