TensorRT: TensorRT fails to build engine from pytorch_quantization ONNX

Description

I created a quantized model in PyTorch using pytorch_quantization and exported it to ONNX. Then I executed the following command on a Jetson Orin:

/usr/src/tensorrt/bin/trtexec --onnx=model_quantized.onnx --int8 --saveEngine=model_quantized.trt
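
For context, the quantization and export on the Windows side followed the usual pytorch_quantization flow, roughly like this (a simplified sketch: the tiny network, input shape, and batch count are placeholders standing in for my actual model and calibration data):

import torch
import torch.nn as nn
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

# Replace supported torch.nn layers with quantized equivalents
# before the model is built.
quant_modules.initialize()

# Tiny placeholder network standing in for the real model.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 8, 3))
model.eval()

# Switch quantizers to calibration mode and collect statistics
# from a few representative batches.
for m in model.modules():
    if isinstance(m, quant_nn.TensorQuantizer):
        m.disable_quant()
        m.enable_calib()
with torch.no_grad():
    for _ in range(4):
        model(torch.randn(1, 3, 32, 32))
for m in model.modules():
    if isinstance(m, quant_nn.TensorQuantizer):
        m.load_calib_amax()
        m.enable_quant()
        m.disable_calib()

# Export fake-quant ops as ONNX QuantizeLinear/DequantizeLinear pairs.
quant_nn.TensorQuantizer.use_fb_fake_quant = True
torch.onnx.export(
    model,
    torch.randn(1, 3, 32, 32),
    "model_quantized.onnx",
    opset_version=13,  # matches the opset shown in the log below
    input_names=["input"],
    output_names=["output"],
)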

Here is part of the trtexec output that includes the error:

[12/31/2023-11:17:12] [I] Start parsing network model
[12/31/2023-11:17:12] [I] [TRT] ----------------------------------------------------------------
[12/31/2023-11:17:12] [I] [TRT] Input filename:   model_quantized.onnx
[12/31/2023-11:17:12] [I] [TRT] ONNX IR version:  0.0.7
[12/31/2023-11:17:12] [I] [TRT] Opset version:    13
[12/31/2023-11:17:12] [I] [TRT] Producer name:    pytorch
[12/31/2023-11:17:12] [I] [TRT] Producer version: 1.12.1
[12/31/2023-11:17:12] [I] [TRT] Domain:           
[12/31/2023-11:17:12] [I] [TRT] Model version:    0
[12/31/2023-11:17:12] [I] [TRT] Doc string:       
[12/31/2023-11:17:12] [I] [TRT] ----------------------------------------------------------------
[12/31/2023-11:17:12] [W] [TRT] onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[12/31/2023-11:17:13] [I] Finish parsing network model
[12/31/2023-11:17:13] [I] FP32 and INT8 precisions have been specified - more performance might be enabled by additionally specifying --fp16 or --best
[12/31/2023-11:17:13] [W] [TRT] Calibrator won't be used in explicit precision mode. Use quantization aware training to generate network with Quantize/Dequantize nodes.
[12/31/2023-11:17:13] [E] Error[2]: [qdqGraphOptimizer.cpp::matchInt8ConstantDQ::3582] Error Code 2: Internal Error (onnx::QuantizeLinear_898: Int8 constant is only allowed before DQ node)
[12/31/2023-11:17:13] [E] Error[2]: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[12/31/2023-11:17:13] [E] Engine could not be created from network
[12/31/2023-11:17:13] [E] Building engine failed
[12/31/2023-11:17:13] [E] Failed to create engine from model or file.
[12/31/2023-11:17:13] [E] Engine set up failed

The error refers to the node QuantizeLinear_898, and the message is "Int8 constant is only allowed before DQ node", i.e., TensorRT's explicit-quantization mode only accepts an int8 constant when it feeds directly into a DequantizeLinear node.

Looking at the ONNX graph, I can see that there is a node related to QuantizeLinear_898 that has no input:

[image: ONNX graph around QuantizeLinear_898, showing the node with a missing input]

Any idea what went wrong and how to solve it?
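
In case it helps with debugging, here is a short script I used to look for the pattern the error message describes: int8 constants consumed by anything other than a DequantizeLinear node (my own rough sketch, not an official tool):

import onnx
from onnx import TensorProto

model = onnx.load("model_quantized.onnx")
graph = model.graph

# Names of all int8 constant tensors (initializers) in the graph.
int8_consts = {
    init.name for init in graph.initializer
    if init.data_type == TensorProto.INT8
}

# TensorRT's explicit-quantization mode only allows an int8 constant
# directly before a DequantizeLinear node; flag every other consumer.
for node in graph.node:
    if node.op_type == "DequantizeLinear":
        continue
    for inp in node.input:
        if inp in int8_consts:
            print(f"{node.name} ({node.op_type}) consumes int8 constant {inp}")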

Environment

Model compilation:

TensorRT Version: v8502 (Jetson Orin)

Model quantization and export to ONNX:

OS: Windows 10
Python Version (if applicable): 3.9.12
PyTorch Version (if applicable): 1.12.1+cu116
pytorch_quantization version: 2.1.3

Most upvoted comments

Please wait for the TRT 10 release; I'd guess the EA will come out in March/April.