TensorRT: Fake-quantized ONNX model parse ERROR using TensorRT 7.2
Description
An error occurs when parsing a fake-quantized ONNX model with TensorRT 7.2.1.6. The model was exported following the guidance of the pytorch-quantization toolkit provided in the TensorRT 7.2 release.
Error Message:
Loading ONNX file from path checkpoints/rfdn_asx4_nf64nm2inc3_calibrated_op10.onnx...
Beginning ONNX file parsing
[TensorRT] ERROR: QuantizeLinear_7_quantize_scale_node: shift weights has count 64 but 3 was expected
[TensorRT] ERROR: QuantizeLinear_7_quantize_scale_node: shift weights has count 64 but 3 was expected
[TensorRT] ERROR: QuantizeLinear_7_quantize_scale_node: shift weights has count 64 but 3 was expected
ERROR: Failed to parse the ONNX file.
In node 8 (importDequantizeLinear): INVALID_NODE: Assertion failed: K == scale.count()
Traceback (most recent call last):
File "qaonnx2trt.py", line 65, in <module>
with get_engine(onnx_file_path, engine_file_path) as engine, engine.create_execution_context() as context:
AttributeError: __enter__
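The closing `AttributeError: __enter__` is a follow-on failure rather than a separate bug: in the common TensorRT samples, a `get_engine()` helper returns `None` when the ONNX parse fails, and entering a `with` block on `None` raises exactly this error. A minimal sketch of a guard (the `get_engine` signature here is assumed, modeled on the sample scripts):

```python
def build_engine_or_raise(get_engine, onnx_file_path, engine_file_path):
    """Wrap a sample-style get_engine() helper so a parse failure surfaces
    as a clear exception instead of 'AttributeError: __enter__'."""
    engine = get_engine(onnx_file_path, engine_file_path)
    if engine is None:
        raise RuntimeError(
            "Failed to build a TensorRT engine from %s; "
            "check the parser errors logged above" % onnx_file_path)
    return engine
```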
Environment
- TensorRT Version: 7.2.1.6
- GPU Type: NVIDIA RTX 2070
- Nvidia Driver Version: 440.33.01
- CUDA Version: 10.2
- CUDNN Version: 8.0
- Operating System + Version: Ubuntu 16.04
- Python Version (if applicable): 3.6.12
- TensorFlow Version (if applicable):
- PyTorch Version (if applicable): 1.6.0
- Baremetal or Container (if container which image + tag):
Relevant Files
Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)
- onnx model
- code
Steps To Reproduce
Please include:
- Download the onnx model and code to disk
- Use the code to build a TensorRT engine from the ONNX model
ERROR:
root@sobey:/project/tensorrt-quantize/test# PYTHONPATH=../ python qaonnx2trt.py
Loading ONNX file from path checkpoints/rfdn_asx4_nf64nm2inc3_calibrated_op10.onnx...
Beginning ONNX file parsing
[TensorRT] ERROR: QuantizeLinear_7_quantize_scale_node: shift weights has count 64 but 3 was expected
[TensorRT] ERROR: QuantizeLinear_7_quantize_scale_node: shift weights has count 64 but 3 was expected
[TensorRT] ERROR: QuantizeLinear_7_quantize_scale_node: shift weights has count 64 but 3 was expected
ERROR: Failed to parse the ONNX file.
In node 8 (importDequantizeLinear): INVALID_NODE: Assertion failed: K == scale.count()
Traceback (most recent call last):
File "qaonnx2trt.py", line 65, in <module>
with get_engine(onnx_file_path, engine_file_path) as engine, engine.create_execution_context() as context:
AttributeError: __enter__
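The repeated `shift weights has count 64 but 3 was expected` and `K == scale.count()` messages point at per-channel quantization: the weight DequantizeLinear node carries 64 scales (one per output channel), but the TensorRT 7.2 parser checks that count against a dimension of size 3 (the input channels of an RGB model). As a rough illustration of the invariant the parser asserts, in plain Python (not the parser's actual code):

```python
def fake_quantize_per_channel(weights, scales):
    """Fake-quantize a 2-D weight tensor per output channel (axis 0).
    weights: nested lists shaped [out_channels][in_channels];
    scales: one scale per output channel, so len(scales) must equal
    len(weights) -- the invariant behind 'K == scale.count()'."""
    if len(scales) != len(weights):
        raise ValueError("scale count %d but %d was expected"
                         % (len(scales), len(weights)))
    # quantize-then-dequantize each row with its own scale
    return [[round(w / s) * s for w in row]
            for row, s in zip(weights, scales)]
```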
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 30
@ShiinaMitsuki @k9ele7en @maoxiaoming86 Did you get past this issue? I would love to know. Thank you.
Hello @ShiinaMitsuki, thanks for reporting. Full support for importing ONNX models exported from the pytorch-quantization tool through ONNX-TRT will be available in the next major release. Until then, we have to use setDynamicRange to import an ONNX INT8 network. The DemoBERT sample uses this method: see load_onnx_weights_and_quant in https://github.com/NVIDIA/TensorRT/blob/release/7.2/demo/BERT/builder.py#L478 and set_dynamic_range in https://github.com/NVIDIA/TensorRT/blob/release/7.2/demo/BERT/builder.py#L113
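For the setDynamicRange route, the core arithmetic is mapping the per-tensor amax that pytorch-quantization calibrates to the symmetric (min, max) pair that TensorRT expects; the network wiring below is shown only in comments and is an assumed pattern modeled on the DemoBERT builder, not verified against this model:

```python
def dynamic_range_from_amax(amax):
    """pytorch-quantization calibrates a per-tensor amax;
    TensorRT's set_dynamic_range wants a symmetric (min, max)
    pair around zero."""
    amax = abs(float(amax))
    return (-amax, amax)

# Usage sketch against a TensorRT INetworkDefinition (assumed pattern,
# modeled on demo/BERT/builder.py; calibrated_amax is a hypothetical
# dict mapping tensor names to calibrated amax values):
#
# for layer in network:
#     for i in range(layer.num_outputs):
#         tensor = layer.get_output(i)
#         amax = calibrated_amax.get(tensor.name)
#         if amax is not None:
#             tensor.set_dynamic_range(*dynamic_range_from_amax(amax))
```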