TensorRT: Fake-quantization ONNX model parse ERROR using TensorRT 7.2

Description

An error occurs when parsing a fake-quantization ONNX model with TensorRT 7.2.1.6, following the guidance of the pytorch-quantization toolbox provided in the TensorRT 7.2 release.
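For context, the "shift weights has count 64 but 3 was expected" error usually indicates a per-channel quantization axis mismatch: the first conv layer's weights carry one scale per output channel (64), while the TRT 7.2 parser compared that count against the input-channel count (3). A rough numpy illustration, where the weight shape (64, 3, 3, 3) is an assumption based on the model name (nf64, inc3):

```python
import numpy as np

# Assumed weight shape for the model's first conv layer:
# (out_channels=64, in_channels=3, kH=3, kW=3).
weight = np.random.randn(64, 3, 3, 3).astype(np.float32)

# pytorch-quantization computes one scale per OUTPUT channel (axis 0):
per_channel_amax = np.abs(weight).reshape(64, -1).max(axis=1)
scales = per_channel_amax / 127.0

print(scales.shape)  # (64,) -- 64 scales, one per output channel
# The TRT 7.2 parser assertion `K == scale.count()` compared this count
# of 64 against the input-channel count of 3, producing
# "shift weights has count 64 but 3 was expected".
```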

Error Message:

Loading ONNX file from path checkpoints/rfdn_asx4_nf64nm2inc3_calibrated_op10.onnx...
Beginning ONNX file parsing
[TensorRT] ERROR: QuantizeLinear_7_quantize_scale_node: shift weights has count 64 but 3 was expected
[TensorRT] ERROR: QuantizeLinear_7_quantize_scale_node: shift weights has count 64 but 3 was expected
[TensorRT] ERROR: QuantizeLinear_7_quantize_scale_node: shift weights has count 64 but 3 was expected
ERROR: Failed to parse the ONNX file.
In node 8 (importDequantizeLinear): INVALID_NODE: Assertion failed: K == scale.count()
Traceback (most recent call last):
  File "qaonnx2trt.py", line 65, in <module>
    with get_engine(onnx_file_path, engine_file_path) as engine, engine.create_execution_context() as context:
AttributeError: __enter__
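The final `AttributeError: __enter__` is a secondary symptom: `get_engine()` presumably returns `None` after the parse failure, and `with None as engine:` then fails looking up `None.__enter__`. A minimal sketch of a guard that surfaces the real failure instead (function names here are illustrative, not from the actual script):

```python
def get_engine_checked(build_fn):
    """Wrap an engine-builder callable and fail loudly if it returns None."""
    engine = build_fn()
    if engine is None:
        raise RuntimeError("engine build failed -- see ONNX parser errors above")
    return engine

# Stand-in builder that mimics get_engine() after a parse failure:
result = None
try:
    get_engine_checked(lambda: None)
except RuntimeError as exc:
    result = str(exc)

print(result)
```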

Environment

TensorRT Version: 7.2.1.6
GPU Type: NVIDIA RTX 2070
Nvidia Driver Version: 440.33.01
CUDA Version: 10.2
CUDNN Version: 8.0
Operating System + Version: Ubuntu 16.04
Python Version (if applicable): 3.6.12
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.6.0
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

  • onnx model
  • code

Steps To Reproduce

Please include:

  • Download the onnx model and code to disk
  • Use the code to build a TensorRT engine from the onnx model

ERROR:

root@sobey:/project/tensorrt-quantize/test# PYTHONPATH=../ python qaonnx2trt.py 
Loading ONNX file from path checkpoints/rfdn_asx4_nf64nm2inc3_calibrated_op10.onnx...
Beginning ONNX file parsing
[TensorRT] ERROR: QuantizeLinear_7_quantize_scale_node: shift weights has count 64 but 3 was expected
[TensorRT] ERROR: QuantizeLinear_7_quantize_scale_node: shift weights has count 64 but 3 was expected
[TensorRT] ERROR: QuantizeLinear_7_quantize_scale_node: shift weights has count 64 but 3 was expected
ERROR: Failed to parse the ONNX file.
In node 8 (importDequantizeLinear): INVALID_NODE: Assertion failed: K == scale.count()
Traceback (most recent call last):
  File "qaonnx2trt.py", line 65, in <module>
    with get_engine(onnx_file_path, engine_file_path) as engine, engine.create_execution_context() as context:
AttributeError: __enter__

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 30

Most upvoted comments

@ShiinaMitsuki @k9ele7en @maoxiaoming86 Did you get past this issue? I would love to know. Thank you.

Hello @ShiinaMitsuki, thanks for reporting. Full support for importing an ONNX model exported from the pytorch-quantization tool into ONNX-TRT will be available in the next major release. Before that, we have to use setDynamicRange to import an ONNX INT8 network. The DemoBERT sample uses this method: see load_onnx_weights_and_quant in https://github.com/NVIDIA/TensorRT/blob/release/7.2/demo/BERT/builder.py#L478 and set_dynamic_range in https://github.com/NVIDIA/TensorRT/blob/release/7.2/demo/BERT/builder.py#L113
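A rough sketch of the setDynamicRange workaround described above, assuming the TensorRT 7.x Python API (`ITensor.set_dynamic_range`); the tensor names and scale values are placeholders, and for symmetric INT8 the dynamic range is ±127·scale:

```python
def scale_to_amax(scale, num_bits=8):
    """Symmetric int8: dynamic range is +/- (2^(bits-1) - 1) * scale."""
    return scale * (2 ** (num_bits - 1) - 1)

def apply_dynamic_ranges(network, scales):
    """Set per-tensor dynamic ranges on a TensorRT network definition.

    `network` is a trt.INetworkDefinition; `scales` maps tensor name
    -> quantization scale (e.g. exported from pytorch-quantization).
    """
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        for j in range(layer.num_outputs):
            tensor = layer.get_output(j)
            if tensor.name in scales:
                amax = scale_to_amax(scales[tensor.name])
                tensor.set_dynamic_range(-amax, amax)

# The scale-to-range conversion itself, runnable without TensorRT:
print(scale_to_amax(0.5))  # 63.5
```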