onnxruntime: onnx.load() | DecodeError: Error parsing message

Bug issue.

Goal: re-develop this BERT Notebook to use textattack/albert-base-v2-MRPC.

Kernel: conda_pytorch_p36. Deleted all output files and did Restart & Run All.

I can successfully export and save an ONNX model from a HuggingFace Transformers model held in memory at run time. The error occurs when calling onnx.load() to read the saved model back from storage into memory.

Are my ONNX files corrupted?

albert.onnx and albert.opt.onnx here.


Section 2.1 - export in-memory PyTorch model as ONNX model:

import logging
import torch
import onnxruntime

logger = logging.getLogger(__name__)

def export_onnx_model(args, model, tokenizer, onnx_model_path):
    with torch.no_grad():
        inputs = {'input_ids':      torch.ones(1, 128, dtype=torch.int64),
                  'attention_mask': torch.ones(1, 128, dtype=torch.int64),
                  'token_type_ids': torch.ones(1, 128, dtype=torch.int64)}
        outputs = model(**inputs)

        symbolic_names = {0: 'batch_size', 1: 'max_seq_len'}
        torch.onnx.export(model,                                      # model being run
                          (inputs['input_ids'],                       # model inputs (a tuple for multiple inputs)
                           inputs['attention_mask'],
                           inputs['token_type_ids']),
                          onnx_model_path,                            # where to save the model (file or file-like object)
                          opset_version=11,                           # the ONNX opset version to export to
                          do_constant_folding=True,                   # whether to execute constant folding for optimization
                          input_names=['input_ids',                   # the model's input names
                                       'input_mask',
                                       'segment_ids'],
                          output_names=['output'],                    # the model's output names
                          dynamic_axes={'input_ids': symbolic_names,  # variable-length axes
                                        'input_mask': symbolic_names,
                                        'segment_ids': symbolic_names})
        logger.info("ONNX Model exported to {0}".format(onnx_model_path))

export_onnx_model(configs, model, tokenizer, "albert.onnx")
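
To confirm the export itself produced a readable file, a minimal sanity check right after the export (just a sketch, using the same filename as above):

import onnx

onnx_model = onnx.load("albert.onnx")    # parse the protobuf back from disk
onnx.checker.check_model(onnx_model)     # raises if the graph is malformed
print("albert.onnx loads and passes the checker")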

Then optimisation:

pip install torch_optimizer

import torch_optimizer as optim

optimizer = optim.DiffGrad(model.parameters(), lr=0.001)  # DiffGrad optimizer over the model's parameters
optimizer.step()

torch.save(optimizer.state_dict(), 'albert.opt.onnx')  # saves the optimizer state_dict (a PyTorch pickle) under an .onnx filename
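
Since my question is whether the files are corrupted, a small sketch that tries the same onnx.load() call from Section 2.2 on each saved file, to see which one fails to parse:

import onnx

for path in ("albert.onnx", "albert.opt.onnx"):
    try:
        onnx.load(path)                              # same call that raises DecodeError below
        print(path, "parses as an ONNX ModelProto")
    except Exception as e:
        print(path, "does not parse:", e)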

Section 2.2 - Quantize ONNX model:

import os
import onnx
from onnxruntime.quantization import quantize_dynamic, QuantType

def quantize_onnx_model(onnx_model_path, quantized_model_path):
    onnx_opt_model = onnx.load(onnx_model_path)  # DecodeError raised here; the loaded model is otherwise unused
    quantize_dynamic(onnx_model_path,
                     quantized_model_path,
                     weight_type=QuantType.QInt8)

    logger.info(f"quantized model saved to: {quantized_model_path}")

quantize_onnx_model('albert.opt.onnx', 'albert.opt.quant.onnx')

print('ONNX full precision model size (MB):', os.path.getsize("albert.opt.onnx") / (1024 * 1024))
print('ONNX quantized model size (MB):', os.path.getsize("albert.opt.quant.onnx") / (1024 * 1024))
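
Once quantization goes through, a rough sketch for exercising the quantized model with ONNX Runtime (the input names and the 1x128 dummy shape mirror the export in Section 2.1; treating the first output as the logits is an assumption):

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("albert.opt.quant.onnx")
dummy = np.ones((1, 128), dtype=np.int64)
outputs = session.run(None, {"input_ids": dummy,
                             "input_mask": dummy,
                             "segment_ids": dummy})
print("output shape:", outputs[0].shape)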

Traceback:

---------------------------------------------------------------------------
DecodeError                               Traceback (most recent call last)
<ipython-input-16-2d2d32b0a667> in <module>
     10     logger.info(f"quantized model saved to:{quantized_model_path}")
     11 
---> 12 quantize_onnx_model('albert.opt.onnx', 'albert.opt.quant.onnx')
     13 
     14 print('ONNX full precision model size (MB):', os.path.getsize("albert.opt.onnx")/(1024*1024))

<ipython-input-16-2d2d32b0a667> in quantize_onnx_model(onnx_model_path, quantized_model_path)
      3 
      4 def quantize_onnx_model(onnx_model_path, quantized_model_path):
----> 5     onnx_opt_model = onnx.load(onnx_model_path)
      6     quantize_dynamic(onnx_model_path,
      7                      quantized_model_path,

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/onnx/__init__.py in load_model(f, format, load_external_data)
    119     '''
    120     s = _load_bytes(f)
--> 121     model = load_model_from_string(s, format=format)
    122 
    123     if load_external_data:

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/onnx/__init__.py in load_model_from_string(s, format)
    156     Loaded in-memory ModelProto
    157     '''
--> 158     return _deserialize(s, ModelProto())
    159 
    160 

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/onnx/__init__.py in _deserialize(s, proto)
     97                          '\ntype is {}'.format(type(proto)))
     98 
---> 99     decoded = cast(Optional[int], proto.ParseFromString(s))
    100     if decoded is not None and decoded != len(s):
    101         raise google.protobuf.message.DecodeError(

DecodeError: Error parsing message

Output Files:

albert.onnx  # original save
albert.opt.onnx  # optimised version save

Please let me know if there’s anything else I can add to the post.

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 16 (3 by maintainers)

Most upvoted comments

@danielbellhv, you can try the following command instead since you want to run MRPC:

python -m onnxruntime.transformers.benchmark -m albert-base-v2 -i 1 -t 100 -b 1 -s 128 -e onnxruntime --model_class AutoModelForSequenceClassification -p int8 -o -v

Try python -m onnxruntime.transformers.benchmark --help for more information about the parameters.

Related onnx export code can be found in https://github.com/microsoft/onnxruntime/blob/4af116649c8f5f6e725ce8b314b7f8e38007f236/onnxruntime/python/tools/transformers/onnx_exporter.py#L347