onnxruntime: [ErrorCode:Fail] Load model from [...]\latin_ipa_forward.onnx failed:invalid vector subscript

Describe the issue

I am trying to use DeepPhonemizer (a Python library) from C#. To achieve that, I’ve converted the PyTorch model file (latin_ipa_forward.pt) to ONNX, registering custom symbolic functions for two ops in a custom opset: aten::unflatten and aten::scaled_dot_product_attention.

Here is the conversion code (attached with the extension changed from .py to .txt): ToOnnx.txt

import onnxscript
import torch

# Assuming you use opset18
from onnxscript.onnx_opset import opset18 as op

custom_opset = onnxscript.values.Opset(domain="torch.onnx", version=18)

# Registering custom operation for scaled dot product attention
@onnxscript.script(custom_opset)
def ScaledDotProductAttention(
    query,
    key,
    value,
    dropout_p,
):
    # Swap the last two axes of key
    key_shape = op.Shape(key)
    key_last_dim = key_shape[-1:]
    key_second_last_dim = key_shape[-2:-1]
    key_first_dims = key_shape[:-2]
    # Contract the dimensions that are not the last two so we can transpose
    # with a static permutation.
    key_squeezed_shape = op.Concat(
        op.Constant(value_ints=[-1]), key_second_last_dim, key_last_dim, axis=0
    )
    key_squeezed = op.Reshape(key, key_squeezed_shape)
    key_squeezed_transposed = op.Transpose(key_squeezed, perm=[0, 2, 1])
    key_transposed_shape = op.Concat(key_first_dims, key_last_dim, key_second_last_dim, axis=0)
    key_transposed = op.Reshape(key_squeezed_transposed, key_transposed_shape)

    embedding_size = op.CastLike(op.Shape(query)[-1], query)
    scale = op.Div(1.0, op.Sqrt(embedding_size))

    # Scale q and k before the matmul for numerical stability; see https://tinyurl.com/sudb9s96 for the math
    query_scaled = op.Mul(query, op.Sqrt(scale))
    key_transposed_scaled = op.Mul(key_transposed, op.Sqrt(scale))
    attn_weight = op.Softmax(
        op.MatMul(query_scaled, key_transposed_scaled),
        axis=-1,
    )
    attn_weight, _ = op.Dropout(attn_weight, dropout_p)
    return op.MatMul(attn_weight, value)

def custom_scaled_dot_product_attention(g, query, key, value, attn_mask, dropout, is_causal, scale=None):
    return g.onnxscript_op(ScaledDotProductAttention, query, key, value, dropout).setType(query.type())

torch.onnx.register_custom_op_symbolic(
    symbolic_name="aten::scaled_dot_product_attention",
    symbolic_fn=custom_scaled_dot_product_attention,
    opset_version=18,
)
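
# Optional sanity check (illustrative only, not required for the export): the
# ScaledDotProductAttention function above computes softmax(Q @ K^T / sqrt(d)) @ V,
# with the 1/sqrt(d) scale split across Q and K. The same math in plain PyTorch
# (made-up shapes) should match torch's built-in attention:
_q, _k, _v = torch.rand(2, 4, 8, 16), torch.rand(2, 4, 8, 16), torch.rand(2, 4, 8, 16)
_scale = 1.0 / (_q.shape[-1] ** 0.5)
_manual = torch.softmax((_q * _scale ** 0.5) @ (_k * _scale ** 0.5).transpose(-2, -1), dim=-1) @ _v
assert torch.allclose(_manual, torch.nn.functional.scaled_dot_product_attention(_q, _k, _v), atol=1e-5)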

# Registering custom operation for unflatten
@onnxscript.script(custom_opset)
def aten_unflatten(self, dim, sizes):
    """unflatten(Tensor(a) self, int dim, SymInt[] sizes) -> Tensor(a)"""

    self_size = op.Shape(self)

    if dim < 0:
        # PyTorch accepts negative dim as reversed counting
        self_rank = op.Size(self_size)
        dim = self_rank + dim

    head_start_idx = op.Constant(value_ints=[0])
    head_end_idx = op.Reshape(dim, op.Constant(value_ints=[1]))
    head_part_rank = op.Slice(self_size, head_start_idx, head_end_idx)

    tail_start_idx = op.Reshape(dim + 1, op.Constant(value_ints=[1]))
    #tail_end_idx = op.Constant(value_ints=[_INT64_MAX])
    tail_end_idx = op.Constant(value_ints=[9223372036854775807]) # _INT64_MAX = 2^63 - 1 (sys.maxsize on 64-bit), i.e. slice to the end of the shape
    tail_part_rank = op.Slice(self_size, tail_start_idx, tail_end_idx)

    final_shape = op.Concat(head_part_rank, sizes, tail_part_rank, axis=0)

    return op.Reshape(self, final_shape)

def custom_unflatten(g, self, dim, shape):
    return g.onnxscript_op(aten_unflatten, self, dim, shape).setType(self.type().with_sizes([32, 32, 1536]))   

torch.onnx.register_custom_op_symbolic(
    symbolic_name="aten::unflatten",
    symbolic_fn=custom_unflatten,
    opset_version=18,
)
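
# Optional sanity check (illustrative only): aten::unflatten splits one axis into
# several, e.g. unflatten(-1, (8, 192)) reshapes (32, 32, 1536) to (32, 32, 8, 192).
# The onnxscript function above builds that target shape with Shape/Slice/Concat
# and applies a single Reshape.
_x = torch.rand(32, 32, 1536)
assert torch.equal(_x.unflatten(-1, (8, 192)), _x.reshape(32, 32, 8, 192))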


########## Custom ops ready, time to convert the model to onnx

from dp.model.model import load_checkpoint

model, checkpoint = load_checkpoint('latin_ipa_forward.pt')

dummy_input = {"batch": {"text":torch.rand((32,32)).long()}}

torch.onnx.export(
    model,                      # PyTorch Model
    dummy_input,                # Input tensor
    "latin_ipa_forward.onnx",   # Output file (eg. 'output_model.onnx')
    custom_opsets = {"torch.onnx": 18},           
    opset_version=18,           # Operator support version
    input_names=['embedding'],  # Input tensor name (arbitrary) -> (embedding): Embedding(84, 512)
    output_names=['fc_out']     # Output tensor name (arbitrary) -> (fc_out): Linear(in_features=512, out_features=82, bias=True)
)

The conversion shrinks the model from 70 MB to 50 MB; I won’t attach it unless it’s required.
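As a quick structural check before going to C#, the exported file can be validated with the Python onnx checker (a minimal sketch; it assumes the onnx package is installed and is run next to the exported file):

import onnx

model = onnx.load("latin_ipa_forward.onnx")
onnx.checker.check_model(model)
# List any model-local functions and the opset/domain imports of the exported graph
print([f.name for f in model.functions])
print(model.opset_import)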

When I try to create an inference session, I get this error message:

Exception thrown: 'Microsoft.ML.OnnxRuntime.OnnxRuntimeException' in Microsoft.ML.OnnxRuntime.dll
An unhandled exception of type 'Microsoft.ML.OnnxRuntime.OnnxRuntimeException' occurred in Microsoft.ML.OnnxRuntime.dll
[ErrorCode:Fail] Load model from [path\to]\latin_ipa_forward.onnx failed:invalid vector subscript

To reproduce

Try to create an inference session with the model:

using InferenceSession session = new InferenceSession(@"path\to\latin_ipa_forward.onnx");
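
For comparison, the same load can be attempted from the Python onnxruntime package to see whether the failure is specific to the C# bindings (a sketch, assuming onnxruntime is installed in the Python environment; whether the error reproduces there is untested here):

import onnxruntime as ort

session = ort.InferenceSession("latin_ipa_forward.onnx", providers=["CPUExecutionProvider"])
print([i.name for i in session.get_inputs()])
print([o.name for o in session.get_outputs()])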

Urgency

No response

Platform

Windows

OS Version

10 Pro

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.14.1 NuGet Package

ONNX Runtime API

C#

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

About this issue

  • Original URL
  • State: open
  • Created a year ago
  • Comments: 15 (11 by maintainers)

Most upvoted comments

@justinchuby, probably, but I am using today’s master branch. The overall problem is that ORT doesn’t generate a meaningful error message, so users and developers must dig very deep into the code to understand the situation. The entire function_utils.cc can have similar problems. I will try to improve the calls around at(...). In the meantime, we learned that at(...) (and C++’s built-in error-checking mechanisms in general) doesn’t produce actionable error messages, because the meaning of an exception depends on its context, and C++ standard classes don’t have that context.