onnxruntime: [ErrorCode:Fail] Load model from [...]\latin_ipa_forward.onnx failed:invalid vector subscript
Describe the issue
I am trying to use DeepPhonemizer (a Python library) from C#. To achieve that, I've converted the PyTorch model file (latin_ipa_forward.pt) to ONNX, registering two custom opset operations: aten::unflatten and aten::scaled_dot_product_attention.
Here’s the resulting conversion code, extension changed from .py to .txt: ToOnnx.txt
import onnxscript
import torch
# Assuming you use opset18
from onnxscript.onnx_opset import opset18 as op
custom_opset = onnxscript.values.Opset(domain="torch.onnx", version=18)
# Registering custom operation for scaled dot product attention
@onnxscript.script(custom_opset)
def ScaledDotProductAttention(
    query,
    key,
    value,
    dropout_p,
):
    # Swap the last two axes of key
    key_shape = op.Shape(key)
    key_last_dim = key_shape[-1:]
    key_second_last_dim = key_shape[-2:-1]
    key_first_dims = key_shape[:-2]
    # Contract the dimensions that are not the last two so we can transpose
    # with a static permutation.
    key_squeezed_shape = op.Concat(
        op.Constant(value_ints=[-1]), key_second_last_dim, key_last_dim, axis=0
    )
    key_squeezed = op.Reshape(key, key_squeezed_shape)
    key_squeezed_transposed = op.Transpose(key_squeezed, perm=[0, 2, 1])
    key_transposed_shape = op.Concat(key_first_dims, key_last_dim, key_second_last_dim, axis=0)
    key_transposed = op.Reshape(key_squeezed_transposed, key_transposed_shape)

    embedding_size = op.CastLike(op.Shape(query)[-1], query)
    scale = op.Div(1.0, op.Sqrt(embedding_size))

    # Scale q, k before matmul for stability; see https://tinyurl.com/sudb9s96 for the math
    query_scaled = op.Mul(query, op.Sqrt(scale))
    key_transposed_scaled = op.Mul(key_transposed, op.Sqrt(scale))

    attn_weight = op.Softmax(
        op.MatMul(query_scaled, key_transposed_scaled),
        axis=-1,
    )
    attn_weight, _ = op.Dropout(attn_weight, dropout_p)
    return op.MatMul(attn_weight, value)
def custom_scaled_dot_product_attention(g, query, key, value, attn_mask, dropout, is_causal, scale=None):
    return g.onnxscript_op(ScaledDotProductAttention, query, key, value, dropout).setType(query.type())
torch.onnx.register_custom_op_symbolic(
symbolic_name="aten::scaled_dot_product_attention",
symbolic_fn=custom_scaled_dot_product_attention,
opset_version=18,
)
# Registering custom operation for unflatten
@onnxscript.script(custom_opset)
def aten_unflatten(self, dim, sizes):
"""unflatten(Tensor(a) self, int dim, SymInt[] sizes) -> Tensor(a)"""
self_size = op.Shape(self)
if dim < 0:
# PyTorch accepts negative dim as reversed counting
self_rank = op.Size(self_size)
dim = self_rank + dim
head_start_idx = op.Constant(value_ints=[0])
head_end_idx = op.Reshape(dim, op.Constant(value_ints=[1]))
head_part_rank = op.Slice(self_size, head_start_idx, head_end_idx)
tail_start_idx = op.Reshape(dim + 1, op.Constant(value_ints=[1]))
#tail_end_idx = op.Constant(value_ints=[_INT64_MAX])
tail_end_idx = op.Constant(value_ints=[9223372036854775807]) # = sys.maxint, exactly 2^63 - 1 -> 64 bit int
tail_part_rank = op.Slice(self_size, tail_start_idx, tail_end_idx)
final_shape = op.Concat(head_part_rank, sizes, tail_part_rank, axis=0)
return op.Reshape(self, final_shape)
def custom_unflatten(g, self, dim, shape):
    return g.onnxscript_op(aten_unflatten, self, dim, shape).setType(self.type().with_sizes([32, 32, 1536]))
torch.onnx.register_custom_op_symbolic(
symbolic_name="aten::unflatten",
symbolic_fn=custom_unflatten,
opset_version=18,
)
########## Custom ops ready, time to convert the model to onnx
from dp.model.model import load_checkpoint
model, checkpoint = load_checkpoint('latin_ipa_forward.pt')
dummy_input = {"batch": {"text":torch.rand((32,32)).long()}}
torch.onnx.export(
    model,                            # PyTorch model
    dummy_input,                      # Input tensor
    "latin_ipa_forward.onnx",         # Output file (e.g. 'output_model.onnx')
    custom_opsets={"torch.onnx": 18},
    opset_version=18,                 # Operator support version
    input_names=['embedding'],        # Input tensor name (arbitrary) -> (embedding): Embedding(84, 512)
    output_names=['fc_out']           # Output tensor name (arbitrary) -> (fc_out): Linear(in_features=512, out_features=82, bias=True)
)
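As a quick sanity check after the export (not part of the original script), the produced file can be validated with the onnx Python package. This is a minimal sketch, assuming onnx is installed and the exported file is in the working directory:

import onnx

# Load the exported graph and run ONNX's structural validation.
exported = onnx.load("latin_ipa_forward.onnx")
onnx.checker.check_model(exported)

# Print the opset imports so the custom "torch.onnx" domain and its version are visible.
for opset in exported.opset_import:
    print(opset.domain or "ai.onnx", opset.version)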
The resulting model goes from 70 MB to 50 MB, but I won't attach it unless required.
When I try to create an inference session, I get this error message:
Exception thrown: 'Microsoft.ML.OnnxRuntime.OnnxRuntimeException' in Microsoft.ML.OnnxRuntime.dll
An unhandled exception of type 'Microsoft.ML.OnnxRuntime.OnnxRuntimeException' occurred in Microsoft.ML.OnnxRuntime.dll
[ErrorCode:Fail] Load model from [path\to]\latin_ipa_forward.onnx failed:invalid vector subscript
To reproduce
Try to create an inference session with the model:
using InferenceSession session = new InferenceSession(@"path\to\latin_ipa_forward.onnx");
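To rule out the C# binding, the same file can also be loaded through the Python API. A minimal sketch, assuming the onnxruntime Python package is installed:

import onnxruntime as ort

# If this raises the same "invalid vector subscript" error, the failure is in
# ONNX Runtime's native model loading rather than in the C# wrapper.
session = ort.InferenceSession(r"path\to\latin_ipa_forward.onnx", providers=["CPUExecutionProvider"])
print([i.name for i in session.get_inputs()])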
Urgency
No response
Platform
Windows
OS Version
10 Pro
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.14.1 NuGet Package
ONNX Runtime API
C#
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response
About this issue
- Original URL
- State: open
- Created a year ago
- Comments: 15 (11 by maintainers)
@justinchuby Probably, but I am using today's master branch. Overall, the problem is that ORT doesn't generate a meaningful error message, so users and developers must dig very deep to understand the situation. Basically, the entire function_utils.cc can have a similar problem. I will try to improve the calls around at(...). In the meantime, we learned that at(...) (and the built-in C++ error checking mechanisms in general) doesn't produce actionable error messages, because the meaning of an exception depends on its context and C++ classes don't have that context.