TensorRT: ONNX to TRT Error: Myelin Error in addNodeToMyelinGraph: 0 ... operation not supported within a loop body.

Description

I wrote a custom Keras model that takes an RGB video (i.e. a 4D tensor) as input and classifies it:

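import tensorflow as tf
from tensorflow.keras.layers import Concatenate, Dense, Flatten, Lambda, LSTM

# num_frames, input_shape_frame, dropout_rate and model_name are defined elsewhere;
# do_something is the per-frame feature extractor (not shown in the post).
list_convolved_frames = []
inputs = tf.keras.Input(shape=(num_frames, *input_shape_frame))
for i in range(num_frames):
    out = inputs[:, i, :, :, :]   # select frame i: (batch, H, W, C)
    out = do_something(out)
    # re-insert a length-1 time axis so the frames can be concatenated
    out = Lambda(lambda x: tf.keras.backend.expand_dims(x, 1))(out)
    list_convolved_frames.append(out)

convolved_frames = Concatenate(axis=1)(list_convolved_frames)

out = LSTM(64, return_sequences=False, dropout=dropout_rate)(convolved_frames)
out = Flatten()(out)
out = Dense(2, activation='softmax')(out)

model = tf.keras.Model(inputs=inputs, outputs=out, name=model_name)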
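To make the tensor shapes in this model explicit, here is a small pure-Python shape trace (no TensorFlow needed). The frame count of 10 is inferred from the Lambda layers `lambda` through `lambda_9` in the error log; the frame size and the output width of `do_something` are hypothetical placeholders, since the post does not state them.

```python
NUM_FRAMES = 10               # inferred from lambda ... lambda_9 in the error log
FRAME_SHAPE = (112, 112, 3)   # hypothetical (height, width, channels)
FEATURES = 128                # hypothetical output width of do_something


def expand_dims(shape, axis):
    """Shape rule of expand_dims: insert a size-1 axis at `axis`."""
    return shape[:axis] + (1,) + shape[axis:]


def concatenate(shapes, axis):
    """Shape rule of Concatenate: sizes add up along `axis`, other dims must match."""
    rest = shapes[0][:axis] + shapes[0][axis + 1:]
    for s in shapes[1:]:
        if s[:axis] + s[axis + 1:] != rest:
            raise ValueError(f"incompatible shapes for axis {axis}: {shapes}")
    total = sum(s[axis] for s in shapes)
    return shapes[0][:axis] + (total,) + shapes[0][axis + 1:]


def trace(batch=None):
    video = (batch, NUM_FRAMES, *FRAME_SHAPE)
    per_frame = []
    for _ in range(NUM_FRAMES):
        frame = video[:1] + video[2:]      # inputs[:, i, :, :, :] drops the time axis
        assert frame == (batch, *FRAME_SHAPE)
        features = (batch, FEATURES)       # assumed shape after do_something
        per_frame.append(expand_dims(features, 1))
    sequence = concatenate(per_frame, axis=1)
    lstm_out = (batch, 64)                 # LSTM(64, return_sequences=False): last state only
    return {"video": video, "sequence": sequence, "lstm": lstm_out, "probs": (batch, 2)}


for name, shape in trace().items():
    print(name, shape)
```

The LSTM therefore receives a `(batch, 10, features)` sequence; the per-frame part only builds that sequence.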

I can convert it to ONNX successfully; however, the conversion to TensorRT fails with:

[02/23/2021-19:13:53] [E] [TRT] ../builder/myelin/codeGenerator.cpp (114) - Myelin Error in addNodeToMyelinGraph: 0 (while/TensorArrayV2Read/TensorListGetItem{StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/zeros_1/Const:0,const_fold_opt__733,__inference_while_cond_45765_532_while/Less,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/PartitionedCall/while/maximum_iterations:0,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/PartitionedCall/time:0,Func/StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/PartitionedCall/input/_43:0,Func/StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/PartitionedCall/input/_44:0,Func/StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/PartitionedCall/input/_45:0,while/add_2/y:0,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lambda_9/ExpandDims,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lambda_8/ExpandDims,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lambda_7/ExpandDims,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lambda_6/ExpandDims,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lambda_5/ExpandDims,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lambda_4/ExpandDims,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lambda_3/ExpandDims,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lambda_2/ExpandDims,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lambda_1/ExpandDims,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lambda/ExpandDims,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/concatenate/concat,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/PartitionedCall/transpose,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/zeros_1/Const:0_0 + (Unnamed Layer* 531) [Shuffle],StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/zeros_1,(Unnamed Layer* 541) [TripLimit],StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/PartitionedCall/while_loop,(Unnamed Layer* 554) [Recurrence],(Unnamed Layer* 556) [Recurrence],(Unnamed Layer* 558) [Recurrence],(Unnamed Layer* 546) [TripLimit],while/add_2,(Unnamed Layer* 565) [Shuffle],while/TensorArrayV2Read/TensorListGetItem,while/MatMul,(Unnamed Layer* 549) [Recurrence],(Unnamed Layer* 550) [Recurrence],while/MatMul_1,while/add,while/BiasAdd,while/split,while/split_1,while/split_2,while/split_3,while/Sigmoid,while/Sigmoid_1,while/Tanh,while/Sigmoid_2,while/mul,while/mul_1,while/add_1,while/Tanh_1,while/mul_2,while/TensorArrayV2Write/TensorListSetItem,(Unnamed Layer* 596) [LoopOutput],StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/PartitionedCall/strided_slice_2,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/lstm/PartitionedCall/strided_slice_2__676 + StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/flatten/Reshape + (Unnamed Layer* 633) [Shuffle],StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/dense/MatMul,(Unnamed Layer* 638) [Shuffle],StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/dense/BiasAdd/ReadVariableOp:0 + (Unnamed Layer* 640) [Shuffle],StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/dense/BiasAdd,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/re_lu_6/Relu,(Unnamed Layer* 649) [Shuffle],StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/dense_1/MatMul,(Unnamed Layer* 654) [Shuffle],StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/dense_1/BiasAdd/ReadVariableOp:0 + (Unnamed Layer* 656) [Shuffle],StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/dense_1/BiasAdd,StatefulPartitionedCall/CUSTOM_TD_CNN_MODEL_CLASS_02/dense_1/Softmax} operation not supported within a loop body.)
[02/23/2021-19:13:53] [E] [TRT] ../builder/myelin/codeGenerator.cpp (114) - Myelin Error in addNodeToMyelinGraph: 0 ()
[02/23/2021-19:13:53] [E] Engine creation failed
[02/23/2021-19:13:53] [E] Engine set up failed
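This error and the one in the later TensorRT 7.1.3 comment name the same node in front of the `{...}` group: `while/TensorArrayV2Read/TensorListGetItem`. Reading the name before the brace as the node Myelin rejected, and the braces as the fused group it sits in, is an interpretation of the message format rather than documented TensorRT behaviour. A minimal stdlib-only helper to pull both out of such long lines could look like this (the sample is an abbreviated excerpt of the later log):

```python
import re

# Matches "... addNodeToMyelinGraph: <n> (<node>{<group>} operation not supported within a loop body"
MYELIN_RE = re.compile(
    r"addNodeToMyelinGraph: \d+ \((?P<node>[^{]+)\{(?P<group>[^}]*)\} "
    r"operation not supported within a loop body"
)


def parse_myelin_error(line):
    """Return (node, group_members) for a loop-body Myelin error, else None."""
    m = MYELIN_RE.search(line)
    if m is None:
        return None
    # Layer names in the group are comma-separated and contain no commas themselves.
    members = [name.strip() for name in m.group("group").split(",") if name.strip()]
    return m.group("node"), members


sample = (
    "[04/28/2021-14:06:46] [E] [TRT] ../builder/myelin/codeGenerator.cpp (112) - "
    "Myelin Error in addNodeToMyelinGraph: 0 (while/TensorArrayV2Read/TensorListGetItem"
    "{model/lstm/zeros_1,(Unnamed Layer* 15) [TripLimit],while/MatMul,"
    "(Unnamed Layer* 66) [LoopOutput]} operation not supported within a loop body.)"
)
node, members = parse_myelin_error(sample)
print(node)
print(len(members), "layers in the fused group")
```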

Any idea what's going on?

Btw, I strongly suspect that my problem is related to: https://github.com/NVIDIA/TensorRT/issues/411

Environment

TensorRT Version: 7.2.1.4
NVIDIA GPU: RTX 2080
NVIDIA Driver Version: 455.23.05
CUDA Version: 11.1
CUDNN Version:
Operating System: Ubuntu 18.04
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version): nvcr.io/nvidia/tensorrt:20.10-py3

Relevant Files

Steps To Reproduce

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 23

Most upvoted comments

@joan126 No, still waiting for NVIDIA/TensorRT to resolve the bug.

Hello @ttyio. I upgraded to TensorRT 8.2.0.6 and still have the same issue converting this simple model:

import tensorflow as tf
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
import tf2onnx

inputs = Input(shape=(60, 8), dtype=tf.float32)

lstm = tf.keras.layers.LSTM(1)(inputs)
model = Model(inputs=inputs, outputs=lstm)
spec = (tf.TensorSpec((None, 60, 8), tf.float32),)

tf2onnx.convert.from_keras(model, output_path='model_dummy.onnx', input_signature=spec)

With the following dependencies: Python 3.6.9, TensorRT 8.2.0.6, TensorFlow 2.4.0, tf2onnx 1.9.1.

When I try to convert from ONNX to TRT with: trtexec --optShapes='args_0':1x60x8 --onnx=model_dummy.onnx
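For reference, the same build can be driven from Python. This sketch only assembles the argument list; it spells out `--minShapes`, `--optShapes` and `--maxShapes` so the optimization profile for the dynamic batch dimension is explicit. The input name `args_0` and shape `1x60x8` come from the command above; the maximum batch of 8 is a hypothetical choice.

```python
import shlex


def shape_spec(name, dims):
    """Format one trtexec shape spec, e.g. args_0:1x60x8."""
    return f"{name}:" + "x".join(str(d) for d in dims)


def trtexec_args(onnx_path, input_name, min_dims, opt_dims, max_dims):
    """Build a trtexec argument list with an explicit min/opt/max profile."""
    return [
        "trtexec",
        f"--onnx={onnx_path}",
        "--minShapes=" + shape_spec(input_name, min_dims),
        "--optShapes=" + shape_spec(input_name, opt_dims),
        "--maxShapes=" + shape_spec(input_name, max_dims),
    ]


args = trtexec_args("model_dummy.onnx", "args_0", (1, 60, 8), (1, 60, 8), (8, 60, 8))
print(shlex.join(args))
# To actually run the build: subprocess.run(args, check=True)
```

Passing a list (rather than a shell string) also avoids having to quote the input name for the shell.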

The log shows

[11/09/2021-13:06:59] [I] [TRT] [MemUsageChange] Init CUDA: CPU +322, GPU +0, now: CPU 334, GPU 2009 (MiB)
[11/09/2021-13:06:59] [I] Start parsing network model
[11/09/2021-13:06:59] [I] [TRT] ----------------------------------------------------------------
[11/09/2021-13:06:59] [I] [TRT] Input filename:   model_dummy.onnx
[11/09/2021-13:06:59] [I] [TRT] ONNX IR version:  0.0.6
[11/09/2021-13:06:59] [I] [TRT] Opset version:    11
[11/09/2021-13:06:59] [I] [TRT] Producer name:    tf2onnx
[11/09/2021-13:06:59] [I] [TRT] Producer version: 1.9.1
[11/09/2021-13:06:59] [I] [TRT] Domain:           
[11/09/2021-13:06:59] [I] [TRT] Model version:    0
[11/09/2021-13:06:59] [I] [TRT] Doc string:       
[11/09/2021-13:06:59] [I] [TRT] ----------------------------------------------------------------
[11/09/2021-13:06:59] [W] [TRT] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[11/09/2021-13:06:59] [W] [TRT] onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
[11/09/2021-13:06:59] [I] Finish parsing network model
[11/09/2021-13:06:59] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 451 MiB, GPU 2029 MiB
[11/09/2021-13:07:00] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.1 but loaded cuBLAS/cuBLAS LT 11.5.1
[11/09/2021-13:07:00] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +483, GPU +206, now: CPU 934, GPU 2235 (MiB)
[11/09/2021-13:07:00] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +393, GPU +180, now: CPU 1327, GPU 2415 (MiB)
[11/09/2021-13:07:00] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[11/09/2021-13:07:00] [I] [TRT] [BlockAssignment] Algorithm Linear took 0.000402ms to assign 1 blocks to 1 nodes requiring 16777216 bytes.
[11/09/2021-13:07:00] [I] [TRT] Total Activation Memory: 16777216
[11/09/2021-13:07:00] [I] [TRT] Detected 1 inputs and 1 output network tensors.
trtexec: /root/gpgpu/MachineLearning/myelin/src/compiler/ir/operation.cpp:396: void myelin::ir::operation_t::replace_def(myelin::ir::tensor_t*, size_t): Assertion `idx < out_tensors().size()' failed.
Aborted (core dumped)
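The two warnings near the top of this log say the parser casts INT64 weights down to INT32 and clamps values outside the INT32 range. The clamp is plain saturation, sketched below. That the clamped values here are "slice to the end" sentinels such as INT64_MAX (common in exported Slice/strided_slice bounds) is an assumption; the log does not say which weights were affected.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
INT64_MAX = 2**63 - 1


def cast_down_to_int32(values):
    """Saturate each integer to the INT32 range; also report whether any was clamped."""
    out, clamped = [], False
    for v in values:
        c = min(max(v, INT32_MIN), INT32_MAX)
        clamped = clamped or (c != v)
        out.append(c)
    return out, clamped


# Hypothetical Slice parameters: start 0, end INT64_MAX ("to the end"), step 1.
print(cast_down_to_int32([0, INT64_MAX, 1]))
```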

I have the same logs when I use opset 9 or 10.

Here is the ONNX model: model_dummy.onnx.tar.gz

Hi @ttyio, the steps to reproduce the issue are simple. First, here are my dependencies: JetPack 4.5.1, Python 3.6.9, TensorRT 7.1.3, TensorFlow 2.4.0, tf2onnx 1.8.4.

Then the code

import tensorflow as tf
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
import tf2onnx

inputs = Input(shape=(60, 8), dtype=tf.float32)

lstm = tf.keras.layers.LSTM(1)(inputs)
model = Model(inputs=inputs, outputs=lstm)
spec = (tf.TensorSpec((None, 60, 8), tf.float32),)

tf2onnx.convert.from_keras(model, output_path='model_dummy.onnx', input_signature=spec)

I use trtexec from TensorRT OSS 7.1.3 to parse/verify the ONNX model: trtexec --optShapes='args_0':1x60x8 --onnx=model_dummy.onnx

This results in:

[04/28/2021-14:06:45] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/28/2021-14:06:45] [I] [TRT] 
[04/28/2021-14:06:45] [I] [TRT] --------------- Layers running on DLA: 
[04/28/2021-14:06:45] [I] [TRT] 
[04/28/2021-14:06:45] [I] [TRT] --------------- Layers running on GPU: 
[04/28/2021-14:06:45] [I] [TRT] {(Unnamed Layer* 0) [Constant],(Unnamed Layer* 1) [Constant],while_cond_567_while/Less,(Unnamed Layer* 14) [Constant],(Unnamed Layer* 17) [Constant],(Unnamed Layer* 23) [Constant],(Unnamed Layer* 25) [Constant],(Unnamed Layer* 27) [Constant],(Unnamed Layer* 29) [Constant],model/lstm/PartitionedCall/transpose,(Unnamed Layer* 9) [Constant] + (Unnamed Layer* 10) [Shuffle],model/lstm/zeros_1,(Unnamed Layer* 15) [TripLimit],model/lstm/PartitionedCall/while_loop,(Unnamed Layer* 24) [Recurrence],(Unnamed Layer* 26) [Recurrence],(Unnamed Layer* 28) [Recurrence],(Unnamed Layer* 16) [TripLimit],while/add_2,while/TensorArrayV2Read/TensorListGetItem,(Unnamed Layer* 35) [Shuffle],while/MatMul,(Unnamed Layer* 19) [Recurrence],(Unnamed Layer* 20) [Recurrence],while/MatMul_1,while/add,while/BiasAdd,while/split,while/split_1,while/split_2,while/split_3,while/Sigmoid,while/Sigmoid_1,while/Tanh,while/Sigmoid_2,while/mul,while/mul_1,while/add_1,while/Tanh_1,while/mul_2,while/Identity_4,(Unnamed Layer* 66) [LoopOutput],model/lstm/PartitionedCall/strided_slice_2,model/lstm/PartitionedCall/strided_slice_2__23}, 
[04/28/2021-14:06:46] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[04/28/2021-14:06:46] [E] [TRT] ../builder/myelin/codeGenerator.cpp (112) - Myelin Error in addNodeToMyelinGraph: 0 (while/TensorArrayV2Read/TensorListGetItem{(Unnamed Layer* 0) [Constant],(Unnamed Layer* 1) [Constant],while_cond_567_while/Less,(Unnamed Layer* 14) [Constant],(Unnamed Layer* 17) [Constant],(Unnamed Layer* 23) [Constant],(Unnamed Layer* 25) [Constant],(Unnamed Layer* 27) [Constant],(Unnamed Layer* 29) [Constant],model/lstm/PartitionedCall/transpose,(Unnamed Layer* 9) [Constant] + (Unnamed Layer* 10) [Shuffle],model/lstm/zeros_1,(Unnamed Layer* 15) [TripLimit],model/lstm/PartitionedCall/while_loop,(Unnamed Layer* 24) [Recurrence],(Unnamed Layer* 26) [Recurrence],(Unnamed Layer* 28) [Recurrence],(Unnamed Layer* 16) [TripLimit],while/add_2,while/TensorArrayV2Read/TensorListGetItem,(Unnamed Layer* 35) [Shuffle],while/MatMul,(Unnamed Layer* 19) [Recurrence],(Unnamed Layer* 20) [Recurrence],while/MatMul_1,while/add,while/BiasAdd,while/split,while/split_1,while/split_2,while/split_3,while/Sigmoid,while/Sigmoid_1,while/Tanh,while/Sigmoid_2,while/mul,while/mul_1,while/add_1,while/Tanh_1,while/mul_2,while/Identity_4,(Unnamed Layer* 66) [LoopOutput],model/lstm/PartitionedCall/strided_slice_2,model/lstm/PartitionedCall/strided_slice_2__23} operation not supported within a loop body.)
[04/28/2021-14:06:46] [E] [TRT] ../builder/myelin/codeGenerator.cpp (112) - Myelin Error in addNodeToMyelinGraph: 0 ()
[04/28/2021-14:06:46] [E] Engine creation failed
[04/28/2021-14:06:46] [E] Engine set up failed
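The GPU layer list in this log shows how the LSTM ended up in TensorRT: the bracketed suffixes `[TripLimit]`, `[Recurrence]` and `[LoopOutput]` are TensorRT loop layers (ITripLimitLayer, IRecurrenceLayer, ILoopOutputLayer). Together with the `while_loop` node, this suggests tf2onnx exported the LSTM as a generic ONNX Loop rather than a fused ONNX LSTM op. A quick way to tally the layer types in such a line (the excerpt below is abbreviated from the log above):

```python
import re
from collections import Counter

# Matches entries like "(Unnamed Layer* 24) [Recurrence]" and captures the type.
TYPE_RE = re.compile(r"\(Unnamed Layer\* \d+\) \[(\w+)\]")


def count_unnamed_layer_types(log_line):
    """Tally the [Type] suffix of every '(Unnamed Layer* N) [Type]' entry."""
    return Counter(TYPE_RE.findall(log_line))


excerpt = (
    "(Unnamed Layer* 15) [TripLimit],model/lstm/PartitionedCall/while_loop,"
    "(Unnamed Layer* 24) [Recurrence],(Unnamed Layer* 26) [Recurrence],"
    "(Unnamed Layer* 28) [Recurrence],(Unnamed Layer* 16) [TripLimit],"
    "(Unnamed Layer* 19) [Recurrence],(Unnamed Layer* 20) [Recurrence],"
    "(Unnamed Layer* 66) [LoopOutput]"
)
print(count_unnamed_layer_types(excerpt))
```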

Finally, here is the ONNX model:

model_dummy.onnx.gz

@ttyio I have the same issue: converting an LSTM+Dense TF model to TRT triggers a “Myelin Error in addNodeToMyelinGraph: operation not supported within a loop body”. Does the “triaged” status mean that a fix is planned? Thanks.

I’ve also tried converting my model via TF-TRT, and that does work even though I’m using an LSTM layer. However, TF-TRT has some significant drawbacks for me, since I’m targeting a Jetson Xavier.

Does anyone know of a working example where a Keras/TensorFlow model with LSTM layers is converted to TensorRT?

I’ve found that removing the LSTM block from the network allows me to convert to TensorRT! The question now is: what goes wrong with the LSTM during TensorRT conversion? To my understanding, LSTM is supported, right?
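For what the LSTM turns into: the loop body named in both Myelin errors (while/MatMul, while/MatMul_1, while/add, while/BiasAdd, the split into four gates, three Sigmoid and two Tanh ops, then mul/add) is one LSTM time step run inside a while loop. The sketch below writes that step out in plain Python for a single unit, as in `LSTM(1)` from the reproducer, using the Keras gate order input, forget, cell, output. It is an illustration, not how Keras or TensorRT implement it, and the weights are hypothetical. Keras also offers `LSTM(..., unroll=True)`, which runs the recurrence as a static Python loop instead of a symbolic while loop; whether that avoids this TensorRT error has not been tested in this thread.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h, c, kernel, recurrent, bias):
    """One LSTM time step for a single unit (units=1).

    kernel: 4 rows of len(x) input weights; recurrent, bias: 4 scalars each.
    All are in Keras gate order: input, forget, cell, output.
    """
    # MatMul(x, W) + MatMul(h, U) + bias, one pre-activation per gate
    z = [sum(w * xi for w, xi in zip(kernel[g], x)) + recurrent[g] * h + bias[g]
         for g in range(4)]
    i = sigmoid(z[0])            # input gate
    f = sigmoid(z[1])            # forget gate
    c_tilde = math.tanh(z[2])    # candidate cell state
    o = sigmoid(z[3])            # output gate
    c_new = f * c + i * c_tilde
    h_new = o * math.tanh(c_new)
    return h_new, c_new


def lstm_last_state(sequence, kernel, recurrent, bias):
    """Run the step over a fixed-length sequence (like return_sequences=False)."""
    h, c = 0.0, 0.0              # zero initial states
    for x in sequence:           # static loop: trip count is len(sequence)
        h, c = lstm_step(x, h, c, kernel, recurrent, bias)
    return h


# Hypothetical weights for 8 input features and 60 time steps, as in Input(shape=(60, 8)).
kernel = [[0.1] * 8, [0.2] * 8, [0.3] * 8, [0.4] * 8]
recurrent = [0.5, 0.5, 0.5, 0.5]
bias = [0.0, 1.0, 0.0, 0.0]
sequence = [[1.0] * 8 for _ in range(60)]
print(lstm_last_state(sequence, kernel, recurrent, bias))
```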

This issue may be related to a warning I got while saving my model at training time in Keras:

WARNING:absl:Found untraced functions such as lstm_cell_layer_call_fn, lstm_cell_layer_call_and_return_conditional_losses, lstm_cell_layer_call_fn, lstm_cell_layer_call_and_return_conditional_losses, lstm_cell_layer_call_and_return_conditional_losses while saving (showing 5 of 5). 
These functions will not be directly callable after loading.