TensorRT: [bug] Failure to build an engine with TRT 8.6.1 due to unsupported 'Sign' node

Description

I cannot build an engine with TensorRT Python 8.6.1 because of an unsupported Sign node. However, according to the supported-operators list, the Sign op has been available in onnx-tensorrt since v8.2. The full list of supported ops is here: https://github.com/onnx/onnx-tensorrt/blob/8.6-GA/docs/operators.md

This op has also existed in ONNX since opset 9 (it was last updated in opset 13): https://onnx.ai/onnx/operators/onnx__Sign.html

Hence I’m asking whether this is an actual bug or whether I did something wrong.

Environment

TensorRT Version: 8.6.1

NVIDIA GPU: RTX 3080 mobile

NVIDIA Driver Version: 536.40

CUDA Version: 11.8

CUDNN Version: not installed

Operating System: Linux Ubuntu 22.04.2 LTS

Python Version (if applicable): 3.10.11

Tensorflow Version (if applicable): not installed

PyTorch Version (if applicable): 2.0.1

Baremetal or Container (if so, version): none

Steps To Reproduce

I have run the following pip install commands:

pip install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118
pip install tensorrt
pip install transformers

The installed versions are:

  • torch==2.0.1
  • transformers==4.30.2
  • tensorrt==8.6.1

Here is the Python script to run:

from transformers import DebertaV2ForSequenceClassification
import tensorrt as trt
import torch


batch_size = 1
seq_len = 12
deberta_model = DebertaV2ForSequenceClassification.from_pretrained("microsoft/mdeberta-v3-base")
vocab_size = deberta_model.config.vocab_size

deberta_model.eval()

input_ids = torch.randint(0, vocab_size, (batch_size, seq_len), dtype=torch.long)
attention_mask = torch.randint(0, 2, (batch_size, seq_len), dtype=torch.long)
input_names = ['input_ids', 'attention_mask']
output_names = ['output']
dynamic_axes = {'input_ids': {0: 'batch_size'},
                'attention_mask': {0: 'batch_size'},
                'output': {0: 'batch_size'}}

torch.onnx.export(deberta_model,
                  (input_ids, attention_mask),
                  "model.onnx",
                  export_params=True,
                  opset_version=13,
                  do_constant_folding=True,
                  input_names=input_names,
                  output_names=output_names,
                  dynamic_axes=dynamic_axes)

TRT_LOGGER = trt.Logger(trt.Logger.INFO)
TRT_BUILDER = trt.Builder(TRT_LOGGER)
network = TRT_BUILDER.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
onnx_parser = trt.OnnxParser(network, TRT_LOGGER)
parse_success = onnx_parser.parse_from_file("model.onnx")

for idx in range(onnx_parser.num_errors):
    print(onnx_parser.get_error(idx))
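For context, if parsing succeeded, the engine build would continue roughly like this (a sketch using the standard TRT 8.6 builder API; the profile shapes are illustrative, and the script never gets this far because of the parse error):

config = TRT_BUILDER.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB workspace
profile = TRT_BUILDER.create_optimization_profile()
# min/opt/max shapes for the dynamic batch dimension (illustrative values)
profile.set_shape("input_ids", (1, seq_len), (1, seq_len), (8, seq_len))
profile.set_shape("attention_mask", (1, seq_len), (1, seq_len), (8, seq_len))
config.add_optimization_profile(profile)
serialized_engine = TRT_BUILDER.build_serialized_network(network, config)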

The output logs are:

Some weights of the model checkpoint at microsoft/mdeberta-v3-base were not used when initializing DebertaV2ForSequenceClassification: ['lm_predictions.lm_head.dense.weight', 'mask_predictions.LayerNorm.bias', 'lm_predictions.lm_head.LayerNorm.weight', 'mask_predictions.LayerNorm.weight', 'mask_predictions.classifier.bias', 'deberta.embeddings.word_embeddings._weight', 'lm_predictions.lm_head.dense.bias', 'lm_predictions.lm_head.bias', 'lm_predictions.lm_head.LayerNorm.bias', 'mask_predictions.dense.bias', 'mask_predictions.classifier.weight', 'mask_predictions.dense.weight']
- This IS expected if you are initializing DebertaV2ForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DebertaV2ForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DebertaV2ForSequenceClassification were not initialized from the model checkpoint at microsoft/mdeberta-v3-base and are newly initialized: ['classifier.bias', 'pooler.dense.weight', 'classifier.weight', 'pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
/home/jplu/miniconda3/envs/work/lib/python3.10/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:560: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  torch.tensor(mid - 1).type_as(relative_pos),
/home/jplu/miniconda3/envs/work/lib/python3.10/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:564: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  torch.ceil(torch.log(abs_pos / mid) / torch.log(torch.tensor((max_position - 1) / mid)) * (mid - 1)) + mid
/home/jplu/miniconda3/envs/work/lib/python3.10/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:723: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  scale = torch.sqrt(torch.tensor(query_layer.size(-1), dtype=torch.float) * scale_factor)
/home/jplu/miniconda3/envs/work/lib/python3.10/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:723: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  scale = torch.sqrt(torch.tensor(query_layer.size(-1), dtype=torch.float) * scale_factor)
/home/jplu/miniconda3/envs/work/lib/python3.10/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:802: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  scale = torch.sqrt(torch.tensor(pos_key_layer.size(-1), dtype=torch.float) * scale_factor)
/home/jplu/miniconda3/envs/work/lib/python3.10/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:802: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  scale = torch.sqrt(torch.tensor(pos_key_layer.size(-1), dtype=torch.float) * scale_factor)
/home/jplu/miniconda3/envs/work/lib/python3.10/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:814: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  scale = torch.sqrt(torch.tensor(pos_query_layer.size(-1), dtype=torch.float) * scale_factor)
/home/jplu/miniconda3/envs/work/lib/python3.10/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:814: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  scale = torch.sqrt(torch.tensor(pos_query_layer.size(-1), dtype=torch.float) * scale_factor)
/home/jplu/miniconda3/envs/work/lib/python3.10/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:815: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if key_layer.size(-2) != query_layer.size(-2):
/home/jplu/miniconda3/envs/work/lib/python3.10/site-packages/transformers/models/deberta_v2/modeling_deberta_v2.py:112: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  output = input.masked_fill(rmask, torch.tensor(torch.finfo(input.dtype).min))
============= Diagnostic Run torch.onnx.export version 2.0.1+cu118 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

[07/12/2023-10:13:28] [TRT] [I] [MemUsageChange] Init CUDA: CPU +13, GPU +0, now: CPU 1267, GPU 1116 (MiB)
[07/12/2023-10:13:36] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1444, GPU +268, now: CPU 2788, GPU 1384 (MiB)
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 1116186214
[07/12/2023-10:13:36] [TRT] [I] ----------------------------------------------------------------
[07/12/2023-10:13:36] [TRT] [I] Input filename:   model.onnx
[07/12/2023-10:13:36] [TRT] [I] ONNX IR version:  0.0.7
[07/12/2023-10:13:36] [TRT] [I] Opset version:    13
[07/12/2023-10:13:36] [TRT] [I] Producer name:    pytorch
[07/12/2023-10:13:36] [TRT] [I] Producer version: 2.0.1
[07/12/2023-10:13:36] [TRT] [I] Domain:
[07/12/2023-10:13:36] [TRT] [I] Model version:    0
[07/12/2023-10:13:36] [TRT] [I] Doc string:
[07/12/2023-10:13:36] [TRT] [I] ----------------------------------------------------------------
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 1116186214
[07/12/2023-10:13:37] [TRT] [W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[07/12/2023-10:13:37] [TRT] [E] ModelImporter.cpp:771: While parsing node number 66 [Sign -> "/deberta/encoder/Sign_output_0"]:
[07/12/2023-10:13:37] [TRT] [E] ModelImporter.cpp:772: --- Begin node ---
[07/12/2023-10:13:37] [TRT] [E] ModelImporter.cpp:773: input: "/deberta/encoder/Sub_output_0"
output: "/deberta/encoder/Sign_output_0"
name: "/deberta/encoder/Sign"
op_type: "Sign"

[07/12/2023-10:13:37] [TRT] [E] ModelImporter.cpp:774: --- End node ---
[07/12/2023-10:13:37] [TRT] [E] ModelImporter.cpp:777: ERROR: onnx2trt_utils.cpp:1779 In function unaryHelper:
[8] Assertion failed: validUnaryType && "This version of TensorRT does not support the given operator with the given input data type."
In node 66 (unaryHelper): UNSUPPORTED_NODE: Assertion failed: validUnaryType && "This version of TensorRT does not support the given operator with the given input data type."
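
The assertion mentions the input data type, so my guess is that Sign is supported only for floating-point inputs, while the node here runs on integer data (the relative-position indices computed in the encoder). A minimal sketch to check that in isolation, assuming the standard onnx helper API (the file name sign_int32.onnx is mine):

import onnx
import tensorrt as trt
from onnx import TensorProto, helper

# One-node graph: y = Sign(x) with an INT32 input, mimicking the
# integer-typed Sign that the exported DeBERTa encoder contains.
node = helper.make_node("Sign", inputs=["x"], outputs=["y"])
graph = helper.make_graph(
    [node], "sign_repro",
    [helper.make_tensor_value_info("x", TensorProto.INT32, [4])],
    [helper.make_tensor_value_info("y", TensorProto.INT32, [4])],
)
onnx.save(helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)]),
          "sign_int32.onnx")

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
parser.parse_from_file("sign_int32.onnx")  # expected to hit the same unaryHelper assertion
for i in range(parser.num_errors):
    print(parser.get_error(i))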

Thanks in advance for any help you can provide.

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 18

Most upvoted comments

You can use the fold_constants Python API. Something like:

from polygraphy.backend.onnx import fold_constants
import onnx

model = fold_constants(onnx.load("model_updated.onnx"))
onnx.save(model, "model_updated_folded.onnx")
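
Constant folding helps here presumably because the Sign node operates on constant relative-position indices, so it gets evaluated offline and dropped from the graph. If you prefer the CLI, I believe the equivalent command is:

polygraphy surgeon sanitize model_updated.onnx --fold-constants -o model_updated_folded.onnx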

Also, is there any ETA for the new TRT release that will include the fix?

Not for now 😃

Any idea how I can translate this command for constant folding into Python? It would be easier for me to integrate into our deployment pipeline.

Sorry, I don’t know it; @pranavm-nvidia may know the answer, and Polygraphy is also open-source.