TensorRT: Error converting ChatGLM model to TensorRT

Description

Environment

TensorRT Version: 22.12

NVIDIA GPU: A30

Relevant Files

This is my model: https://github.com/wangzhaode/ChatGLM-MNN/releases/download/v0.4/glm_block_0.onnx

Steps To Reproduce

I ran onnx-simplifier (https://github.com/daquexian/onnx-simplifier) on the glm_block_0.onnx model:

onnxsim glm_block_0.onnx glm_block_0_sim.onnx

This produces the glm_block_0_sim.onnx model file.

Then I used the trtexec tool to convert the model:

trtexec --onnx=glm_block_0_sim.onnx --saveEngine=glm_block_0_sim.plan 

But I got this error:

[04/04/2023-02:33:03] [I] [TRT] ----------------------------------------------------------------
[04/04/2023-02:33:04] [W] [TRT] parsers/onnx/onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/04/2023-02:33:04] [W] [TRT] parsers/onnx/onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[04/04/2023-02:33:04] [E] [TRT] parsers/onnx/ModelImporter.cpp:726: While parsing node number 34 [Squeeze -> "cos"]:
[04/04/2023-02:33:04] [E] [TRT] parsers/onnx/ModelImporter.cpp:727: --- Begin node ---
[04/04/2023-02:33:04] [E] [TRT] parsers/onnx/ModelImporter.cpp:728: input: "onnx::Squeeze_460"
output: "cos"
name: "Squeeze_102"
op_type: "Squeeze"
doc_string: "  File \"/home/yanxing/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b/2449bdc9d85103734ae987bdc94eafa7fec2145d/modeling_chatglm.py\", line 196\ndef apply_rotary_pos_emb_index(q, k, cos, sin, position_id):\n    # position_id: [sq, b], q, k: [sq, b, np, hn], cos: [sq, 1, hn] -> [sq, b, 1, hn]\n    cos = torch.squeeze(cos)\n          ~~~~~~~~~~~~~ <--- HERE\n    sin = torch.squeeze(sin)\n    cos = F.embedding(position_id, cos).unsqueeze(2)\n"

[04/04/2023-02:33:04] [E] [TRT] parsers/onnx/ModelImporter.cpp:729: --- End node ---
[04/04/2023-02:33:04] [E] [TRT] parsers/onnx/ModelImporter.cpp:731: ERROR: parsers/onnx/builtin_op_importers.cpp:4793 In function importSqueeze:
[8] Assertion failed: !isDynamic(shape) && "Cannot infer squeeze dimensions from a dynamic shape! Please re-export your model with the Squeeze axes input set."
[04/04/2023-02:33:04] [E] Failed to parse onnx file
[04/04/2023-02:33:04] [I] Finish parsing network model
[04/04/2023-02:33:04] [E] Parsing model failed
[04/04/2023-02:33:04] [E] Failed to create engine from model or file.
[04/04/2023-02:33:04] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8501] # trtexec --onnx=glm_block_0_sim.onnx --saveEngine=glm_block_0_sim.plan

How can I fix this? Thank you very much.
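The assertion in the log says the parser cannot pick the squeeze dimensions when the input shape is dynamic and no axes are given. Below is a minimal sketch of that rule in plain Python. It is an illustrative model only, not TensorRT's or ONNX's actual code: `squeeze_shape` is a hypothetical helper, `None` stands for a dimension unknown at build time, and 128 is a placeholder value for `hn`.

```python
from typing import Optional, Sequence, Tuple

Dim = Optional[int]  # None marks a dimension that is unknown at build time


def squeeze_shape(shape: Sequence[Dim],
                  axes: Optional[Sequence[int]] = None) -> Tuple[Dim, ...]:
    """Return the shape after a Squeeze (illustrative model, not TensorRT code)."""
    rank = len(shape)
    if axes is None:
        # Without explicit axes, every dimension must be known in order to
        # decide which ones are equal to 1.
        if any(d is None for d in shape):
            raise ValueError("cannot infer squeeze dimensions from a dynamic shape")
        axes = [i for i, d in enumerate(shape) if d == 1]
    # Normalise negative axes such as -1 to non-negative indices.
    drop = {a % rank for a in axes}
    for a in drop:
        # A known dimension can only be squeezed if its size is 1.
        if shape[a] is not None and shape[a] != 1:
            raise ValueError(f"axis {a} has size {shape[a]}, expected 1")
    return tuple(d for i, d in enumerate(shape) if i not in drop)


# cos is [sq, 1, hn] according to the comment in modeling_chatglm.py;
# sq is treated as dynamic here and 128 is a placeholder for hn.
cos_shape = (None, 1, 128)
print(squeeze_shape(cos_shape, axes=[1]))  # (None, 128)
```

Calling `squeeze_shape(cos_shape)` without `axes` raises, which mirrors the failure above. In PyTorch terms, the explicit form would be `torch.squeeze(cos, 1)` instead of `torch.squeeze(cos)` before re-exporting. Whether that change alone is enough for the whole model is not verified here.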

About this issue

  • State: closed
  • Created a year ago
  • Comments: 25

Most upvoted comments

Here is a converted ChatGLM-6B TensorRT model, with the corresponding C++ inference code and Python bindings. KV cache is supported. You can try it: https://huggingface.co/TMElyralab/lyraChatGLM @zhaohb @xika