tensorrtllm_backend: Assertion failed: input_ids: expected 2 dims, provided 1 dims

Description

I am using the latest version of tensorrtllm_backend and successfully launched the server following the deployment instructions in docs/baichuan.md. However, both of the example commands

curl -X POST localhost:8000/v2/models/ensemble/generate -d '{"text_input": "What is machine learning?", "max_tokens": 20, "bad_words": "", "stop_words": "", "pad_id": 2, "end_id": 2}'

python3 inflight_batcher_llm/client/inflight_batcher_llm_client.py --request-output-len 200 --tokenizer-dir xxx

fail with the following error:

{"error":"in ensemble 'ensemble', Encountered error for requestId 1804289384: Encountered an error in forward function: [TensorRT-LLM][ERROR] Assertion failed: input_ids: expected 2 dims, provided 1 dims (/app/tensorrt_llm/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:138)\n1 0x7f97df4697fd /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x177fd) [0x7f97df4697fd]\n2 0x7f97df5797d8 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x1277d8) [0x7f97df5797d8]\n3 0x7f97df4cbeb1 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x79eb1) [0x7f97df4cbeb1]\n4 0x7f97df4ccfa6 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x7afa6) [0x7f97df4ccfa6]\n5 0x7f97df4d0f0d /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x7ef0d) [0x7f97df4d0f0d]\n6 0x7f97df4bba28 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x69a28) [0x7f97df4bba28]\n7 0x7f97df4bffb5 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x6dfb5) [0x7f97df4bffb5]\n8 0x7f98e024f253 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f98e024f253]\n9 0x7f98dffdfac3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f98dffdfac3]\n10 0x7f98e0070814 clone + 68"}

Launch server: (screenshot of the server launch log in the original issue)
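
For reference, one way to check how many dimensions the built engine actually expects for input_ids is to deserialize it and inspect the tensor shape directly. A minimal sketch using the TensorRT (8.5+) Python API; the engine path is a placeholder:

import tensorrt as trt

ENGINE_PATH = "/path/to/trt_engines/rank0.engine"  # placeholder; point at your engine file

logger = trt.Logger(trt.Logger.WARNING)
with open(ENGINE_PATH, "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())

# A 2-D shape ([batch_size, seq_len]; -1 marks dynamic dims) means the engine
# expects padded input (V1 batching). A 1-D shape means packed input, i.e. an
# engine built with --remove_input_padding for in-flight batching.
print("input_ids shape:", engine.get_tensor_shape("input_ids"))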

Triton Information

Triton image: nvcr.io/nvidia/tritonserver:23.12-trtllm-python-py3
Tritonserver version: 2.41.0
TensorRT-LLM version: 0.6.1

To Reproduce

Refer to the following link to reproduce: https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/baichuan.md

Expected behavior

The example runs normally.

About this issue

  • Original URL
  • State: open
  • Created 6 months ago
  • Reactions: 13
  • Comments: 32

Most upvoted comments

Reproduced with https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/llama.md as well.

Triton image: nvcr.io/nvidia/tritonserver:23.12-trtllm-python-py3
Tritonserver version: 2.41.0
TensorRT-LLM version: 0.7.0

Request payload:

{"text_input": "What is machine learning?", "max_tokens": 20, "bad_words": "", "stop_words": ""}

Error:

Encountered an error in forward function: [TensorRT-LLM][ERROR] Assertion failed: input_ids: expected 2 dims, provided 1 dims (/app/tensorrt_llm/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:138)
1       0x7f1f9b4697fd /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x177fd) [0x7f1f9b4697fd]
2       0x7f1f9b5797d8 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x1277d8) [0x7f1f9b5797d8]
3       0x7f1f9b4cbeb1 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x79eb1) [0x7f1f9b4cbeb1]
4       0x7f1f9b4cd319 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x7b319) [0x7f1f9b4cd319]
5       0x7f1f9b4d0f0d /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x7ef0d) [0x7f1f9b4d0f0d]
6       0x7f1f9b4bba28 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x69a28) [0x7f1f9b4bba28]
7       0x7f1f9b4bffb5 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x6dfb5) [0x7f1f9b4bffb5]
8       0x7f1fff04f253 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f1fff04f253]
9       0x7f1ffeddfac3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f1ffeddfac3]
10      0x7f1ffee70814 clone + 68

I also encountered this error, using the nvcr.io/nvidia/tritonserver:23.12-trtllm-python-py3 Docker image and tensorrtllm_backend v0.7.0.

Hey @ekarmazin - we acknowledge that TRT-LLM versions have not been in sync with Triton versions. As others have observed here, it is not trivial to mix and match these versions. We are in the process of fixing this. We expect that the 0.8.0 release (planned for the early-February time frame) will be based on Triton 24.01. After that, we are aiming to have every release of TRT-LLM keep pace with the latest Triton release. Hope this helps.

@juney-nvidia is there any draft ETA for a new image release with fixes for this issue? A custom build process like the one @THU-mjx described works, but it uses an older Triton version and the final image is twice as large as the NGC-provided ones. The main issue is that we want to use the latest Triton with the latest tensorrt-llm engine. Thanks.

TensorRT-LLM: 0.7.1
tensorrtllm_backend: 0.7.1
Triton: nvcr.io/nvidia/tritonserver:23.12-trtllm-python-py3

I succeeded in the above environment. The steps are as follows:

  1. When building the engine with TensorRT-LLM, do not add the use_inflight_batching, remove_input_padding, or paged_kv_cache parameters. Below is my build command:
python3 build.py --model_name chatglm3_6b \
    --model_dir /home/user/code/chatglm3-6b \
    --use_weight_only \
    --max_batch_size 16 \
    --use_gpt_attention_plugin float16 \
    --output_dir /home/user/code/trt_engines/chatglm3-6b
  2. In the tensorrtllm_backend configuration, change the gpt_model_type parameter in the tensorrt_llm/config.pbtxt file to V1 (see the config sketch after these steps).
  3. Run the Triton server. The test results are as follows:
root@i-6iqqvhvy:/tensorrtllm_backend# curl -X POST localhost:8000/v2/models/ensemble/generate -d '{"text_input": "What is machine learning?", "max_tokens": 20, "bad_words": "", "stop_words": "", "pad_id": 2, "end_id": 2}'
{"cum_log_probs":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"What is machine learning? | Machine Learning\nMachine learning is a type of artificial intelligence that involves training algorithms on data in"}root@i-6iqqvhvy:/tensorrtllm_backend#
  • 2.39.0

How do I get Tritonserver 2.39.0? From NGC?

Use the 23.10-trtllm-python-py3 tag of the Triton Inference Server image.
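
For example, pulling that tag from NGC (a one-line sketch, assuming docker and NGC access are set up):

docker pull nvcr.io/nvidia/tritonserver:23.10-trtllm-python-py3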

This leads to another issue:

Error Code 1: Serialization (Serialization assertion stdVersionRead == kSERIALIZATION_VERSION failed.Version tag does not match. Note: Current Version: 226, Serialized Engine Version: 228)

The error means that you used one TRT version (A) to build the engine but a different TRT version (B) to load it at run time. Can you help double-check that?
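
A quick way to double-check: print the TensorRT and TensorRT-LLM versions in both environments (a sketch; run once in the container that built the engine and once in the Triton server container, then compare):

python3 -c "import tensorrt; print(tensorrt.__version__)"
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"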

Thanks June

Right. But with the correct version, I faced the same error as in this issue.

Assertion failed: input_ids: expected 2 dims, provided 1 dims

Same env as ekarmazin.

Failed to run llama

Input sequence:  [1, 19298, 297, 6641, 29899, 23027, 3444, 29892, 1105, 7598, 16370, 408, 263]
[TensorRT-LLM][ERROR] Encountered an error in forward function: [TensorRT-LLM][ERROR] Assertion failed: input_ids: expected 2 dims, provided 1 dims (/app/tensorrt_llm/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:138)
1       0x7f545f4697fd /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x177fd) [0x7f545f4697fd]
2       0x7f545f5797d8 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x1277d8) [0x7f545f5797d8]
3       0x7f545f4cbeb1 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x79eb1) [0x7f545f4cbeb1]
4       0x7f545f4ccfa6 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x7afa6) [0x7f545f4ccfa6]
5       0x7f545f4d0f0d /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x7ef0d) [0x7f545f4d0f0d]
6       0x7f545f4bba28 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x69a28) [0x7f545f4bba28]
7       0x7f545f4bffb5 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x6dfb5) [0x7f545f4bffb5]
8       0x7f580a64f253 /lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f580a64f253]
9       0x7f580a3dfac3 /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f580a3dfac3]
10      0x7f580a471660 /lib/x86_64-linux-gnu/libc.so.6(+0x126660) [0x7f580a471660]
[TensorRT-LLM][ERROR] Encountered error for requestId 846930887: Encountered an error in forward function: [TensorRT-LLM][ERROR] Assertion failed: input_ids: expected 2 dims, provided 1 dims (/app/tensorrt_llm/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:138)
1       0x7f545f4697fd /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x177fd) [0x7f545f4697fd]
2       0x7f545f5797d8 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x1277d8) [0x7f545f5797d8]
3       0x7f545f4cbeb1 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x79eb1) [0x7f545f4cbeb1]
4       0x7f545f4ccfa6 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x7afa6) [0x7f545f4ccfa6]
5       0x7f545f4d0f0d /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x7ef0d) [0x7f545f4d0f0d]
6       0x7f545f4bba28 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x69a28) [0x7f545f4bba28]
7       0x7f545f4bffb5 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x6dfb5) [0x7f545f4bffb5]
8       0x7f580a64f253 /lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f580a64f253]
9       0x7f580a3dfac3 /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f580a3dfac3]
10      0x7f580a471660 /lib/x86_64-linux-gnu/libc.so.6(+0x126660) [0x7f580a471660]
Got completed request
Received an error from server:
[StatusCode.INTERNAL] Encountered error for requestId 846930887: Encountered an error in forward function: [TensorRT-LLM][ERROR] Assertion failed: input_ids: expected 2 dims, provided 1 dims (/app/tensorrt_llm/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:138)
1       0x7f545f4697fd /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x177fd) [0x7f545f4697fd]
2       0x7f545f5797d8 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x1277d8) [0x7f545f5797d8]
3       0x7f545f4cbeb1 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x79eb1) [0x7f545f4cbeb1]
4       0x7f545f4ccfa6 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x7afa6) [0x7f545f4ccfa6]
5       0x7f545f4d0f0d /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x7ef0d) [0x7f545f4d0f0d]
6       0x7f545f4bba28 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x69a28) [0x7f545f4bba28]
7       0x7f545f4bffb5 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x6dfb5) [0x7f545f4bffb5]
8       0x7f580a64f253 /lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f580a64f253]
9       0x7f580a3dfac3 /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f580a3dfac3]
10      0x7f580a471660 /lib/x86_64-linux-gnu/libc.so.6(+0x126660) [0x7f580a471660][TensorRT-LLM][WARNING] Step function failed, continuing.

Encountered error: [StatusCode.INTERNAL] Encountered error for requestId 846930887: Encountered an error in forward function: [TensorRT-LLM][ERROR] Assertion failed: input_ids: expected 2 dims, provided 1 dims (/app/tensorrt_llm/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:138)
1       0x7f545f4697fd /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x177fd) [0x7f545f4697fd]
2       0x7f545f5797d8 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x1277d8) [0x7f545f5797d8]
3       0x7f545f4cbeb1 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x79eb1) [0x7f545f4cbeb1]
4       0x7f545f4ccfa6 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x7afa6) [0x7f545f4ccfa6]
5       0x7f545f4d0f0d /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x7ef0d) [0x7f545f4d0f0d]
6       0x7f545f4bba28 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x69a28) [0x7f545f4bba28]
7       0x7f545f4bffb5 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x6dfb5) [0x7f545f4bffb5]
8       0x7f580a64f253 /lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f580a64f253]
9       0x7f580a3dfac3 /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f580a3dfac3]
10      0x7f580a471660 /lib/x86_64-linux-gnu/libc.so.6(+0x126660) [0x7f580a471660]
Encountered error: [StatusCode.INTERNAL] Encountered error for requestId 846930887: Encountered an error in forward function: [TensorRT-LLM][ERROR] Assertion failed: input_ids: expected 2 dims, provided 1 dims (/app/tensorrt_llm/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:138)
1       0x7f545f4697fd /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x177fd) [0x7f545f4697fd]
2       0x7f545f5797d8 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x1277d8) [0x7f545f5797d8]
3       0x7f545f4cbeb1 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x79eb1) [0x7f545f4cbeb1]
4       0x7f545f4ccfa6 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x7afa6) [0x7f545f4ccfa6]
5       0x7f545f4d0f0d /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x7ef0d) [0x7f545f4d0f0d]
6       0x7f545f4bba28 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x69a28) [0x7f545f4bba28]
7       0x7f545f4bffb5 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x6dfb5) [0x7f545f4bffb5]
8       0x7f580a64f253 /lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f580a64f253]
9       0x7f580a3dfac3 /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f580a3dfac3]
10      0x7f580a471660 /lib/x86_64-linux-gnu/libc.so.6(+0x126660) [0x7f580a471660]

Also, the curl request returned:

curl -X POST localhost:7001/v2/models/ensemble/generate -d '{"text_input": "What is machine learning?", "max_tokens": 20, "bad_words": "", "stop_words": "", "pad_id": 2, "end_id": 2}'
[TensorRT-LLM][ERROR] Encountered an error in forward function: [TensorRT-LLM][ERROR] Assertion failed: input_ids: expected 2 dims, provided 1 dims (/app/tensorrt_llm/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:138)
1       0x7f550b4697fd /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x177fd) [0x7f550b4697fd]
2       0x7f550b5797d8 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x1277d8) [0x7f550b5797d8]
3       0x7f550b4cbeb1 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x79eb1) [0x7f550b4cbeb1]
4       0x7f550b4ccfa6 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x7afa6) [0x7f550b4ccfa6]
5       0x7f550b4d0f0d /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x7ef0d) [0x7f550b4d0f0d]
6       0x7f550b4bba28 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x69a28) [0x7f550b4bba28]
7       0x7f550b4bffb5 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x6dfb5) [0x7f550b4bffb5]
8       0x7f58b624f253 /lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f58b624f253]
9       0x7f58b5fdfac3 /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f58b5fdfac3]
10      0x7f58b6071660 /lib/x86_64-linux-gnu/libc.so.6(+0x126660) [0x7f58b6071660]
[TensorRT-LLM][ERROR] Encountered error for requestId 846930887: Encountered an error in forward function: [TensorRT-LLM][ERROR] Assertion failed: input_ids: expected 2 dims, provided 1 dims (/app/tensorrt_llm/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:138)
1       0x7f550b4697fd /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x177fd) [0x7f550b4697fd]
2       0x7f550b5797d8 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x1277d8) [0x7f550b5797d8]
3       0x7f550b4cbeb1 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x79eb1) [0x7f550b4cbeb1]
4       0x7f550b4ccfa6 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x7afa6) [0x7f550b4ccfa6]
5       0x7f550b4d0f0d /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x7ef0d) [0x7f550b4d0f0d]
6       0x7f550b4bba28 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x69a28) [0x7f550b4bba28]
7       0x7f550b4bffb5 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x6dfb5) [0x7f550b4bffb5]
8       0x7f58b624f253 /lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f58b624f253]
9       0x7f58b5fdfac3 /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f58b5fdfac3]
10      0x7f58b6071660 /lib/x86_64-linux-gnu/libc.so.6(+0x126660) [0x7f58b6071660]
{"error":"in ensemble 'ensemble', Encountered error for requestId 846930887: Encountered an error in forward function: [TensorRT-LLM][ERROR] Assertion failed: input_ids: expected 2 dims, provided 1 dims (/app/tensorrt_llm/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:138)\n1       0x7f550b4697fd /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x177fd) [0x7f550b4697fd]\n2       0x7f550b5797d8 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x1277d8) [0x7f550b5797d8]\n3       0x7f550b4cbeb1 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x79eb1) [0x7f550b4cbeb1]\n4       0x7f550b4ccfa6 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x7afa6) [0x7f550b4ccfa6]\n5       0x7f550b4d0f0d /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x7ef0d) [0x7f550b4d0f0d]\n6       0x7f550b4bba28 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x69a28) [0x7f550b4bba28]\n7       0x7f550b4bffb5 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x6dfb5) [0x7f550b4bffb5]\n8       0x7f58b624f253 /lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f58b624f253]\n9       0x7f58b5fdfac3 /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f58b5fdfac3]\n10      0x7f58b6071660 /lib/x86_64-linux-gnu/libc.so.6(+0x126660) [0x7f58b6071660]"}[TensorRT-LLM][WARNING] Step function failed, continuing.

The next version 0.8.0 will be on 23.12. I unfortunately don’t have a workaround until then.


v0.7.0

(Replying by email to Eugene Karmazin, Dec 24, 2023: "I solved this by building the docker with Option 3. I guess there is something mismatched in the pre-built docker image. @PannenetsF which branch did you use?")
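
For reference, a sketch of checking out that version (assuming the v0.7.0 tag exists on the tensorrtllm_backend repo; TensorRT-LLM is pulled in as a submodule):

git clone -b v0.7.0 https://github.com/triton-inference-server/tensorrtllm_backend.git
cd tensorrtllm_backend
git submodule update --init --recursive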

The problem is that the BASE_IMAGE used to build the image in Option 3 of branch 0.7.0 is nvcr.io/nvidia/tritonserver:23.10-py3, not the latest version (nvcr.io/nvidia/tritonserver:23.12-py3). Does this meet your expectations?

Not sure if this reply was addressed to me, but no, it doesn't meet my expectations. We need the latest NVIDIA drivers (12.3) as well as tensorrt_llm v0.7.0, but Option 3 can't achieve this. I tried base images 23.11 and 23.12 and ran into conflicts with the NVIDIA drivers. So the issue is still open: the pre-built image is not usable with the latest tensorrt-llm backend.

@schetlur-nv @juney-nvidia any updates on this issue?

Rebuilding the container image from the main branches of both tensorrt-llm and tensorrtllm_backend can solve this problem; just keep the base image as nvcr.io/nvidia/tritonserver:23.10-py3. I succeeded this way. The nvcr.io/nvidia/tritonserver:23.12-trtllm-python-py3 image will result in this problem.
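
A sketch of that rebuild with the base pinned to 23.10. The BASE_IMAGE build argument is an assumption taken from the comment above; verify the ARG actually declared in dockerfile/Dockerfile.trt_llm_backend on your branch:

DOCKER_BUILDKIT=1 docker build -t triton_trt_llm \
    --build-arg BASE_IMAGE=nvcr.io/nvidia/tritonserver:23.10-py3 \
    -f dockerfile/Dockerfile.trt_llm_backend .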


I set the base to 23.10 and compiled it successfully.