tensorrtllm_backend: Assertion failed: input_ids: expected 2 dims, provided 1 dims
Description
I am using the latest version of tensorrtllm_backend and successfully launched the server following the deployment instructions in docs/baichuan.md.
However, both the example command curl -X POST localhost:8000/v2/models/ensemble/generate -d '{"text_input": "What is machine learning?", "max_tokens": 20, "bad_words": "", "stop_words": "", "pad_id": 2, "end_id": 2}' and the example command python3 inflight_batcher_llm/client/inflight_batcher_llm_client.py --request-output-len 200 --tokenizer-dir xxx failed with the following error:
Encountered an error in forward function: [TensorRT-LLM][ERROR] Assertion failed: input_ids: expected 2 dims, provided 1 dims (/app/tensorrt_llm/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:138)
Errors:
{"error":"in ensemble 'ensemble', Encountered error for requestId 1804289384: Encountered an error in forward function: [TensorRT-LLM][ERROR] Assertion failed: input_ids: expected 2 dims, provided 1 dims (/app/tensorrt_llm/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:138)\n1 0x7f97df4697fd /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x177fd) [0x7f97df4697fd]\n2 0x7f97df5797d8 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x1277d8) [0x7f97df5797d8]\n3 0x7f97df4cbeb1 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x79eb1) [0x7f97df4cbeb1]\n4 0x7f97df4ccfa6 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x7afa6) [0x7f97df4ccfa6]\n5 0x7f97df4d0f0d /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x7ef0d) [0x7f97df4d0f0d]\n6 0x7f97df4bba28 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x69a28) [0x7f97df4bba28]\n7 0x7f97df4bffb5 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x6dfb5) [0x7f97df4bffb5]\n8 0x7f98e024f253 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f98e024f253]\n9 0x7f98dffdfac3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f98dffdfac3]\n10 0x7f98e0070814 clone + 68"
Launch server:
Triton Information
- Triton image: nvcr.io/nvidia/tritonserver:23.12-trtllm-python-py3
- Triton Server version: 2.41.0
- TensorRT-LLM version: 0.6.1
To Reproduce
Refer to the following link to reproduce: https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/baichuan.md
Expected behavior
The example runs normally.
About this issue
- Original URL
- State: open
- Created 6 months ago
- Reactions: 13
- Comments: 32
Reproduced with https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/llama.md as well.
Request payload:
Error:
I also encountered this error, using the nvcr.io/nvidia/tritonserver:23.12-trtllm-python-py3 docker image and tensorrtllm-backend v0.7.0.
Hey @ekarmazin - we acknowledge that TRT-LLM versions have not been in sync with Triton versions. As others have observed here, it is not trivial to mix-and-match these versions. We are in the process of fixing this. We expect that the 0.8.0 release (planned in the early-February time frame) will be based on Triton 24.01. After that, we are aiming to have every release of TRT-LLM keep pace with the latest Triton release. Hope this helps.
@juney-nvidia is there any rough ETA for when we should expect a new image release with fixes for this issue? A custom build process like the one @THU-mjx described works, but it uses an older Triton version and the final image is twice as large as the NGC-provided ones. The main issue is that we want to use the latest Triton with the latest tensorrt-llm engine. Thanks.
- TensorRT-LLM: 0.7.1
- tensorrtllm_backend: 0.7.1
- Triton: nvcr.io/nvidia/tritonserver:23.12-trtllm-python-py3
I succeeded in the above environment. The steps are as follows: build the engine with use_inflight_batching, remove_input_padding, and paged_kv_cache; my build command is below. Alternatively, set the gpt_model_type parameter in the tensorrt_llm/config.pbtxt file to V1.
Right. But with the correct version, I faced the same error as in this issue.
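For context, this assertion usually indicates a mismatch between the engine and the batching mode: with gpt_model_type set to inflight_fused_batching the backend feeds packed 1-D input_ids, while an engine built without remove_input_padding expects padded 2-D input. The exact build command referenced above was not preserved in this thread; the sketch below is only an illustration of a 0.7.x-era LLaMA build with the three flags mentioned (the examples/llama/build.py path, the placeholder directories, and the exact flag spellings are assumptions, and newer releases use trtllm-build with different options):

```bash
# Hedged sketch, not the commenter's actual command: build an engine that accepts
# packed input and uses a paged KV cache so it matches inflight_fused_batching.
python3 examples/llama/build.py \
    --model_dir /path/to/llama-hf-model \
    --dtype float16 \
    --use_gpt_attention_plugin float16 \
    --use_gemm_plugin float16 \
    --use_inflight_batching \
    --remove_input_padding \
    --paged_kv_cache \
    --output_dir /path/to/engines/1-gpu
```

If you instead keep the existing engine and switch the batcher to V1, the relevant entry in tensorrt_llm/config.pbtxt would end up looking roughly like this (illustrative excerpt only; the surrounding template and parameter set vary between backend versions):

```
# tensorrt_llm/config.pbtxt (excerpt)
parameters: {
  key: "gpt_model_type"
  value: {
    string_value: "V1"
  }
}
```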
Same env as ekarmazin.
Failed to run llama
Also, the curl said:
The next version 0.8.0 will be on 23.12. I unfortunately don’t have a workaround until then.
This leads to another issue:
Not sure if this reply was addressed to me, but no, it doesn't meet my expectations. We need the latest NVIDIA drivers (12.3) as well as tensorrt_llm v0.7.0, and Option 3 can't achieve this. I tried to use the 23.11 and 23.12 base images and faced conflicts with the NVIDIA drivers. So the issue is still open: the pre-built image is not usable with the latest tensorrt-llm backend.
@schetlur-nv @juney-nvidia any updates on this issue?
Rebuilding the container image from the main branches of both tensorrt-llm and tensorrtllm_backend could solve this problem; just keep the base image as nvcr.io/nvidia/tritonserver:23.10-py3. That worked for me. The nvcr.io/nvidia/tritonserver:23.12-trtllm-python-py3 image will result in this problem.
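For anyone trying the same route, here is a hedged sketch of that rebuild, assuming the repository layout of the main branch at the time (the Dockerfile path, any build args, and the submodule/LFS steps should be checked against your checkout):

```bash
# Hedged sketch: rebuild the backend image from source on a 23.10 base.
git clone https://github.com/triton-inference-server/tensorrtllm_backend.git
cd tensorrtllm_backend
git lfs install && git lfs pull
git submodule update --init --recursive
# If dockerfile/Dockerfile.trt_llm_backend in your checkout does not expose the
# base image as a build arg, edit its FROM line to use
# nvcr.io/nvidia/tritonserver:23.10-py3 before building.
DOCKER_BUILDKIT=1 docker build -t triton_trt_llm \
    -f dockerfile/Dockerfile.trt_llm_backend .
```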
Use the 23.10-trtllm-python-py3 tag for the Triton Inference Server image.
I set the base to 23.10 and compiled it successfully.
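A minimal sketch of using that tag, assuming a locally cloned tensorrtllm_backend and an already prepared model repository (the mount paths, world size, and model repository location are placeholders):

```bash
# Hedged sketch: run the 23.10 TRT-LLM container and start Triton inside it.
docker run --rm -it --net host --shm-size=2g \
    --ulimit memlock=-1 --ulimit stack=67108864 --gpus all \
    -v /path/to/tensorrtllm_backend:/tensorrtllm_backend \
    nvcr.io/nvidia/tritonserver:23.10-trtllm-python-py3 bash

# Inside the container:
python3 /tensorrtllm_backend/scripts/launch_triton_server.py \
    --world_size 1 \
    --model_repo /tensorrtllm_backend/triton_model_repo
```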