TensorRT-LLM: gptSessionBenchmark Failed Because of Assertion Error for tritonserver:23.11-trtllm-python-py3 Image

With the 23.11 version of the Triton server image, the benchmark does not work.

Command: ./cpp/build/benchmarks/gptSessionBenchmark --model llama --engine_dir ./examples/llama/out/7b/fp16_1gpu --batch_size "1" --input_output_len "512, 200"

Error:

[TensorRT-LLM][ERROR] tensorrt_llm::common::TllmException: [TensorRT-LLM][ERROR] Assertion failed: d == a + length (/app/tensorrt_llm/cpp/tensorrt_llm/plugins/gptAttentionCommon/gptAttentionCommon.cpp:326)
1       0x7fff9696cfbf /opt/tritonserver/backends/tensorrtllm/libnvinfer_plugin_tensorrt_llm.so.9(+0x37fbf) [0x7fff9696cfbf]
2       0x7fff969ccc8a tensorrt_llm::plugins::GPTAttentionPluginCommon::GPTAttentionPluginCommon(void const*, unsigned long) + 762
3       0x7fff969e008d tensorrt_llm::plugins::GPTAttentionPlugin::GPTAttentionPlugin(void const*, unsigned long) + 13
4       0x7fff969e00d2 tensorrt_llm::plugins::GPTAttentionPluginCreator::deserializePlugin(char const*, void const*, unsigned long) + 50
5       0x7fff519ef8a6 /usr/local/tensorrt/lib/libnvinfer.so.9(+0x10d68a6) [0x7fff519ef8a6]
6       0x7fff519e766e /usr/local/tensorrt/lib/libnvinfer.so.9(+0x10ce66e) [0x7fff519e766e]
7       0x7fff51982217 /usr/local/tensorrt/lib/libnvinfer.so.9(+0x1069217) [0x7fff51982217]
8       0x7fff5198019e /usr/local/tensorrt/lib/libnvinfer.so.9(+0x106719e) [0x7fff5198019e]
9       0x7fff51997c2b /usr/local/tensorrt/lib/libnvinfer.so.9(+0x107ec2b) [0x7fff51997c2b]
10      0x7fff5199ae32 /usr/local/tensorrt/lib/libnvinfer.so.9(+0x1081e32) [0x7fff5199ae32]
11      0x7fff5199b20c /usr/local/tensorrt/lib/libnvinfer.so.9(+0x108220c) [0x7fff5199b20c]
12      0x7fff519ce9b1 /usr/local/tensorrt/lib/libnvinfer.so.9(+0x10b59b1) [0x7fff519ce9b1]
13      0x7fff519cf777 /usr/local/tensorrt/lib/libnvinfer.so.9(+0x10b6777) [0x7fff519cf777]
14      0x7fffd2bb4e02 tensorrt_llm::runtime::TllmRuntime::TllmRuntime(void const*, unsigned long, nvinfer1::ILogger&) + 482
15      0x7fffd2b743ab tensorrt_llm::runtime::GptSession::GptSession(tensorrt_llm::runtime::GptSession::Config const&, tensorrt_llm::runtime::GptModelConfig const&, tensorrt_llm::runtime::WorldConfig const&, void const*, unsigned long, std::shared_ptr<nvinfer1::ILogger>) + 651
16      0x55555556befd ./cpp/build/benchmarks/gptSessionBenchmark(+0x17efd) [0x55555556befd]
17      0x7fff964dfd90 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fff964dfd90]
18      0x7fff964dfe40 __libc_start_main + 128
....

About this issue

  • State: closed
  • Created 7 months ago
  • Comments: 15

Most upvoted comments

I have figured it out: the prebuilt library shipped in the Triton server container (under /opt/tritonserver/backends/tensorrtllm) is not built from the latest TRT-LLM, so it fails to deserialize the GPT attention plugin from an engine built with a newer TRT-LLM (hence the assertion in GPTAttentionPluginCommon during deserialization).

To resolve it, copy the manually built libraries into that directory:

cp tensorrt_llm/build/lib/tensorrt_llm/libs/* /opt/tritonserver/backends/tensorrtllm/

After the copy, the benchmark runs without the assertion error.
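The overwrite step above can be wrapped in a small helper that first backs up the container's prebuilt libraries, so you can roll back if the replacement causes a different failure. This is a hedged sketch, not part of the original fix: the function name `copy_trtllm_libs` and the backup directory are my own additions; the two paths in the guarded invocation are the ones from the comment above.

```shell
#!/bin/sh
# copy_trtllm_libs SRC DST:
#   1. back up DST's current files into DST.prebuilt-backup (once),
#   2. overwrite DST with the libraries from SRC.
copy_trtllm_libs() {
    src="$1"
    dst="$2"
    backup="${dst}.prebuilt-backup"
    if [ ! -d "$backup" ]; then
        # Keep the prebuilt libraries so the change is reversible.
        mkdir -p "$backup"
        cp "$dst"/* "$backup"/ 2>/dev/null || true
    fi
    # Replace the prebuilt libraries with the freshly built ones.
    cp "$src"/* "$dst"/
}

# Default invocation matching the paths from this issue; guarded so the
# script is a no-op on systems where those directories do not exist.
if [ -d tensorrt_llm/build/lib/tensorrt_llm/libs ] \
   && [ -d /opt/tritonserver/backends/tensorrtllm ]; then
    copy_trtllm_libs tensorrt_llm/build/lib/tensorrt_llm/libs \
                     /opt/tritonserver/backends/tensorrtllm
fi
```

Keeping the backup directory next to the backend directory makes the rollback a single copy in the opposite direction.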