TensorRT-LLM: gptSessionBenchmark Failed Because of Assertion Error for tritonserver:23.11-trtllm-python-py3 Image
With the 23.11 version of the Triton server image, the benchmark fails at startup.
Command:
./cpp/build/benchmarks/gptSessionBenchmark --model llama --engine_dir ./examples/llama/out/7b/fp16_1gpu --batch_size "1" --input_output_len "512, 200"
Error:
[TensorRT-LLM][ERROR] tensorrt_llm::common::TllmException: [TensorRT-LLM][ERROR] Assertion failed: d == a + length (/app/tensorrt_llm/cpp/tensorrt_llm/plugins/gptAttentionCommon/gptAttentionCommon.cpp:326)
1 0x7fff9696cfbf /opt/tritonserver/backends/tensorrtllm/libnvinfer_plugin_tensorrt_llm.so.9(+0x37fbf) [0x7fff9696cfbf]
2 0x7fff969ccc8a tensorrt_llm::plugins::GPTAttentionPluginCommon::GPTAttentionPluginCommon(void const*, unsigned long) + 762
3 0x7fff969e008d tensorrt_llm::plugins::GPTAttentionPlugin::GPTAttentionPlugin(void const*, unsigned long) + 13
4 0x7fff969e00d2 tensorrt_llm::plugins::GPTAttentionPluginCreator::deserializePlugin(char const*, void const*, unsigned long) + 50
5 0x7fff519ef8a6 /usr/local/tensorrt/lib/libnvinfer.so.9(+0x10d68a6) [0x7fff519ef8a6]
6 0x7fff519e766e /usr/local/tensorrt/lib/libnvinfer.so.9(+0x10ce66e) [0x7fff519e766e]
7 0x7fff51982217 /usr/local/tensorrt/lib/libnvinfer.so.9(+0x1069217) [0x7fff51982217]
8 0x7fff5198019e /usr/local/tensorrt/lib/libnvinfer.so.9(+0x106719e) [0x7fff5198019e]
9 0x7fff51997c2b /usr/local/tensorrt/lib/libnvinfer.so.9(+0x107ec2b) [0x7fff51997c2b]
10 0x7fff5199ae32 /usr/local/tensorrt/lib/libnvinfer.so.9(+0x1081e32) [0x7fff5199ae32]
11 0x7fff5199b20c /usr/local/tensorrt/lib/libnvinfer.so.9(+0x108220c) [0x7fff5199b20c]
12 0x7fff519ce9b1 /usr/local/tensorrt/lib/libnvinfer.so.9(+0x10b59b1) [0x7fff519ce9b1]
13 0x7fff519cf777 /usr/local/tensorrt/lib/libnvinfer.so.9(+0x10b6777) [0x7fff519cf777]
14 0x7fffd2bb4e02 tensorrt_llm::runtime::TllmRuntime::TllmRuntime(void const*, unsigned long, nvinfer1::ILogger&) + 482
15 0x7fffd2b743ab tensorrt_llm::runtime::GptSession::GptSession(tensorrt_llm::runtime::GptSession::Config const&, tensorrt_llm::runtime::GptModelConfig const&, tensorrt_llm::runtime::WorldConfig const&, void const*, unsigned long, std::shared_ptr<nvinfer1::ILogger>) + 651
16 0x55555556befd ./cpp/build/benchmarks/gptSessionBenchmark(+0x17efd) [0x55555556befd]
17 0x7fff964dfd90 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fff964dfd90]
18 0x7fff964dfe40 __libc_start_main + 128
....
About this issue
- State: closed
- Created 7 months ago
- Comments: 15
I have figured it out: the prebuilt library shipped in the Triton server container (under /opt/tritonserver/backends/tensorrtllm) is not built from the latest TRT-LLM, so its plugin version does not match the engine, and the d == a + length assertion fires while deserializing the GPT attention plugin. To resolve it, copy the libraries we built manually into that directory, overwriting the prebuilt ones.
After the copy, the benchmark runs fine.
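The copy step above can be sketched roughly as below. The build path is an assumption (it depends on where your own TensorRT-LLM build placed the plugin libraries), and the helper function name is hypothetical; only the backend directory comes from the issue itself.

```shell
#!/bin/sh
# Sketch: replace the container's prebuilt TRT-LLM plugin libraries with the
# ones built from the same source tree that produced the engine.
# BUILD_DIR is an assumed location; adjust it to your own build output.

refresh_trtllm_plugins() {
  build_dir=$1    # directory holding freshly built libnvinfer_plugin_tensorrt_llm.so*
  backend_dir=$2  # the container's backend directory with the prebuilt copies

  for lib in "$build_dir"/libnvinfer_plugin_tensorrt_llm.so*; do
    # skip if the glob matched nothing
    if [ ! -e "$lib" ]; then continue; fi
    name=$(basename "$lib")
    # keep a backup of the prebuilt library before overwriting it
    if [ -e "$backend_dir/$name" ]; then
      cp "$backend_dir/$name" "$backend_dir/$name.bak"
    fi
    cp "$lib" "$backend_dir/"
  done
}

# Example usage (paths are assumptions, adjust to your build tree):
# refresh_trtllm_plugins /app/tensorrt_llm/cpp/build/tensorrt_llm/plugins \
#                        /opt/tritonserver/backends/tensorrtllm
```

Backing up the originals first makes it easy to revert the container to its shipped state if the manually built libraries turn out to be incompatible.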