TensorRT-LLM: benchmark for chatglm2-6b failed

I converted the chatglm2-6b model and the build ran fine with the following command:

python3 build.py --model_dir=${model_dir} \
                 --dtype float16 \
                 --use_gpt_attention_plugin float16 \
                 --use_gemm_plugin float16
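
For reference, the inputs that the built engine declares can be dumped with the TensorRT Python API. This is only a sketch, assuming the serialized engine sits in the trtModel output directory; the engine file name below is a placeholder:

import tensorrt as trt

# Placeholder path -- substitute the .engine file that build.py actually wrote.
engine_path = "/code/tensorrt_llm/examples/chatglm2-6b/trtModel/<engine-file>.engine"

logger = trt.Logger(trt.Logger.WARNING)
with open(engine_path, "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

# Print every I/O tensor name with its declared shape; -1 marks a dynamic dimension.
for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    print(name, engine.get_tensor_shape(name))

The position_ids entry in that listing is the shape the runtime assertion below is comparing against.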

However, the benchmark then failed with the following command:

../../cpp/build/benchmarks/gptSessionBenchmark --duration 30 --model chatglm2-6b --engine_dir /code/tensorrt_llm/examples/chatglm2-6b/trtModel --batch_size 1 --input_output_len 32,1

error message:

[TensorRT-LLM][ERROR] [TensorRT-LLM][ERROR] Assertion failed: position_ids: expected 2 dims, provided 3 dims (/code/tensorrt_llm/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:139)
1       0x561e1c97c6ee tensorrt_llm::common::throwRuntimeError(char const*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 100
2       0x7f6be25bd53b tensorrt_llm::runtime::TllmRuntime::setInputTensors(int, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::shared_ptr<tensorrt_llm::runtime::ITensor>, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr<tensorrt_llm::runtime::ITensor> > > > const&) + 1867
3       0x7f6be2587453 tensorrt_llm::runtime::GptSession::generateSingleBatch(tensorrt_llm::runtime::GenerationOutput&, tensorrt_llm::runtime::GenerationInput const&, tensorrt_llm::runtime::SamplingConfig const&) + 2211
4       0x561e1c980537 ../../cpp/build/benchmarks/gptSessionBenchmark(+0x17537) [0x561e1c980537]
5       0x7f6ba4edcd90 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f6ba4edcd90]
6       0x7f6ba4edce40 __libc_start_main + 128
7       0x561e1c981fe5 ../../cpp/build/benchmarks/gptSessionBenchmark(+0x18fe5) [0x561e1c981fe5]
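
Reading the assertion, the runtime only checks the number of dimensions of the position_ids input against what the engine declares: the engine expects a 2-D tensor (e.g. [batch, seq_len]) while the benchmark hands it a 3-D one. A rough illustration of the two layouts follows; this is just a sketch of the dimensionality, not the actual TensorRT-LLM runtime code, and the two-channel 3-D layout is my guess based on the first-generation ChatGLM's two-channel position ids:

import numpy as np

batch_size, seq_len = 1, 32

# 2-D layout the engine appears to expect: one position id per token.
expected = np.arange(seq_len, dtype=np.int32)[None, :]            # shape (1, 32) -> 2 dims

# 3-D layout the benchmark seems to provide; the extra axis mimics the
# two-channel (position + block position) ids of the original ChatGLM --
# an assumption, not something confirmed by the trace.
provided = np.stack([np.arange(seq_len, dtype=np.int32),
                     np.zeros(seq_len, dtype=np.int32)])[None, :, :]  # shape (1, 2, 32) -> 3 dims

print(expected.ndim, provided.ndim)  # 2 vs 3, matching the assertion text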

About this issue

  • State: closed
  • Created 8 months ago
  • Comments: 16

Most upvoted comments

Has this bug been fixed on main yet?

Hi @elinx,

As explained in https://github.com/NVIDIA/TensorRT-LLM/discussions/55, we plan to have two branches: the stable and the dev branches. We will update the dev branch soon with a bunch of fixes. The goal is to have a push to the dev branch this Friday (Oct. 27th). To be transparent, we might have to slip the schedule and do it only on Monday (Oct. 30th) but, in both cases, it's coming soon 😃. The fix will be included in that update of the dev branch.

Thanks, Julien