TensorRT-LLM: Qwen-72B-chat-GPTQ TP=4 ERROR
System Info
xx
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
xx
Expected behavior
xx
Actual behavior
xx
Additional notes
xx
About this issue
- State: closed
- Created 3 months ago
- Comments: 17
@Hukongtao Sure, I am working on this, I will keep you posted.
@Hukongtao @HermitSun I will push an MR to fix this. If you want to fix it before then, please try replacing this section with:
@byshiue @Tracin Do you have any plan to fix this bug?
Yes. I used the same commands to build the engine under trt-llm 0.9.0.dev2024040200.
I ran into this problem too: qwen-72b-chat, tp=8, smoothquant.
I ran into this problem too: qwen-72b-chat, tp=4, smoothquant.