TensorRT-LLM: Qwen-72B-chat-GPTQ TP=4 ERROR

System Info

xx

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

xx

Expected behavior

xx

Actual behavior

xx

Additional notes

xx

About this issue

  • Original URL
  • State: closed
  • Created 3 months ago
  • Comments: 17

Most upvoted comments

@Hukongtao Sure, I am working on this; I will keep you posted.

@Hukongtao @HermitSun I will push an MR to fix this. If you want to fix it in advance, please try replacing this section with:

        for suf in suffixs:
            # Fused QKV tensor: columns are laid out as [Q | K | V].
            qkv_part = model_params[prefix + "attn.c_attn." + suf]
            split_dim = qkv_part.shape[1] // 3
            # Shard Q, K and V separately across TP ranks, then re-fuse them,
            # so each rank gets matching slices of all three projections.
            q_part = torch_split(qkv_part[:, :split_dim], dim=1)
            k_part = torch_split(qkv_part[:, split_dim : split_dim * 2], dim=1)
            v_part = torch_split(qkv_part[:, -split_dim:], dim=1)
            qkv_part = torch.cat([q_part, k_part, v_part], 1)
            qkv_weight_list.append(qkv_part)
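For context on why the fix splits Q, K and V separately: naively chunking the fused weight along its columns would give some tensor-parallel ranks only Q columns and others only K or V columns. Below is a minimal NumPy sketch of the same idea (`shard_fused_qkv` is a hypothetical helper standing in for the `torch_split`-based logic above, not TensorRT-LLM code):

```python
import numpy as np

def shard_fused_qkv(qkv, tp_size, rank):
    """Shard a fused [Q | K | V] weight along its output columns for one
    tensor-parallel rank, splitting each projection separately."""
    split_dim = qkv.shape[1] // 3
    q = qkv[:, :split_dim]
    k = qkv[:, split_dim : 2 * split_dim]
    v = qkv[:, 2 * split_dim :]
    # Take this rank's slice of each projection, then re-fuse them.
    q_part = np.split(q, tp_size, axis=1)[rank]
    k_part = np.split(k, tp_size, axis=1)[rank]
    v_part = np.split(v, tp_size, axis=1)[rank]
    return np.concatenate([q_part, k_part, v_part], axis=1)

# Tiny fused weight: 2 input rows; Q, K and V each have 4 columns,
# tagged 1, 2 and 3 so the shards are easy to inspect.
qkv = np.concatenate(
    [np.full((2, 4), 1), np.full((2, 4), 2), np.full((2, 4), 3)], axis=1
)
shard = shard_fused_qkv(qkv, tp_size=4, rank=0)
# Each rank still sees one Q, one K and one V column,
# rather than a naive 3-column chunk of only Q.
print(shard[0].tolist())  # → [1, 2, 3]
```

A naive `np.split(qkv, 4, axis=1)[0]` would instead return three Q columns (`[1, 1, 1]`) for rank 0, which is the kind of misaligned shard the patch avoids.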

@byshiue @Tracin Do you have any plan to fix this bug?

+1 qwen-72b-chat, tp=4, gptq

Is the error message you encountered the same as mine?

Yes. And I used the same commands to build the engine under trt-llm 0.9.0.dev2024040200.

I ran into this problem too. qwen-72b-chat, tp=8, smoothquant

I ran into this problem too. qwen-72b-chat, tp=4, smoothquant