TensorRT-LLM: Qwen-72B-chat-GPTQ TP=4 ERROR

System Info

xx

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

xx

Expected behavior

xx

Actual behavior

xx

Additional notes

xx

About this issue

  • Original URL
  • State: closed
  • Created 3 months ago
  • Comments: 17

Most upvoted comments

@Hukongtao Sure, I am working on this; I will keep you posted.

@Hukongtao @HermitSun I will push an MR to fix this. If you want to fix it in advance, please try replacing this section with:

        for suf in suffixs:
            # Fused QKV tensor: columns are laid out as [Q | K | V].
            qkv_part = model_params[prefix + "attn.c_attn." + suf]
            split_dim = qkv_part.shape[1] // 3
            # Shard Q, K and V separately across TP ranks, then re-fuse them,
            # so each rank gets matching slices of all three projections.
            q_part = torch_split(qkv_part[:, :split_dim], dim=1)
            k_part = torch_split(qkv_part[:, split_dim : split_dim * 2], dim=1)
            v_part = torch_split(qkv_part[:, -split_dim:], dim=1)
            qkv_part = torch.cat([q_part, k_part, v_part], 1)
            qkv_weight_list.append(qkv_part)
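For context on why the fix splits Q, K and V separately: naively chunking the fused weight along its columns would give some tensor-parallel ranks only Q columns and others only K or V columns. Below is a minimal NumPy sketch of the same idea (`shard_fused_qkv` is a hypothetical helper standing in for the `torch_split`-based logic above, not TensorRT-LLM code):

```python
import numpy as np

def shard_fused_qkv(qkv, tp_size, rank):
    """Shard a fused [Q | K | V] weight along its output columns for one
    tensor-parallel rank, splitting each projection separately."""
    split_dim = qkv.shape[1] // 3
    q = qkv[:, :split_dim]
    k = qkv[:, split_dim : 2 * split_dim]
    v = qkv[:, 2 * split_dim :]
    # Take this rank's slice of each projection, then re-fuse them.
    q_part = np.split(q, tp_size, axis=1)[rank]
    k_part = np.split(k, tp_size, axis=1)[rank]
    v_part = np.split(v, tp_size, axis=1)[rank]
    return np.concatenate([q_part, k_part, v_part], axis=1)

# Tiny fused weight: 2 input rows; Q, K and V each have 4 columns,
# tagged 1, 2 and 3 so the shards are easy to inspect.
qkv = np.concatenate(
    [np.full((2, 4), 1), np.full((2, 4), 2), np.full((2, 4), 3)], axis=1
)
shard = shard_fused_qkv(qkv, tp_size=4, rank=0)
# Each rank still sees one Q, one K and one V column,
# rather than a naive 3-column chunk of only Q.
print(shard[0].tolist())  # → [1, 2, 3]
```

A naive `np.split(qkv, 4, axis=1)[0]` would instead return three Q columns (`[1, 1, 1]`) for rank 0, which is the kind of misaligned shard the patch avoids.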

@byshiue @Tracin Do you have any plan to fix this bug?

+1 qwen-72b-chat, tp=4, gptq

Is the error message you encountered the same as mine?

Yes. And I used the same commands to build the engine under trt-llm 0.9.0.dev2024040200.

I ran into this problem too. qwen-72b-chat, tp=8, smoothquant

I ran into this problem too. qwen-72b-chat, tp=4, smoothquant