vllm: NCCL error

I’m trying to load a model with LLM(model="meta-llama/Llama-2-7b-chat-hf") and I’m getting the error below:

DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/NCCLUtils.hpp:219, invalid argument, NCCL version 2.14.3
ncclInvalidArgument: Invalid value for an argument.
Last error:
Invalid config blocking attribute value -2147483648
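A generic first debugging step (not a fix) is to turn on NCCL's own logging before the engine initializes, so NCCL reports which init argument it is rejecting. NCCL_DEBUG is a standard NCCL environment variable; a minimal sketch:

```python
import os

# Set NCCL's standard debug variable before any NCCL initialization
# happens, so the library logs details about the rejected argument.
# This only adds diagnostics; it does not change NCCL behavior.
os.environ["NCCL_DEBUG"] = "INFO"

# The vLLM engine would be constructed after this point, e.g. the
# LLM(model=...) call shown in the report above.
print(os.environ["NCCL_DEBUG"])
```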

About this issue

  • State: open
  • Created 7 months ago
  • Reactions: 3
  • Comments: 18

Most upvoted comments

Run pip list | grep nccl to check whether you have two NCCL versions installed; if so, remove the unnecessary one.
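The same check can be done from within Python, which is handy when you are unsure which environment your interpreter is actually using. A minimal sketch with the standard library's importlib.metadata (the specific package names in the comment are illustrative; common duplicates are pairs like nvidia-nccl-cu11 and nvidia-nccl-cu12):

```python
from importlib.metadata import distributions

# Collect every installed distribution whose name mentions "nccl".
# Finding more than one entry here suggests a duplicate/mismatched
# NCCL package that should be trimmed down to a single version.
nccl_pkgs = sorted(
    d.metadata["Name"]
    for d in distributions()
    if "nccl" in (d.metadata["Name"] or "").lower()
)
print(nccl_pkgs)
```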

Same issue with CUDA 12.1, torch 2.1.1+cu121. Did you solve it?

Same issue with CUDA 11.8, torch 2.1.0+cu118.

Running pip install --upgrade torch solved the issue for me. The resulting versions were torch 2.1.2, torchaudio 2.1.2, and torchvision 0.16.2.
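To confirm the upgrade actually took effect in the environment vLLM runs in, you can query the installed versions of the three packages the commenter lists. A minimal sketch using the standard library:

```python
from importlib.metadata import version, PackageNotFoundError

# Look up the installed version of each package the commenter mentions;
# None means the package is absent in this environment.
versions = {}
for pkg in ("torch", "torchvision", "torchaudio"):
    try:
        versions[pkg] = version(pkg)
    except PackageNotFoundError:
        versions[pkg] = None
print(versions)
```

If torch shows an older version than expected here, the upgrade likely landed in a different environment than the one running vLLM.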