vllm: NCCL error
I’m trying to load a model with LLM(model="meta-llama/Llama-2-7b-chat-hf") and I’m getting the error below:
DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/NCCLUtils.hpp:219, invalid argument, NCCL version 2.14.3
ncclInvalidArgument: Invalid value for an argument.
Last error:
Invalid config blocking attribute value -2147483648
About this issue
- Original URL
- State: open
- Created 7 months ago
- Reactions: 3
- Comments: 18
Run pip list | grep nccl to check whether you have two NCCL versions installed; if so, remove the unnecessary one.
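If you can't easily grep (e.g. on Windows), the same check can be done from Python. This is a sketch, not part of vLLM; the helper name is mine, and it simply lists every installed distribution whose name mentions "nccl" (duplicate nvidia-nccl-cu11 / nvidia-nccl-cu12 wheels are the usual culprits):

```python
from importlib import metadata

def installed_nccl_packages():
    """Return sorted (name, version) pairs for every installed
    distribution whose name contains 'nccl'. Two entries here
    (e.g. nvidia-nccl-cu11 alongside nvidia-nccl-cu12) means a
    duplicate NCCL install."""
    return sorted(
        (dist.metadata["Name"], dist.version)
        for dist in metadata.distributions()
        if "nccl" in (dist.metadata["Name"] or "").lower()
    )

for name, version in installed_nccl_packages():
    print(f"{name}=={version}")
```

If more than one package shows up, uninstall the one that does not match your CUDA build.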
Same issue with CUDA 12.1, torch 2.1.1+cu121. Did you solve it?
Same issue with CUDA 11.8, torch 2.1.0+cu118.
Installing the matched set torch 2.1.2, torchaudio 2.1.2, torchvision 0.16.2 solved the issue for me.
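To confirm your environment actually matches the version set reported to work above, a minimal check (a sketch; the helper name and the assumption that exactly these pinned versions are wanted are mine):

```python
from importlib import metadata

# Version set reported to fix the NCCL error in this thread
# (assumption: exact pins, no local version suffixes).
WANTED = {"torch": "2.1.2", "torchaudio": "2.1.2", "torchvision": "0.16.2"}

def version_mismatches(wanted=WANTED):
    """Return {package: (installed_or_None, wanted)} for every
    package that is missing or installed at a different version."""
    mismatches = {}
    for pkg, want in wanted.items():
        try:
            have = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            have = None
        if have != want:
            mismatches[pkg] = (have, want)
    return mismatches

for pkg, (have, want) in version_mismatches().items():
    print(f"{pkg}: installed={have}, wanted={want}")
```

An empty result means the environment already matches; otherwise reinstall the listed packages at the wanted versions.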