vllm: Cuda failure 'peer access is not supported between these two devices'
Usage stats collection is enabled. To disable this, run the following command: ray disable-usage-stats
before starting Ray. See https://docs.ray.io/en/master/cluster/usage-stats.html for more details.
2023-07-08 23:11:34,236 INFO worker.py:1610 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
INFO 07-08 23:11:35 llm_engine.py:60] Initializing an LLM engine with config: model='openlm-research/open_llama_13b', tokenizer='openlm-research/open_llama_13b', tokenizer_mode=auto, dtype=torch.float16, use_dummy_weights=False, download_dir=None, use_np_weights=False, tensor_parallel_size=4, seed=0)
INFO 07-08 23:11:35 tokenizer.py:28] For some LLaMA-based models, initializing the fast tokenizer may take a long time. To eliminate the initialization time, consider using 'hf-internal-testing/llama-tokenizer' instead of the original tokenizer.
(Worker pid=4225) Exception raised in creation task: The actor died because of an error raised in its creation task, ray::Worker.__init__() (pid=4225, ip=172.31.68.176, actor_id=5dc662848f950df8d330eb8a01000000, repr=<vllm.worker.worker.Worker object at 0x7f4e9ea814e0>)
(Worker pid=4225) File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker.py", line 40, in __init__
(Worker pid=4225) _init_distributed_environment(parallel_config, rank,
(Worker pid=4225) File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker.py", line 307, in _init_distributed_environment
(Worker pid=4225) torch.distributed.all_reduce(torch.zeros(1).cuda())
(Worker pid=4225) File "/opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1451, in wrapper
(Worker pid=4225) return func(*args, **kwargs)
(Worker pid=4225) File "/opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1700, in all_reduce
(Worker pid=4225) work = default_pg.allreduce([tensor], opts)
(Worker pid=4225) torch.distributed.DistBackendError: NCCL error in: …/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1275, internal error, NCCL version 2.14.3
(Worker pid=4225) ncclInternalError: Internal check failed.
(Worker pid=4225) Last error:
(Worker pid=4225) Cuda failure 'peer access is not supported between these two devices'
Code: llm = LLM(model="openlm-research/open_llama_13b", tensor_parallel_size=4)
Env: a single g5.12xlarge EC2 instance with 4 A10G GPUs
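The NCCL failure above is ultimately about CUDA peer-to-peer (P2P) access between GPU pairs. As a diagnostic sketch (not from this thread; it assumes PyTorch is installed), you can check which GPU pairs on the instance actually support peer access:

```python
# Diagnostic sketch: report CUDA peer-access support between every pair of
# visible GPUs. On a machine without CUDA GPUs the result is simply empty.
import torch


def p2p_matrix():
    """Return {(i, j): bool} indicating peer-access support between GPUs."""
    n = torch.cuda.device_count()
    return {
        (i, j): torch.cuda.can_device_access_peer(i, j)
        for i in range(n)
        for j in range(n)
        if i != j
    }


for (i, j), ok in p2p_matrix().items():
    print(f"GPU {i} -> GPU {j}: {'P2P supported' if ok else 'P2P NOT supported'}")
```

If any pair reports no P2P support (as is the case for A10G GPUs on g5 instances), NCCL's default transport selection can hit exactly this `ncclInternalError`.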
About this issue
- State: closed
- Created a year ago
- Comments: 15 (3 by maintainers)
@nivibilla I tried the above workaround in a SageMaker notebook on a g5.12xlarge instance and it worked for me. I also tried reinstalling vllm from source after adding
os.environ["NCCL_IGNORE_DISABLED_P2P"] = '1'
in the codebase just before this line, and it worked again. I guess you tried on an EC2 VM. Can you try the second way? Steps: add import os and the os.environ line above to the codebase, then reinstall with pip install .
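The same idea can also be tried without patching vLLM's source: set the variable in the launching script before the engine is constructed. A minimal sketch based on the comment above (the key assumption is that `NCCL_IGNORE_DISABLED_P2P=1` must be in the environment before vLLM spawns its Ray workers, so NCCL skips the peer-access requirement instead of raising `ncclInternalError`):

```python
# Workaround sketch: export NCCL_IGNORE_DISABLED_P2P before any vLLM/NCCL
# initialization happens in this process.
import os

os.environ["NCCL_IGNORE_DISABLED_P2P"] = "1"

# Then construct the engine exactly as in the original report:
#   from vllm import LLM
#   llm = LLM(model="openlm-research/open_llama_13b", tensor_parallel_size=4)
```

Whether this is sufficient from a plain script (rather than patching the source as the commenter did) may depend on how the worker processes inherit the environment, so treat it as the first thing to try.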