pytorch-lightning: 'RuntimeError: No rendezvous handler for env://' with multi-gpu

🐛 Bug

I get the error ‘RuntimeError: No rendezvous handler for env://’ when I run my model on multiple GPUs.

Below are the code and the traceback:

trainer = pl.Trainer(
    gpus=-1,
    accelerator='ddp',
    check_val_every_n_epoch=10,
    # precision=16,
    # auto_scale_batch_size='binsearch',
    callbacks=[checkpoint_callback],
    max_epochs=1,
)

GPU available: True, used: True
TPU available: None, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]

trainer.fit(model)

initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/2
Traceback (most recent call last):
  File "<ipython-input-8-45d4afebefac>", line 1, in <module>
    trainer.fit(model)
  File "C:\Users\45027900\Anaconda3\envs\PyTorch\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 470, in fit
    results = self.accelerator_backend.train()
  File "C:\Users\45027900\Anaconda3\envs\PyTorch\lib\site-packages\pytorch_lightning\accelerators\ddp_accelerator.py", line 152, in train
    results = self.ddp_train(process_idx=self.task_idx, model=model)
  File "C:\Users\45027900\Anaconda3\envs\PyTorch\lib\site-packages\pytorch_lightning\accelerators\ddp_accelerator.py", line 252, in ddp_train
    self.init_ddp_connection(
  File "C:\Users\45027900\Anaconda3\envs\PyTorch\lib\site-packages\pytorch_lightning\accelerators\accelerator.py", line 153, in init_ddp_connection
    self.ddp_plugin.init_ddp_connection(
  File "C:\Users\45027900\Anaconda3\envs\PyTorch\lib\site-packages\pytorch_lightning\plugins\ddp_plugin.py", line 90, in init_ddp_connection
    torch_distrib.init_process_group(
  File "C:\Users\45027900\Anaconda3\envs\PyTorch\lib\site-packages\torch\distributed\distributed_c10d.py", line 433, in init_process_group
    rendezvous_iterator = rendezvous(
  File "C:\Users\45027900\Anaconda3\envs\PyTorch\lib\site-packages\torch\distributed\rendezvous.py", line 82, in rendezvous
    raise RuntimeError("No rendezvous handler for {}://".format(result.scheme))

RuntimeError: No rendezvous handler for env://

The error is not present if I set

gpus = 1

Expected behavior

Environment

PyTorch Version (e.g., 1.0): 1.7.1
OS (e.g., Linux): Windows 10
How you installed PyTorch (conda, pip, source): conda
Build command you used (if compiling from source): conda install pytorch torchvision torchaudio cudatoolkit=11.0 -c pytorch
Python version: 3.8.5
CUDA/cuDNN version: 11.0
GPU models and configuration: 2 * Quadro RTX 6000
Any other relevant information:

About this issue

State: closed
Created 3 years ago
Comments: 16 (6 by maintainers)

Most upvoted comments

I am on Windows and saw this error. Changing accelerator to ‘dp’ works.

mdja on Feb 3, 2021

RuntimeError: No rendezvous handler for env://

That’s not much information, but one possibility is that you are on Windows. accelerator='ddp' will not work on Windows; you have to choose 'dp'.

awaelchli on Jan 31, 2021
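
The workaround from the two comments above can be sketched as a small platform check. This is only an illustration, assuming the failure is tied to Windows as the maintainer suggests; `pick_accelerator` is a hypothetical helper name, not part of pytorch-lightning, and the only values it returns are the 'dp' and 'ddp' strings already mentioned in this thread.

```python
import sys


def pick_accelerator(platform_name: str = sys.platform) -> str:
    """Hypothetical helper: choose 'dp' on Windows and 'ddp' elsewhere.

    Assumption: 'ddp' fails on Windows with the versions in this report,
    as stated in the comments, so 'dp' is used as the fallback there.
    """
    # sys.platform is 'win32' on Windows for both 32-bit and 64-bit Python.
    if platform_name == "win32":
        return "dp"
    return "ddp"


if __name__ == "__main__":
    # The chosen value would replace the hard-coded accelerator='ddp'
    # argument in the Trainer call from the report.
    print(pick_accelerator())
```

The returned string would be passed as the `accelerator` argument of the `pl.Trainer(...)` call shown in the report, leaving the other arguments unchanged.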