pytorch-lightning: 'RuntimeError: No rendezvous handler for env://' with multi-gpu
🐛 Bug
I get the error 'RuntimeError: No rendezvous handler for env://' when I run my model on multiple GPUs.
The code and the traceback are below:
trainer = pl.Trainer(gpus=-1,
                     accelerator='ddp',
                     check_val_every_n_epoch=10,
                     # precision=16,
                     # auto_scale_batch_size='binsearch',
                     callbacks=[checkpoint_callback],
                     max_epochs=1)
GPU available: True, used: True
TPU available: None, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
trainer.fit(model)
initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/2
Traceback (most recent call last):
  File "<ipython-input-8-45d4afebefac>", line 1, in <module>
    trainer.fit(model)
  File "C:\Users\45027900\Anaconda3\envs\PyTorch\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 470, in fit
    results = self.accelerator_backend.train()
  File "C:\Users\45027900\Anaconda3\envs\PyTorch\lib\site-packages\pytorch_lightning\accelerators\ddp_accelerator.py", line 152, in train
    results = self.ddp_train(process_idx=self.task_idx, model=model)
  File "C:\Users\45027900\Anaconda3\envs\PyTorch\lib\site-packages\pytorch_lightning\accelerators\ddp_accelerator.py", line 252, in ddp_train
    self.init_ddp_connection(
  File "C:\Users\45027900\Anaconda3\envs\PyTorch\lib\site-packages\pytorch_lightning\accelerators\accelerator.py", line 153, in init_ddp_connection
    self.ddp_plugin.init_ddp_connection(
  File "C:\Users\45027900\Anaconda3\envs\PyTorch\lib\site-packages\pytorch_lightning\plugins\ddp_plugin.py", line 90, in init_ddp_connection
    torch_distrib.init_process_group(
  File "C:\Users\45027900\Anaconda3\envs\PyTorch\lib\site-packages\torch\distributed\distributed_c10d.py", line 433, in init_process_group
    rendezvous_iterator = rendezvous(
  File "C:\Users\45027900\Anaconda3\envs\PyTorch\lib\site-packages\torch\distributed\rendezvous.py", line 82, in rendezvous
    raise RuntimeError("No rendezvous handler for {}://".format(result.scheme))
RuntimeError: No rendezvous handler for env://
The error is not present if I set
gpus = 1
Expected behavior
Environment
- PyTorch Version (e.g., 1.0): 1.7.1
- OS (e.g., Linux): Windows 10
- How you installed PyTorch (conda, pip, source): conda
- Build command you used (if compiling from source): conda install pytorch torchvision torchaudio cudatoolkit=11.0 -c pytorch
- Python version: 3.8.5
- CUDA/cuDNN version: 11.0
- GPU models and configuration: 2 * Quadro RTX 6000
- Any other relevant information:
About this issue
- State: closed
- Created 3 years ago
- Comments: 16 (6 by maintainers)
I am on Windows and saw this error. Changing the accelerator to 'dp' works.
RuntimeError: No rendezvous handler for env://
That's not much information, but one likely cause is that you are on Windows. accelerator='ddp' will not work on Windows; you have to choose accelerator='dp' instead.
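If the same script has to run on both Windows and Linux machines, one way to apply the advice above is to pick the accelerator string from the operating system. This is only a rough sketch: `choose_accelerator` is a hypothetical helper, not part of pytorch-lightning, and it assumes the `gpus` / `accelerator` Trainer arguments shown in the report.

```python
import platform


def choose_accelerator(num_gpus, system=None):
    """Return the Lightning `accelerator` string to try on this machine.

    Hypothetical helper, not a pytorch-lightning API.
    """
    if system is None:
        system = platform.system()  # "Windows", "Linux", "Darwin", ...
    if num_gpus <= 1:
        return None  # single device: no distributed backend requested
    # Per the comments above, 'ddp' fails on Windows in this setup,
    # so fall back to 'dp' there and keep 'ddp' elsewhere.
    return "dp" if system == "Windows" else "ddp"


# Usage with the Trainer from the report (not executed here):
# import torch
# import pytorch_lightning as pl
# trainer = pl.Trainer(gpus=-1,
#                      accelerator=choose_accelerator(torch.cuda.device_count()),
#                      max_epochs=1)
```

With two GPUs on Windows this selects 'dp', which matches the workaround reported above; on Linux it keeps 'ddp'.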