apex: init_process_group raises an error when I run the ImageNet example

https://github.com/NVIDIA/apex/blob/574fe2449cbe6ae4c8af53c6ecb1b5fc13877234/examples/imagenet/main_amp.py#L121

When I run it, I get this error: ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable RANK expected, but not set

I am using PyTorch 1.1.0 on Ubuntu 18.04. How can I solve this? Thanks.

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 9
  • Comments: 15 (3 by maintainers)

Most upvoted comments

Hi @PistonY,

How are you executing the script? If you are running the distributed example in main_amp.py, note that you should launch it with:

python -m torch.distributed.launch --nproc_per_node=NUM_GPUS main_amp.py args...

as described in the README.
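
For context on why the launcher matters: the env:// rendezvous reads MASTER_ADDR, MASTER_PORT, WORLD_SIZE and RANK from the environment (unless rank and world size are passed to init_process_group explicitly), and torch.distributed.launch sets them for each process it spawns. Running `python main_amp.py` directly leaves them unset, which would explain the error above. Below is a minimal sketch of a pre-flight check you could run before calling init_process_group. The helper names `missing_rendezvous_vars` and `launched_with_multiple_processes` are hypothetical and are not part of apex or PyTorch.

```python
import os

# Variables the env:// rendezvous reads when rank and world size
# are not passed to init_process_group directly.
REQUIRED_VARS = ("MASTER_ADDR", "MASTER_PORT", "WORLD_SIZE", "RANK")


def missing_rendezvous_vars(environ=None):
    """Hypothetical helper: list the env:// variables that are not set."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if name not in env]


def launched_with_multiple_processes(environ=None):
    """Hypothetical helper: True if WORLD_SIZE is set and greater than 1."""
    env = os.environ if environ is None else environ
    return int(env.get("WORLD_SIZE", "1")) > 1


if __name__ == "__main__":
    missing = missing_rendezvous_vars()
    if missing:
        # This is the situation when the script is started without the launcher.
        print("Not set: " + ", ".join(missing))
        print("Start the script with python -m torch.distributed.launch instead.")
    else:
        print("All env:// rendezvous variables are set.")
```

If the check reports missing variables, the script was most likely started without the launcher. A script can also use a check like `launched_with_multiple_processes()` to skip init_process_group entirely for single-GPU runs, though whether that fits depends on how the rest of the script is written.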