apex: AttributeError: module 'torch.distributed' has no attribute 'deprecated'

Hi!

I get this error on Windows 10 with torch==1.2.0 when I just import apex.

I think my system simply doesn't support distributed training, but failing on a plain import is still not good behavior.
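
For reference, this is all it takes to hit the error on my machine (a minimal repro, assuming apex is already installed):

import torch   # torch==1.2.0, Windows wheel without distributed support
import apex    # fails here with the AttributeError on torch.distributed.deprecated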

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 42 (15 by maintainers)

Most upvoted comments

If you’ve already installed apex, remove it:

pip uninstall apex
rm -rf apex

Reinstall from the apex_no_distributed branch of @ptrblck's fork of apex:

git clone https://github.com/ptrblck/apex.git
cd apex
git checkout apex_no_distributed
pip install -v --no-cache-dir ./
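
After reinstalling, a quick sanity check (my own suggestion, not from the thread) is to confirm the core imports succeed before touching any DDP code:

python -c "import apex; from apex import amp; print('apex and amp import fine')"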

Ah, got it. So at that level it's a different issue. When I use your branch with PyTorch 1.2.0 and remove all DDP usage, it works!

@DanyalAndriano Are you using a high-level wrapper on top of PyTorch+apex? If so, could you post a code snippet showing your use case?

@va26 @BramVanroy was kind enough to create a PR based on my branch, which was merged in https://github.com/NVIDIA/apex/pull/531. Could you reinstall apex from master and retry, please?
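
For anyone following along, a reinstall from master looks roughly like the snippet below (Python-only build; the apex README lists extra flags for the C++/CUDA extensions if you need them):

git clone https://github.com/NVIDIA/apex.git
cd apex
pip install -v --no-cache-dir ./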

Sorry, it works!

@ptrblck I installed apex from your repository and it works correctly now. Thanks! (Windows 10, Python 3.6, PyTorch 1.2, CUDA 10.0)

OK, I see. Unfortunately, if your system does not support distributed training, you won't be able to use DDP. Could you try to remove this import and all occurrences of DDP from your script? Are the apex and amp imports working so far?
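
Something along these lines should run without touching apex.parallel at all; it is only a minimal sketch, and the model, optimizer, and opt_level are placeholders for your own setup:

import torch
from apex import amp   # note: no apex.parallel / DistributedDataParallel import

model = torch.nn.Linear(10, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# amp.initialize wraps the model and optimizer for mixed-precision training
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

data = torch.randn(4, 10).cuda()
target = torch.randint(0, 2, (4,)).cuda()

loss = torch.nn.functional.cross_entropy(model(data), target)
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()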

@ptrblck here you go!

Traceback (most recent call last):
  File "train.py", line 23, in <module>
    from apex.parallel import DistributedDataParallel
  File "C:\Users\user\Anaconda3\envs\ml-pt-apex-test\lib\site-packages\apex\parallel\__init__.py", line 8, in <module>
    ReduceOp = torch.distributed.deprecated.reduce_op
AttributeError: module 'torch.distributed' has no attribute 'deprecated'
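
For context, that line assumes the old torch.distributed.deprecated namespace, which no longer exists in the PyTorch 1.2 build used here. A guard along these lines avoids the hard failure at import time (my own illustration of the idea, not the actual patch in the branch):

import torch

# Only touch the distributed namespace when this build actually supports it;
# the Windows wheels at the time shipped without distributed support.
if torch.distributed.is_available():
    try:
        ReduceOp = torch.distributed.deprecated.reduce_op  # old location
    except AttributeError:
        ReduceOp = torch.distributed.ReduceOp              # current location
else:
    ReduceOp = None  # distributed features unavailable on this build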

@helson73, @tuboxin, @asbe, @metya I've updated my branch with some more fixes. Could you try it again, please? https://github.com/ptrblck/apex/tree/apex_no_distributed