PyTorch-Encoding: AttributeError: 'NoneType' object has no attribute 'run_slave'

Hi Zhang,

Segmentation

When I train the segmentation model with

CUDA_VISIBLE_DEVICES=0 python train.py --dataset pcontext --model encnet --aux --se-loss

I get this error:

Using poly LR Scheduler!
Starting Epoch: 0
Total Epoches: 80
  0%|                                                                                                                                         | 0/1249 [00:00<?, ?it/s]
=>Epoches 0, learning rate = 0.0003,                 previous best = 0.0000
Traceback (most recent call last):
  File "train.py", line 175, in <module>
    trainer.training(epoch)
  File "train.py", line 105, in training
    outputs = self.model(image)
  File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 468, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 121, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 468, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/anaconda3/lib/python3.6/site-packages/encoding/models/encnet.py", line 32, in forward
    features = self.base_forward(x)
  File "/root/anaconda3/lib/python3.6/site-packages/encoding/models/base.py", line 51, in base_forward
    x = self.pretrained.bn1(x)
  File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 468, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/anaconda3/lib/python3.6/site-packages/encoding/nn/syncbn.py", line 57, in forward
    mean, inv_std = self._slave_pipe.run_slave(_ChildMessage(xsum, xsqsum, N))
AttributeError: 'NoneType' object has no attribute 'run_slave'

my environment:

  • The Quick Demo runs successfully

  • PyTorch version:

>>> torch.__version__
'0.5.0a0+32bc28d'
  • Only one NVIDIA card: 1080 (8 GB)

Recognition

When I run the recognition demo with

python main.py --dataset cifar10 --model encnetdrop --widen 8 --ncodes 32 --resume model/encnet_cifar.pth.tar --eval

I also get an error:

    (9): View()
    (10): Linear(in_features=512, out_features=10, bias=True)
  )
)
Traceback (most recent call last):
  File "main.py", line 181, in <module>
    main()
  File "main.py", line 56, in main
    Dataloader = dataset.Dataloader
AttributeError: module 'dataset.cifar10' has no attribute 'Dataloader'

my environment:

  • torchvision:
>>> import torchvision
>>> torchvision.__version__
'0.2.1'

Can you help me with these problems? Thank you!
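
For the second error (module 'dataset.cifar10' has no attribute 'Dataloader'), the thread does not state the cause. A quick first check is to list which public names the imported module actually exposes and whether any of them resembles the one being requested. The sketch below uses only the standard library; `similar_attributes` and the `demo` module are hypothetical names made up for illustration, and the `DataLoader` attribute on `demo` is an assumption, not something taken from the repository.

```python
import difflib
import types


def similar_attributes(module, wanted, cutoff=0.6):
    """Return public attribute names of `module` that resemble `wanted`."""
    public = [name for name in dir(module) if not name.startswith("_")]
    # Group names by their lowercase form so the comparison ignores case.
    lowered = {}
    for name in public:
        lowered.setdefault(name.lower(), []).append(name)
    hits = difflib.get_close_matches(
        wanted.lower(), list(lowered), n=5, cutoff=cutoff
    )
    return [name for key in hits for name in lowered[key]]


# Hypothetical stand-in module; in practice, pass the imported dataset module.
demo = types.ModuleType("demo_dataset")
demo.DataLoader = object          # assumed attribute, for illustration only
demo.get_transforms = lambda: None

print(similar_attributes(demo, "Dataloader"))
```

If the list comes back empty, the attribute is probably missing altogether, for example because the installed package and the script come from different versions of the code.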

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 16 (7 by maintainers)

Most upvoted comments

I see. Training requires a batch size of 16 to reproduce the reported performance. The code does not handle the single-GPU case yet. I will see whether it can be fixed easily.
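
The traceback above fails on a call through `self._slave_pipe`, which is `None` at that point. As a rough illustration of this kind of failure and of one possible guard, here is a toy sketch in plain Python. It assumes the communication handle is only assigned when the layer is replicated across several devices. `ToySyncNorm`, `ToyPipe`, `attach` and `reduce_mean` are hypothetical names; this is not the library's implementation.

```python
class ToyPipe:
    """Hypothetical channel that averages a local mean with other devices' means."""

    def __init__(self, other_means):
        self.other_means = list(other_means)

    def reduce_mean(self, local_mean):
        means = self.other_means + [local_mean]
        return sum(means) / len(means)


class ToySyncNorm:
    """Toy stand-in for a synchronized layer; NOT the library's code."""

    def __init__(self):
        # Assumption: only set when the layer is replicated across devices.
        self.pipe = None

    def attach(self, pipe):
        self.pipe = pipe

    def forward(self, values):
        local_mean = sum(values) / len(values)
        if self.pipe is None:
            # Single-device case: nothing to synchronize, use local statistics
            # instead of calling a method on None.
            return local_mean
        return self.pipe.reduce_mean(local_mean)


single = ToySyncNorm()
print(single.forward([1.0, 3.0]))   # uses the local mean only

multi = ToySyncNorm()
multi.attach(ToyPipe([4.0]))
print(multi.forward([1.0, 3.0]))    # averages with the other device's mean
```

Without the `is None` check, the single-device call would raise the same kind of AttributeError on a 'NoneType' object as in the report.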

@zhanghang1989 I have run into the same problem, and I only have one GPU. Have you solved it? I changed the model to use norm_layer=torch.nn.BatchNorm2d, but I still get this error. Thanks.

I have solved this problem!