ssd.pytorch: RuntimeError: The shape of the mask [32, 8732] at index 0 does not match the shape of the indexed tensor [279424, 1] at index 0

rps@rps:~/桌面/ssd.pytorch$ python3 train.py
/home/rps/桌面/ssd.pytorch/ssd.py:34: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
  self.priors = Variable(self.priorbox.forward(), volatile=True)
/home/rps/桌面/ssd.pytorch/layers/modules/l2norm.py:17: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
  init.constant(self.weight,self.gamma)
Loading base network…
Initializing weights…
train.py:214: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  init.xavier_uniform(param)
Loading the dataset…
Training SSD on: VOC0712
Using the specified args:
Namespace(basenet='vgg16_reducedfc.pth', batch_size=32, cuda=True, dataset='VOC', dataset_root='/home/rps/data/VOCdevkit/', gamma=0.1, lr=0.001, momentum=0.9, num_workers=4, resume=None, save_folder='weights/', start_iter=0, visdom=False, weight_decay=0.0005)
train.py:169: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
  targets = [Variable(ann.cuda(), volatile=True) for ann in targets]
Traceback (most recent call last):
  File "train.py", line 255, in <module>
    train()
  File "train.py", line 178, in train
    loss_l, loss_c = criterion(out, targets)
  File "/home/rps/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/rps/桌面/ssd.pytorch/layers/modules/multibox_loss.py", line 97, in forward
    loss_c[pos] = 0  # filter out pos boxes for now
RuntimeError: The shape of the mask [32, 8732] at index 0 does not match the shape of the indexed tensor [279424, 1] at index 0

Can anyone help, please?
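For context, the two sizes in the error message describe the same elements in different layouts: loss_c has been flattened to one row per prior box across the whole batch, while the pos mask is still batch-shaped. A quick sanity check (numbers taken from the error above):

```python
# The flattened tensor's first dimension is exactly batch_size * num_priors:
# loss_c has shape (batch * priors, 1) while the mask pos has shape (batch, priors).
batch_size = 32    # from the error message: mask shape [32, 8732]
num_priors = 8732  # SSD300's default-box count
flattened = batch_size * num_priors
print(flattened)   # 279424, the indexed tensor's first dimension
```

So the failure is pure shape bookkeeping: the mask and the tensor hold the same number of entries, just arranged differently, which is why reshaping one of them fixes the indexing.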

About this issue

  • Original URL
  • State: open
  • Created 6 years ago
  • Reactions: 9
  • Comments: 53

Commits related to this issue

Most upvoted comments

I solved the problem. If your PyTorch version is 1.0.1, the following three steps fix it. Steps 1 and 2 change multibox_loss.py; step 3 changes train.py.

Step 1: swap lines 97 and 98 so the reshape happens before the masking:

    loss_c = loss_c.view(num, -1)
    loss_c[pos] = 0  # filter out pos boxes for now

Step 2: change line 114 from N = num_pos.data.sum() to:

    N = num_pos.data.sum().double()
    loss_l = loss_l.double()
    loss_c = loss_c.double()

Step 3: in train.py, change lines 188, 189, 193, and 196:

    loss_l.data[0] >> loss_l.data
    loss_c.data[0] >> loss_c.data
    loss.data[0] >> loss.data

Finally, I succeeded.

Step 1: swap lines 97 and 98:

    loss_c = loss_c.view(num, -1)
    loss_c[pos] = 0  # filter out pos boxes for now

Step 2: change line 144 from N = num_pos.data.sum() to:

    N = num_pos.data.sum().double()
    loss_l = loss_l.double()
    loss_c = loss_c.double()

@xscjun Change the line:

    N = num_pos.data.sum()

to:

    N = num_pos.data.sum().double()
    loss_l = loss_l.double()
    loss_c = loss_c.double()

This should work.
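A hedged sketch of why the cast matters: num_pos.data.sum() is an integer (long) count, and PyTorch of that era refused to divide a float loss tensor by a long tensor, so promoting everything to one floating dtype first sidesteps the type error. A plain-Python analogue of the intent (the values are stand-ins, not the repo's variables):

```python
# Stand-in values, not the real tensors; the point is only that the integer
# count must be promoted to a floating type before it divides the loss.
loss_l_sum = 123.4        # summed localization loss (floating point)
num_pos_total = 37        # num_pos.data.sum() yields an integer count
N = float(num_pos_total)  # analogue of the .double() cast in the fix
loss_l_mean = loss_l_sum / N
print(round(loss_l_mean, 4))
```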

Changing the order of lines 97 and 98 throws a new error for me:

Traceback (most recent call last):
  File "train.py", line 254, in <module>
    train()
  File "train.py", line 182, in train
    loc_loss += loss_l.data[0]
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

any suggestions?

PS: I also tried converting the loss to double as mentioned above, and I still get the same error!


### Solved

Apparently loss_l.data[0] should be replaced with loss_l.item() instead. This replacement applies to every loss_x.data[0] in the file!
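The .item() change tracks a PyTorch behavior change: losses became 0-dim tensors, and indexing a 0-dim value with [0] raises instead of returning the scalar. NumPy (assumed installed here) follows the same convention, so it can illustrate the rule without torch:

```python
import numpy as np

scalar = np.array(3.5)   # a 0-dim array, like a modern PyTorch scalar loss
try:
    scalar[0]            # indexing a 0-dim value raises IndexError...
    indexed_ok = True
except IndexError:
    indexed_ok = False

value = scalar.item()    # ...while .item() extracts the plain Python number
print(indexed_ok, value)
```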

I have the same error. If I swap lines 96 and 97 in multibox_loss.py:

    loss_c = loss_c.view(num, -1)
    loss_c[pos] = 0

the error disappears, but another one follows:

    File "/home/…/ssd.pytorch/layers/modules/multibox_loss.py", line 115, in forward
      loss_l /= N
    RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.cuda.LongTensor for argument #3 'other'

The tensor types do not match. How can I fix it?

The pos mask is torch.Size([32, 8732]) and loss_c is torch.Size([279424, 1]). When I add one line:

        loss_c = loss_c.view(pos.size()[0], pos.size()[1]) #add line 
        loss_c[pos] = 0  # filter out pos boxes for now
        loss_c = loss_c.view(num, -1)

Then it worked.
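A torch-free sketch of why the added view line works: boolean-mask assignment needs the mask's shape to match the tensor it indexes, so reshaping loss_c back to (batch, priors) before applying the (batch, priors) mask makes the shapes agree. The helper below is hypothetical, written only to illustrate the shape rule with nested lists:

```python
def mask_fill_zero(values, mask):
    """Zero out entries of a 2-D list wherever mask is True.

    Mimics loss_c[pos] = 0: the mask must match the shape of values.
    """
    if len(values) != len(mask) or any(
        len(vr) != len(mr) for vr, mr in zip(values, mask)
    ):
        raise ValueError("shape of the mask does not match the indexed tensor")
    return [
        [0.0 if m else v for v, m in zip(vr, mr)]
        for vr, mr in zip(values, mask)
    ]

# A (2, 3) "loss_c" against a (2, 3) "pos" mask works:
loss_c = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
pos = [[True, False, False], [False, True, False]]
print(mask_fill_zero(loss_c, pos))  # [[0.0, 2.0, 3.0], [4.0, 0.0, 6.0]]

# The same data flattened to (6, 1), as loss_c is before the fix, fails:
flat = [[v] for row in loss_c for v in row]
try:
    mask_fill_zero(flat, pos)
except ValueError as err:
    print(err)
```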

> I have the same error … RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.cuda.LongTensor for argument #3 'other' … The type of tensor is not match, how can I fix it?

@haibochina What? It means that the losses loc_loss and conf_loss are out of the range of your RAM. So you can change the source code as follows:

    N = num_pos.data.sum()
    loss_l /= N
    loss_c /= N
    loc_loss += loss_l.item()
    conf_loss += loss_c.item()

Pytorch version:

>>> import torch
>>> print(torch.__version__)
1.1.0

Python version:

Python 3.6.7 (default, Oct 22 2018, 11:32:17)
[GCC 8.2.0] on linux

multibox_loss.py:

Switch the two lines 97,98:
loss_c = loss_c.view(num, -1)
loss_c[pos] = 0 # filter out pos boxes for now
Change line114 
N = num_pos.data.sum() -> N = num_pos.data.sum().double()
and change the following two lines to: 
loss_l = loss_l.double()
loss_c = loss_c.double()

train.py

loss_l.data[0] >> loss_l.data 
loss_c.data[0] >> loss_c.data 
loss.data[0] >> loss.data

And here is my output:

timer: 11.9583 sec.
iter 0 || Loss: 11728.9388 || timer: 0.2955 sec.
iter 10 || Loss: nan || timer: 0.2843 sec.
iter 20 || Loss: nan || timer: 0.2890 sec.
iter 30 || Loss: nan || timer: 0.2934 sec.
iter 40 || Loss: nan || timer: 0.2865 sec.
iter 50 || Loss: nan || timer: 0.2855 sec.
iter 60 || Loss: nan || timer: 0.2889 sec.
iter 70 || Loss: nan || timer: 0.2857 sec.
iter 80 || Loss: nan || timer: 0.2843 sec.
iter 90 || Loss: nan || timer: 0.2835 sec.
iter 100 || Loss: nan || timer: 0.2846 sec.
iter 110 || Loss: nan || timer: 0.2946 sec.
iter 120 || Loss: nan || timer: 0.2860 sec.
iter 130 || Loss: nan || timer: 0.2846 sec.
iter 140 || Loss: nan || timer: 0.2962 sec.
iter 150 || Loss: nan || timer: 0.2989 sec.
iter 160 || Loss: nan || timer: 0.2857 sec.

Because the loss was too large, I changed line 115 to:

   N = num_pos.data.sum().double()
   loss_l = loss_l.double()
   loss_c = loss_c.double()
   loss_l /= N
   loss_c /= N

This solved the issue.

If your PyTorch version is 0.4.1, you can make the following change.

Step 1: swap lines 97 and 98:

    loss_c = loss_c.view(num, -1)
    loss_c[pos] = 0  # filter out pos boxes for now

Step 2: change line 114 from N = num_pos.data.sum() to:

    N = num_pos.data.sum().double()
    loss_l = loss_l.double()
    loss_c = loss_c.double()

But if your PyTorch version is 1.0.1, this change is not enough.

> I solve the problem if your python torch version is 1.0.1. The solution as follow 1-3 steps: … step3: change the line188,189,193,196: loss_l.data[0] >> loss_l.data, loss_c.data[0] >> loss_c.data, loss.data[0] >> loss.data

This answer solved my problem too. More precisely:

    loss_l = loss_l.double() / N
    loss_c = loss_c.double() / N

😃

> Pytorch version 1.1.0 … Switch the two lines 97,98 … Change line114 … loss.data[0] >> loss.data … iter 10 || Loss: nan || timer: 0.2843 sec. …

> I solve the problem if your python torch version is 1.0.1. … step3: change the line188,189,193,196: loss_l.data[0] >> loss_l.data, loss_c.data[0] >> loss_c.data, loss.data[0] >> loss.data

Thanks, that is useful for me. But in step 3 the lines are 183, 184, 188, and 191: five occurrences of loss_x.data[0] >> loss_x.data or loss.data[0] >> loss.data.

With @TianSong1991's solution, but with step 3 changed to the following.

Step 3: in train.py, change lines 183, 184, 188, 191:

    loss_l.data[0] >> loss_l.item()
    loss_c.data[0] >> loss_c.item()
    loss.data[0] >> loss.item()

Now the loss is converging:

    timer: 6.1581 sec.
    iter 0 || Loss: 32.3338 || timer: 0.3283 sec.
    iter 10 || Loss: 24.8091 || timer: 0.3328 sec.
    iter 20 || Loss: 24.4980 || timer: 0.3275 sec.
    iter 30 || Loss: 21.3105 || timer: 0.3167 sec.
    iter 40 || Loss: 14.5682 || timer: 0.3223 sec.
    iter 50 || Loss: 13.0729 || timer: 0.3221 sec.
    iter 60 || Loss: 12.3032 || timer: 0.3383 sec.
    iter 70 || Loss: 10.5260 || timer: 0.3246 sec.
    iter 80 || Loss: 11.2028 || timer: 0.3380 sec.
    iter 90 || Loss: 10.1715 || timer: 0.3244 sec.
    iter 100 || Loss: 10.1702 || timer: 0.3342 sec.
    iter 110 || Loss: 9.8668 || timer: 0.3384 sec.
    iter 120 || Loss: 9.5938 || timer: 0.3676 sec.
    iter 130 || Loss: 10.0942 || timer: 0.3210 sec.
    iter 140 || Loss: 9.7601 || timer: 0.3246 sec.
    iter 150 || Loss: 10.1564 || timer: 0.3202 sec.
    iter 160 || Loss: 9.8361 || timer: 0.3215 sec.
    iter 170 || Loss: 9.3565 || timer: 0.3290 sec.
    iter 180 || Loss: 9.2069 || timer: 0.3481 sec.
    iter 190 || Loss: 9.0822 || timer: 0.3374 sec.
    iter 200 || Loss: 9.3702 || timer: 0.3333 sec.
    iter 210 || Loss: 9.6193 || timer: 0.3437 sec.
    iter 220 || Loss: 9.1466 || timer: 0.3590 sec.
    iter 230 || Loss: 8.8923 || timer: 0.3211 sec.
    iter 240 || Loss: 9.2617 || timer: 0.3526 sec.
    iter 250 || Loss: 9.1713 || timer: 0.3263 sec.
    iter 260 || Loss: 9.4524 || timer: 0.3262 sec.
    iter 270 || Loss: 9.4929 || timer: 0.3581 sec.
    iter 280 || Loss: 8.7274 || timer: 0.3345 sec.
    iter 290 || Loss: 9.6723 || timer: 0.3701 sec.
    …

If you are using PyTorch 2, please follow this:

  1. In multibox_loss.py, swap lines 97 and 98.

  2. In train.py:
     Line ~183: replace loc_loss += loss_l.data[0] with loc_loss += loss_l.item()
     Line ~184: replace conf_loss += loss_c.data[0] with conf_loss += loss_c.item()
     Line ~188: in the print statement, replace loss.data[0] with loss.item()

This solved my problem!

> Pytorch version 1.1.0 … Switch the two lines 97,98 … loss.data[0] >> loss.data … iter 10 || Loss: nan || timer: 0.2843 sec. …

I've encountered the same one here. Have you solved this problem?

If I don't change line 114, the NaN loss disappears.

Wouldn't loss_x.data[0] >> loss_x.item() be better?

> Finally, I succeeded. step1: switch the two lines 97,98 … step2: change the line144 N = num_pos.data.sum() to N = num_pos.data.sum().double() …

Nice, but there is a small bug: it is line 114, not line 144.