ssd.pytorch: RuntimeError: The shape of the mask [32, 8732] at index 0 does not match the shape of the indexed tensor [279424, 1] at index 0

rps@rps:~/桌面/ssd.pytorch$ python3 train.py
/home/rps/桌面/ssd.pytorch/ssd.py:34: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
  self.priors = Variable(self.priorbox.forward(), volatile=True)
/home/rps/桌面/ssd.pytorch/layers/modules/l2norm.py:17: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
  init.constant(self.weight,self.gamma)
Loading base network…
Initializing weights…
train.py:214: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  init.xavier_uniform(param)
Loading the dataset…
Training SSD on: VOC0712
Using the specified args:
Namespace(basenet='vgg16_reducedfc.pth', batch_size=32, cuda=True, dataset='VOC', dataset_root='/home/rps/data/VOCdevkit/', gamma=0.1, lr=0.001, momentum=0.9, num_workers=4, resume=None, save_folder='weights/', start_iter=0, visdom=False, weight_decay=0.0005)
train.py:169: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
  targets = [Variable(ann.cuda(), volatile=True) for ann in targets]
Traceback (most recent call last):
  File "train.py", line 255, in <module>
    train()
  File "train.py", line 178, in train
    loss_l, loss_c = criterion(out, targets)
  File "/home/rps/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/rps/桌面/ssd.pytorch/layers/modules/multibox_loss.py", line 97, in forward
    loss_c[pos] = 0  # filter out pos boxes for now
RuntimeError: The shape of the mask [32, 8732] at index 0 does not match the shape of the indexed tensor [279424, 1] at index 0

Can anyone help, please?
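For context, the two sizes in the error message describe the same elements in different layouts: loss_c has been flattened to one row per prior box across the whole batch, while the pos mask is still batch-shaped. A quick sanity check (numbers taken from the error above):

```python
# The flattened tensor's first dimension is exactly batch_size * num_priors:
# loss_c has shape (batch * priors, 1) while the mask pos has shape (batch, priors).
batch_size = 32    # from the error message: mask shape [32, 8732]
num_priors = 8732  # SSD300's default-box count
flattened = batch_size * num_priors
print(flattened)   # 279424, the indexed tensor's first dimension
```

So the failure is pure shape bookkeeping: the mask and the tensor hold the same number of entries, just arranged differently, which is why reshaping one of them fixes the indexing.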

About this issue

  • Original URL
  • State: open
  • Created 6 years ago
  • Reactions: 9
  • Comments: 53

Commits related to this issue

Most upvoted comments

I solved the problem. If your PyTorch version is 1.0.1, the following three steps fix it. Steps 1 and 2 change multibox_loss.py; step 3 changes train.py.

Step 1: swap lines 97 and 98 so the reshape happens before the masking:

    loss_c = loss_c.view(num, -1)
    loss_c[pos] = 0  # filter out pos boxes for now

Step 2: change line 114 from N = num_pos.data.sum() to:

    N = num_pos.data.sum().double()
    loss_l = loss_l.double()
    loss_c = loss_c.double()

Step 3: in train.py, change lines 188, 189, 193, and 196:

    loss_l.data[0] >> loss_l.data
    loss_c.data[0] >> loss_c.data
    loss.data[0] >> loss.data

Finally, I succeeded.

Step 1: swap lines 97 and 98:

    loss_c = loss_c.view(num, -1)
    loss_c[pos] = 0  # filter out pos boxes for now

Step 2: change line 144 from N = num_pos.data.sum() to:

    N = num_pos.data.sum().double()
    loss_l = loss_l.double()
    loss_c = loss_c.double()

@xscjun Change the line:

    N = num_pos.data.sum()

to:

    N = num_pos.data.sum().double()
    loss_l = loss_l.double()
    loss_c = loss_c.double()

This should work.
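A hedged sketch of why the cast matters: num_pos.data.sum() is an integer (long) count, and PyTorch of that era refused to divide a float loss tensor by a long tensor, so promoting everything to one floating dtype first sidesteps the type error. A plain-Python analogue of the intent (the values are stand-ins, not the repo's variables):

```python
# Stand-in values, not the real tensors; the point is only that the integer
# count must be promoted to a floating type before it divides the loss.
loss_l_sum = 123.4        # summed localization loss (floating point)
num_pos_total = 37        # num_pos.data.sum() yields an integer count
N = float(num_pos_total)  # analogue of the .double() cast in the fix
loss_l_mean = loss_l_sum / N
print(round(loss_l_mean, 4))
```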

Changing the order of lines 97 and 98 throws a new error for me:

Traceback (most recent call last):
  File "train.py", line 254, in <module>
    train()
  File "train.py", line 182, in train
    loc_loss += loss_l.data[0]
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

any suggestions?

PS: I also tried converting the loss to double as mentioned above, and I still get the same error!


### Solved

Apparently loss_l.data[0] should be replaced with loss_l.item() instead. This replacement applies to every loss_x.data[0] in the file!
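The .item() change tracks a PyTorch behavior change: losses became 0-dim tensors, and indexing a 0-dim value with [0] raises instead of returning the scalar. NumPy (assumed installed here) follows the same convention, so it can illustrate the rule without torch:

```python
import numpy as np

scalar = np.array(3.5)   # a 0-dim array, like a modern PyTorch scalar loss
try:
    scalar[0]            # indexing a 0-dim value raises IndexError...
    indexed_ok = True
except IndexError:
    indexed_ok = False

value = scalar.item()    # ...while .item() extracts the plain Python number
print(indexed_ok, value)
```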

I have the same error. If I swap lines 96 and 97 in multibox_loss.py:

    loss_c = loss_c.view(num, -1)
    loss_c[pos] = 0

the error disappears, but another one follows:

    File "/home/…/ssd.pytorch/layers/modules/multibox_loss.py", line 115, in forward
      loss_l /= N
    RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.cuda.LongTensor for argument #3 'other'

The tensor types do not match. How can I fix it?

The pos mask is torch.Size([32, 8732]) and loss_c is torch.Size([279424, 1]). When I add one line:

        loss_c = loss_c.view(pos.size()[0], pos.size()[1]) #add line 
        loss_c[pos] = 0  # filter out pos boxes for now
        loss_c = loss_c.view(num, -1)

Then it worked.
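A torch-free sketch of why the added view line works: boolean-mask assignment needs the mask's shape to match the tensor it indexes, so reshaping loss_c back to (batch, priors) before applying the (batch, priors) mask makes the shapes agree. The helper below is hypothetical, written only to illustrate the shape rule with nested lists:

```python
def mask_fill_zero(values, mask):
    """Zero out entries of a 2-D list wherever mask is True.

    Mimics loss_c[pos] = 0: the mask must match the shape of values.
    """
    if len(values) != len(mask) or any(
        len(vr) != len(mr) for vr, mr in zip(values, mask)
    ):
        raise ValueError("shape of the mask does not match the indexed tensor")
    return [
        [0.0 if m else v for v, m in zip(vr, mr)]
        for vr, mr in zip(values, mask)
    ]

# A (2, 3) "loss_c" against a (2, 3) "pos" mask works:
loss_c = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
pos = [[True, False, False], [False, True, False]]
print(mask_fill_zero(loss_c, pos))  # [[0.0, 2.0, 3.0], [4.0, 0.0, 6.0]]

# The same data flattened to (6, 1), as loss_c is before the fix, fails:
flat = [[v] for row in loss_c for v in row]
try:
    mask_fill_zero(flat, pos)
except ValueError as err:
    print(err)
```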

> I have the same error … RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.cuda.LongTensor for argument #3 'other' … The type of tensor is not match, how can I fix it?

@haibochina What? It means that the losses loc_loss and conf_loss are out of the range of your RAM. So you can change the source code as follows:

    N = num_pos.data.sum()
    loss_l /= N
    loss_c /= N
    loc_loss += loss_l.item()
    conf_loss += loss_c.item()

Pytorch version:

>>> import torch
>>> print(torch.__version__)
1.1.0

Python version:

Python 3.6.7 (default, Oct 22 2018, 11:32:17)
[GCC 8.2.0] on linux

multibox_loss.py:

Switch the two lines 97,98:
loss_c = loss_c.view(num, -1)
loss_c[pos] = 0 # filter out pos boxes for now
Change line114 
N = num_pos.data.sum() -> N = num_pos.data.sum().double()
and change the following two lines to: 
loss_l = loss_l.double()
loss_c = loss_c.double()

train.py

loss_l.data[0] >> loss_l.data 
loss_c.data[0] >> loss_c.data 
loss.data[0] >> loss.data

And here is my output:

timer: 11.9583 sec.
iter 0 || Loss: 11728.9388 || timer: 0.2955 sec.
iter 10 || Loss: nan || timer: 0.2843 sec.
iter 20 || Loss: nan || timer: 0.2890 sec.
iter 30 || Loss: nan || timer: 0.2934 sec.
iter 40 || Loss: nan || timer: 0.2865 sec.
iter 50 || Loss: nan || timer: 0.2855 sec.
iter 60 || Loss: nan || timer: 0.2889 sec.
iter 70 || Loss: nan || timer: 0.2857 sec.
iter 80 || Loss: nan || timer: 0.2843 sec.
iter 90 || Loss: nan || timer: 0.2835 sec.
iter 100 || Loss: nan || timer: 0.2846 sec.
iter 110 || Loss: nan || timer: 0.2946 sec.
iter 120 || Loss: nan || timer: 0.2860 sec.
iter 130 || Loss: nan || timer: 0.2846 sec.
iter 140 || Loss: nan || timer: 0.2962 sec.
iter 150 || Loss: nan || timer: 0.2989 sec.
iter 160 || Loss: nan || timer: 0.2857 sec.

Because the loss was too large, I changed line 115 to:

   N = num_pos.data.sum().double()
   loss_l = loss_l.double()
   loss_c = loss_c.double()
   loss_l /= N
   loss_c /= N

This solved the issue.

If your PyTorch version is 0.4.1, you can make the following change.

Step 1: swap lines 97 and 98:

    loss_c = loss_c.view(num, -1)
    loss_c[pos] = 0  # filter out pos boxes for now

Step 2: change line 114 from N = num_pos.data.sum() to:

    N = num_pos.data.sum().double()
    loss_l = loss_l.double()
    loss_c = loss_c.double()

But if your PyTorch version is 1.0.1, this change is not enough.

> I solve the problem if your python torch version is 1.0.1. The solution as follow 1-3 steps: … step3: change the line188,189,193,196: loss_l.data[0] >> loss_l.data, loss_c.data[0] >> loss_c.data, loss.data[0] >> loss.data

This answer solved my problem too. More precisely:

    loss_l = loss_l.double() / N
    loss_c = loss_c.double() / N

😃

> Pytorch version 1.1.0 … Switch the two lines 97,98 … Change line114 … loss.data[0] >> loss.data … iter 10 || Loss: nan || timer: 0.2843 sec. …

> I solve the problem if your python torch version is 1.0.1. … step3: change the line188,189,193,196: loss_l.data[0] >> loss_l.data, loss_c.data[0] >> loss_c.data, loss.data[0] >> loss.data

Thanks, that is useful for me. But in step 3 the lines are 183, 184, 188, and 191: five occurrences of loss_x.data[0] >> loss_x.data or loss.data[0] >> loss.data.

With @TianSong1991's solution, but with step 3 changed to the following.

Step 3: in train.py, change lines 183, 184, 188, 191:

    loss_l.data[0] >> loss_l.item()
    loss_c.data[0] >> loss_c.item()
    loss.data[0] >> loss.item()

Now the loss is converging:

    timer: 6.1581 sec.
    iter 0 || Loss: 32.3338 || timer: 0.3283 sec.
    iter 10 || Loss: 24.8091 || timer: 0.3328 sec.
    iter 20 || Loss: 24.4980 || timer: 0.3275 sec.
    iter 30 || Loss: 21.3105 || timer: 0.3167 sec.
    iter 40 || Loss: 14.5682 || timer: 0.3223 sec.
    iter 50 || Loss: 13.0729 || timer: 0.3221 sec.
    iter 60 || Loss: 12.3032 || timer: 0.3383 sec.
    iter 70 || Loss: 10.5260 || timer: 0.3246 sec.
    iter 80 || Loss: 11.2028 || timer: 0.3380 sec.
    iter 90 || Loss: 10.1715 || timer: 0.3244 sec.
    iter 100 || Loss: 10.1702 || timer: 0.3342 sec.
    iter 110 || Loss: 9.8668 || timer: 0.3384 sec.
    iter 120 || Loss: 9.5938 || timer: 0.3676 sec.
    iter 130 || Loss: 10.0942 || timer: 0.3210 sec.
    iter 140 || Loss: 9.7601 || timer: 0.3246 sec.
    iter 150 || Loss: 10.1564 || timer: 0.3202 sec.
    iter 160 || Loss: 9.8361 || timer: 0.3215 sec.
    iter 170 || Loss: 9.3565 || timer: 0.3290 sec.
    iter 180 || Loss: 9.2069 || timer: 0.3481 sec.
    iter 190 || Loss: 9.0822 || timer: 0.3374 sec.
    iter 200 || Loss: 9.3702 || timer: 0.3333 sec.
    iter 210 || Loss: 9.6193 || timer: 0.3437 sec.
    iter 220 || Loss: 9.1466 || timer: 0.3590 sec.
    iter 230 || Loss: 8.8923 || timer: 0.3211 sec.
    iter 240 || Loss: 9.2617 || timer: 0.3526 sec.
    iter 250 || Loss: 9.1713 || timer: 0.3263 sec.
    iter 260 || Loss: 9.4524 || timer: 0.3262 sec.
    iter 270 || Loss: 9.4929 || timer: 0.3581 sec.
    iter 280 || Loss: 8.7274 || timer: 0.3345 sec.
    iter 290 || Loss: 9.6723 || timer: 0.3701 sec.
    …

If you are using PyTorch 2, please follow this:

  1. In multibox_loss.py, swap lines 97 and 98.

  2. In train.py:
     Line ~183: replace loc_loss += loss_l.data[0] with loc_loss += loss_l.item()
     Line ~184: replace conf_loss += loss_c.data[0] with conf_loss += loss_c.item()
     Line ~188: in the print statement, replace loss.data[0] with loss.item()

This solved my problem!

> Pytorch version 1.1.0 … Switch the two lines 97,98 … loss.data[0] >> loss.data … iter 10 || Loss: nan || timer: 0.2843 sec. …

I've encountered the same one here. Have you solved this problem?

If I don't change line 114, the NaN loss disappears.

Wouldn't loss_x.data[0] >> loss_x.item() be better?

> Finally, I succeeded. step1: switch the two lines 97,98 … step2: change the line144 N = num_pos.data.sum() to N = num_pos.data.sum().double() …

Nice, but there is a small bug: it is line 114, not line 144.