ssd.pytorch: RuntimeError: The shape of the mask [32, 8732] at index 0 does not match the shape of the indexed tensor [279424, 1] at index 0
rps@rps:~/桌面/ssd.pytorch$ python3 train.py
/home/rps/桌面/ssd.pytorch/ssd.py:34: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
self.priors = Variable(self.priorbox.forward(), volatile=True)
/home/rps/桌面/ssd.pytorch/layers/modules/l2norm.py:17: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
init.constant(self.weight,self.gamma)
Loading base network…
Initializing weights…
train.py:214: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
init.xavier_uniform(param)
Loading the dataset…
Training SSD on: VOC0712
Using the specified args:
Namespace(basenet='vgg16_reducedfc.pth', batch_size=32, cuda=True, dataset='VOC', dataset_root='/home/rps/data/VOCdevkit/', gamma=0.1, lr=0.001, momentum=0.9, num_workers=4, resume=None, save_folder='weights/', start_iter=0, visdom=False, weight_decay=0.0005)
train.py:169: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
targets = [Variable(ann.cuda(), volatile=True) for ann in targets]
Traceback (most recent call last):
  File "train.py", line 255, in <module>
    train()
  File "train.py", line 178, in train
    loss_l, loss_c = criterion(out, targets)
  File "/home/rps/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/rps/桌面/ssd.pytorch/layers/modules/multibox_loss.py", line 97, in forward
    loss_c[pos] = 0 # filter out pos boxes for now
RuntimeError: The shape of the mask [32, 8732] at index 0 does not match the shape of the indexed tensor [279424, 1] at index 0
Can anyone help, please?
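The warnings in the log are separate from the crash, but each one names its own replacement. A minimal sketch of those replacements, assuming a recent PyTorch; the layer sizes, the value `20.0`, and the `priors` tensor are hypothetical stand-ins, not values taken from the repository:

```python
import torch
import torch.nn as nn
import torch.nn.init as init

# The in-place initializers (trailing underscore) replace the deprecated names.
conv = nn.Conv2d(3, 8, kernel_size=3)      # hypothetical layer
init.xavier_uniform_(conv.weight)          # instead of init.xavier_uniform

weight = nn.Parameter(torch.empty(8))      # hypothetical scale parameter
init.constant_(weight, 20.0)               # instead of init.constant; 20.0 is arbitrary

# torch.no_grad() replaces Variable(..., volatile=True).
with torch.no_grad():
    priors = torch.rand(10, 4)             # stand-in for priorbox.forward()
    scaled = priors * weight[0]            # no autograd graph is recorded here
```

Anything computed inside the `with torch.no_grad():` block does not track gradients, which is what `volatile=True` used to request.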
About this issue
- State: open
- Created 6 years ago
- Reactions: 9
- Comments: 53
Commits related to this issue
- Add fix for issue https://github.com/amdegroot/ssd.pytorch/issues/173 — committed to ashiks-qb/ssd.pytorch by ashiks-qb 3 years ago
- fix issue https://github.com/amdegroot/ssd.pytorch/issues/173 — committed to yodhcn/ssd.pytorch by yodhcn 2 years ago
I solved the problem for PyTorch version 1.0.1. The solution has three steps: steps 1 and 2 change multibox_loss.py, and step 3 changes train.py.
Step 1: switch lines 97 and 98:
loss_c = loss_c.view(num, -1)
loss_c[pos] = 0 # filter out pos boxes for now
Step 2: change line 114 from
N = num_pos.data.sum()
to
N = num_pos.data.sum().double()
loss_l = loss_l.double()
loss_c = loss_c.double()
Step 3: in train.py, change lines 188, 189, 193, 196:
loss_l.data[0] >> loss_l.data
loss_c.data[0] >> loss_c.data
loss.data[0] >> loss.data
Finally, I succeeded.
Step 1: switch lines 97 and 98:
loss_c = loss_c.view(num, -1)
loss_c[pos] = 0 # filter out pos boxes for now
Step 2: change line 144 from
N = num_pos.data.sum()
to
N = num_pos.data.sum().double()
loss_l = loss_l.double()
loss_c = loss_c.double()
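The line swap in step 1 can be reproduced on small tensors. This is only a sketch under assumed shapes: `num = 4` and `num_priors = 6` stand in for the batch size of 32 and the 8732 priors in the traceback, and the tensors are made up for illustration rather than taken from `MultiBoxLoss.forward`:

```python
import torch

num, num_priors = 4, 6   # small stand-ins for 32 and 8732

# Per-box classification loss, still flattened to [num * num_priors, 1]
loss_c = torch.ones(num * num_priors, 1)

# Boolean mask of positive boxes, shaped [num, num_priors]
pos = torch.zeros(num, num_priors, dtype=torch.bool)
pos[0, 1] = True
pos[2, 3] = True

# Original order: masking before reshaping fails because the shapes differ.
try:
    loss_c[pos] = 0
    mask_before_view_failed = False
except (IndexError, RuntimeError):
    mask_before_view_failed = True

# Swapped order: reshape to [num, num_priors] first, then apply the mask.
loss_c = loss_c.view(num, -1)
loss_c[pos] = 0          # filter out pos boxes
```

A boolean mask has to match the leading dimensions of the tensor it indexes, which is why the reshape has to come first.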
@xscjun change the line
N = num_pos.data.sum()
to:
N = num_pos.data.sum().double()
loss_l = loss_l.double()
loss_c = loss_c.double()
This should work.
Changing the order of lines 97 and 98 throws a new error for me.
Any suggestions?
PS: I also tried converting the loss to double as mentioned above and still get the same error!
### Solved
Apparently 'loss_l.data[0]' should be replaced with 'loss_l.item()'. This replacement applies to every loss_x.data[0] in the file!
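A small sketch of the `.item()` replacement, assuming the losses are 0-dim tensors as they are after a sum reduction. The values `1.5` and `2.25` are invented, and the accumulator names only mirror the ones quoted in this thread:

```python
import torch

# 0-dim tensors, standing in for the losses returned by the criterion
loss_l = torch.tensor(1.5)
loss_c = torch.tensor(2.25)
loss = loss_l + loss_c

loc_loss = 0.0
conf_loss = 0.0

# .item() converts a one-element tensor to a plain Python number
loc_loss += loss_l.item()
conf_loss += loss_c.item()

message = 'iter %d || Loss: %.4f ||' % (0, loss.item())
```

Accumulating Python floats instead of tensors also keeps the running totals detached from the autograd graph.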
The shape of "pos" is torch.Size([32, 8732]) and the shape of "loss_c" is torch.Size([279424, 1]). When I add one line as:
Then it worked.
I have the same error. If I switch lines 96 and 97 in multibox_loss.py:
loss_c = loss_c.view(num, -1)
loss_c[pos] = 0
this error disappears, but another error appears:
File "/home/…/ssd.pytorch/layers/modules/multibox_loss.py", line 115, in forward
loss_l /= N
RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.cuda.LongTensor for argument #3 'other'
The tensor types do not match. How can I fix it?
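The type error above comes from dividing a float loss by an integer count. One way to sketch the cast, assuming made-up values for `num_pos` and the two losses; `.to(loss_l.dtype)` is a variant of the `.double()` calls suggested in this thread, not the repository's own code, and recent PyTorch versions may promote the types without it:

```python
import torch

# Hypothetical stand-ins for the values inside the loss function
num_pos = torch.tensor([[3], [5]])   # integer count of positives per image
loss_l = torch.tensor(16.0)          # float localization loss
loss_c = torch.tensor(24.0)          # float confidence loss

N = num_pos.sum()                    # dtype torch.int64 (a LongTensor)
N = N.to(loss_l.dtype)               # cast so both operands are floating point

loss_l = loss_l / N
loss_c = loss_c / N
```

Casting `N` to the dtype of the loss keeps everything in one precision, whereas `.double()` on all three tensors moves the losses to float64.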
Because the loss was too large, I changed line 115 to
which solved the issue.
If your PyTorch version is 0.4.1, you can make the following changes:
Step 1: switch lines 97 and 98:
loss_c = loss_c.view(num, -1)
loss_c[pos] = 0 # filter out pos boxes for now
Step 2: change line 114 from
N = num_pos.data.sum()
to
N = num_pos.data.sum().double()
loss_l = loss_l.double()
loss_c = loss_c.double()
But if your PyTorch version is 1.0.1, these changes do not help.
This answer also solved my problem. More precisely:
loss_l = loss_l.double() / N
loss_c = loss_c.double() / N
😃
Pytorch version:
Python version:
multibox_loss.py:
train.py
And here is my output:
Thanks, that was useful for me, but for step 3 the lines are 183, 184, 188, 191 (5 items): loss_x.data[0] >> loss_x.data or loss.data[0] >> loss.data
With @TianSong1991's solution, except that step 3 is changed to the following:
Step 3: in train.py, change lines 183, 184, 188, 191:
loss_l.data[0] >> loss_l.item()
loss_c.data[0] >> loss_c.item()
loss.data[0] >> loss.item()
Now the loss is converging…
timer: 6.1581 sec. iter 0 || Loss: 32.3338 ||
timer: 0.3283 sec. iter 10 || Loss: 24.8091 ||
timer: 0.3328 sec. iter 20 || Loss: 24.4980 ||
timer: 0.3275 sec. iter 30 || Loss: 21.3105 ||
timer: 0.3167 sec. iter 40 || Loss: 14.5682 ||
timer: 0.3223 sec. iter 50 || Loss: 13.0729 ||
timer: 0.3221 sec. iter 60 || Loss: 12.3032 ||
timer: 0.3383 sec. iter 70 || Loss: 10.5260 ||
timer: 0.3246 sec. iter 80 || Loss: 11.2028 ||
timer: 0.3380 sec. iter 90 || Loss: 10.1715 ||
timer: 0.3244 sec. iter 100 || Loss: 10.1702 ||
timer: 0.3342 sec. iter 110 || Loss: 9.8668 ||
timer: 0.3384 sec. iter 120 || Loss: 9.5938 ||
timer: 0.3676 sec. iter 130 || Loss: 10.0942 ||
timer: 0.3210 sec. iter 140 || Loss: 9.7601 ||
timer: 0.3246 sec. iter 150 || Loss: 10.1564 ||
timer: 0.3202 sec. iter 160 || Loss: 9.8361 ||
timer: 0.3215 sec. iter 170 || Loss: 9.3565 ||
timer: 0.3290 sec. iter 180 || Loss: 9.2069 ||
timer: 0.3481 sec. iter 190 || Loss: 9.0822 ||
timer: 0.3374 sec. iter 200 || Loss: 9.3702 ||
timer: 0.3333 sec. iter 210 || Loss: 9.6193 ||
timer: 0.3437 sec. iter 220 || Loss: 9.1466 ||
timer: 0.3590 sec. iter 230 || Loss: 8.8923 ||
timer: 0.3211 sec. iter 240 || Loss: 9.2617 ||
timer: 0.3526 sec. iter 250 || Loss: 9.1713 ||
timer: 0.3263 sec. iter 260 || Loss: 9.4524 ||
timer: 0.3262 sec. iter 270 || Loss: 9.4929 ||
timer: 0.3581 sec. iter 280 || Loss: 8.7274 ||
timer: 0.3345 sec. iter 290 || Loss: 9.6723 ||
timer: 0.3701 sec. …
If you are using PyTorch 2, please follow these steps:
In multibox_loss.py, swap lines 97 and 98.
In train.py, at about line 183, replace
loc_loss += loss_l.data[0]
with
loc_loss += loss_l.item()
At about line 184, replace
conf_loss += loss_c.data[0]
with
conf_loss += loss_c.item()
At about line 188, in the print call, replace
loss.data[0]
with
loss.item()
This solved my problem!
I didn't change line 114, and the NaN loss disappeared.
Would loss_x.data[0] >> loss_x.item() be better?
Great, but there is a small bug: it should be line 114, not line 144.