vision: distance_box_iou() and complete_box_iou() don't work if both sets don't have the same number of boxes
Describe the bug
The *box_iou() functions should return a matrix of results for every possible pair (box1, box2), where box1 is a box from boxes1 and box2 is a box from boxes2. box_iou() and generalized_box_iou() work this way, i.e. if boxes1 is an Nx4 matrix and boxes2 is an Mx4 matrix, the result is an NxM matrix. The recently added distance_box_iou() and complete_box_iou() don't work if boxes1 and boxes2 contain different numbers of boxes.
import torch
from torchvision.ops import box_iou, generalized_box_iou, distance_box_iou, complete_box_iou
N = 5
M = 6
boxes1 = torch.rand((N, 4))
boxes2 = torch.rand((M, 4))
print(box_iou(boxes1, boxes2).shape) # torch.Size([5, 6])
print(generalized_box_iou(boxes1, boxes2).shape) # torch.Size([5, 6])
print(distance_box_iou(boxes1, boxes2).shape) # RuntimeError
print(complete_box_iou(boxes1, boxes2).shape) # RuntimeError
When running the above code, distance_box_iou() and complete_box_iou() fail with a RuntimeError. The output is below:
torch.Size([5, 6])
torch.Size([5, 6])
Traceback (most recent call last):
File ".../test.py", line 10, in <module>
print(distance_box_iou(boxes1, boxes2).shape) # RuntimeError
File ".../lib/python3.9/site-packages/torchvision/ops/boxes.py", line 361, in distance_box_iou
diou, _ = _box_diou_iou(boxes1, boxes2)
File ".../lib/python3.9/site-packages/torchvision/ops/boxes.py", line 378, in _box_diou_iou
centers_distance_squared = (_upcast(x_p - x_g) ** 2) + (_upcast(y_p - y_g) ** 2)
RuntimeError: The size of tensor a (5) must match the size of tensor b (6) at non-singleton dimension 0
This is not caught by the unit tests, because no test case uses a different number of boxes in the two sets.
The problem is in _box_diou_iou(). It looks like iou and diagonal_distance_squared are calculated for every possible pair (by adding an empty dimension), but centers_distance_squared is not.
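To illustrate the kind of broadcasting the paragraph above refers to, here is a minimal sketch of a pairwise squared center distance. It is not torchvision's code or the actual patch: the helper name pairwise_center_distance_squared is hypothetical, and boxes are assumed to be in (x1, y1, x2, y2) format.

```python
import torch


def pairwise_center_distance_squared(boxes1, boxes2):
    # Hypothetical helper for illustration only; not part of torchvision.
    # Assumes boxes are given as (x1, y1, x2, y2).
    x_p = (boxes1[:, 0] + boxes1[:, 2]) / 2  # centers of boxes1, shape (N,)
    y_p = (boxes1[:, 1] + boxes1[:, 3]) / 2
    x_g = (boxes2[:, 0] + boxes2[:, 2]) / 2  # centers of boxes2, shape (M,)
    y_g = (boxes2[:, 1] + boxes2[:, 3]) / 2
    # Adding an empty dimension makes (N, 1) - (1, M) broadcast to (N, M).
    dx = x_p[:, None] - x_g[None, :]
    dy = y_p[:, None] - y_g[None, :]
    return dx ** 2 + dy ** 2


boxes1 = torch.rand((5, 4))
boxes2 = torch.rand((6, 4))
dist = pairwise_center_distance_squared(boxes1, boxes2)
print(dist.shape)  # torch.Size([5, 6])
```

Without the added dimensions, x_p - x_g would subtract a length-5 tensor from a length-6 tensor, which is the mismatch reported in the traceback.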
As a side note, I personally find it confusing that these functions produce output for every possible pair. By convention, PyTorch functions produce element-wise results. For example, torch.add(boxes1, boxes2) only works if boxes1 and boxes2 contain the same number of boxes. If you want pair-wise addition, you can easily call torch.add(boxes1[:, None, :], boxes2). The fact that the *box_iou() functions produce pair-wise results makes the implementation complicated, and the only way to get element-wise results is to call box_iou(boxes1, boxes2).diagonal(), which is inefficient because it computes all NxN pairs and discards everything off the diagonal.
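A rough sketch of the element-wise convention described above, assuming (x1, y1, x2, y2) boxes with positive area. The iou helper below is hypothetical and is not torchvision's box_iou(); it follows ordinary broadcasting, so the caller chooses between element-wise and pair-wise results by adding a dimension.

```python
import torch


def _area(boxes):
    return (boxes[..., 2] - boxes[..., 0]) * (boxes[..., 3] - boxes[..., 1])


def iou(boxes1, boxes2):
    # Hypothetical helper: IoU that broadcasts over the leading dimensions.
    lt = torch.max(boxes1[..., :2], boxes2[..., :2])  # top-left of intersection
    rb = torch.min(boxes1[..., 2:], boxes2[..., 2:])  # bottom-right of intersection
    wh = (rb - lt).clamp(min=0)                       # zero if boxes don't overlap
    inter = wh[..., 0] * wh[..., 1]
    union = _area(boxes1) + _area(boxes2) - inter
    return inter / union


boxes1 = torch.tensor([[0.0, 0.0, 2.0, 2.0], [1.0, 1.0, 3.0, 3.0], [0.0, 0.0, 1.0, 1.0]])
boxes2 = torch.tensor([[1.0, 1.0, 3.0, 3.0], [1.0, 1.0, 3.0, 3.0], [2.0, 2.0, 3.0, 3.0]])

elementwise = iou(boxes1, boxes2)           # shape (3,): one value per row
pairwise = iou(boxes1[:, None, :], boxes2)  # shape (3, 3): every pair

# The element-wise result is the diagonal of the pair-wise matrix.
print(torch.allclose(elementwise, pairwise.diagonal()))  # True
```

The element-wise call does N IoU computations, whereas taking the diagonal of the pair-wise result does N*N and throws most of them away.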
Versions
PyTorch version: 1.12.0+cu113 Is debug build: False CUDA used to build PyTorch: 11.3 ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.4 LTS (x86_64) GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 Clang version: Could not collect CMake version: version 3.16.3 Libc version: glibc-2.31
Python version: 3.9.12 (main, Apr 5 2022, 06:56:58) [GCC 7.5.0] (64-bit runtime) Python platform: Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.31 Is CUDA available: True CUDA runtime version: Could not collect GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1650 Ti with Max-Q Design Nvidia driver version: 516.59 cuDNN version: Could not collect HIP runtime version: N/A MIOpen runtime version: N/A Is XNNPACK available: True
Versions of relevant libraries: [pip3] mypy==0.950 [pip3] mypy-extensions==0.4.3 [pip3] numpy==1.22.3 [pip3] pytorch-lightning==1.6.5 [pip3] pytorch-lightning-bolts==0.2.5 [pip3] pytorch-quantization==2.1.2 [pip3] torch==1.12.0+cu113 [pip3] torchmetrics==0.6.0 [pip3] torchtext==0.12.0 [pip3] torchvision==0.13.0+cu113 [conda] Could not collect
About this issue
- State: closed
- Created 2 years ago
- Reactions: 2
- Comments: 17 (11 by maintainers)
We do upcast in the losses, but only non-float inputs, since we don't currently support int dtypes for losses. This is intentional.
See
https://github.com/pytorch/vision/blob/ec0e9e127ab1d6f2c91eb07ad638fa359063d546/torchvision/ops/ciou_loss.py#L48
and
https://github.com/pytorch/vision/blob/ec0e9e127ab1d6f2c91eb07ad638fa359063d546/torchvision/ops/diou_loss.py#L48
Both are private (_-prefixed) functions and aren't exposed for public use, so we don't guarantee BC for either. All the added losses and ops are stable. (I hope I'm right, @datumbox?)
I also think that the loss shouldn't be zero if the boxes don't intersect. That seems correct, since a loss function should help find the intersection, while plain IoU (box_iou) should of course be 0 when the boxes don't intersect.
@oke-aditya Thanks! Please send a PR and update the tests accordingly to capture these issues going forwards.
I will patch this.
@senarvi Thanks for pointing this out. This is a bug.
To ensure BC and alignment with the previous *_box_iou() methods, we needed to maintain the NxM feature. Note that the equivalent loss methods should do element-wise calculations to be efficient, so any bug fix shouldn't make the losses compute extra, unnecessary information. This is something that was discussed on the original PRs.
@oke-aditya @abhi-glitchhg @yassineAlouini Anyone interested in patching this?
AFAIK it's intended. That's the rationale behind generalized IoU and the other IoU methods such as distance and complete.
Negative values give better feedback to a neural network (when used as a loss) and are more relevant as a metric.
That's the advantage over vanilla box IoU.
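As a small worked example of that point, the sketch below computes plain IoU and generalized IoU for a single pair of boxes in pure Python, using the standard GIoU definition (IoU minus the fraction of the smallest enclosing box not covered by the union). The function name iou_and_giou is hypothetical, and boxes are assumed to be (x1, y1, x2, y2) tuples with positive area.

```python
def iou_and_giou(box1, box2):
    # Hypothetical helper for illustration; boxes are (x1, y1, x2, y2).
    ax1, ay1, ax2, ay2 = box1
    bx1, by1, bx2, by2 = box2

    # Intersection area (zero when the boxes don't overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union

    # Area of the smallest box enclosing both inputs.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    giou = iou - (enclose - union) / enclose
    return iou, giou


near = iou_and_giou((0, 0, 1, 1), (2, 2, 3, 3))
far = iou_and_giou((0, 0, 1, 1), (9, 9, 10, 10))
print(near)  # IoU is 0.0, GIoU is about -0.778
print(far)   # IoU is 0.0, GIoU is -0.98
```

Both pairs are disjoint, so plain IoU is 0 in each case and cannot tell them apart, while GIoU becomes more negative as the boxes move further apart.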
I would think that the fix is: