vision: distance_box_iou() and complete_box_iou() don't work if both sets don't have the same number of boxes

šŸ› Describe the bug

*box_iou() functions should return a matrix of results for every possible pair (box1, box2), where box1 is a box from boxes1 and box2 is a box from boxes2. box_iou() and generalized_box_iou() work this way, i.e. if boxes1 is an Nx4 matrix and boxes2 is an Mx4 matrix, the result is an NxM matrix. The recently added distance_box_iou() and complete_box_iou() fail when boxes1 and boxes2 contain different numbers of boxes (N ≠ M).

import torch
from torchvision.ops import box_iou, generalized_box_iou, distance_box_iou, complete_box_iou

N = 5
M = 6
boxes1 = torch.rand((N, 4))
boxes2 = torch.rand((M, 4))
print(box_iou(boxes1, boxes2).shape)  # torch.Size([5, 6])
print(generalized_box_iou(boxes1, boxes2).shape)  # torch.Size([5, 6])
print(distance_box_iou(boxes1, boxes2).shape)  # RuntimeError
print(complete_box_iou(boxes1, boxes2).shape)  # RuntimeError

When running the above code, distance_box_iou() and complete_box_iou() will fail with a RuntimeError. The output is below:

torch.Size([5, 6])
torch.Size([5, 6])
Traceback (most recent call last):
  File ".../test.py", line 10, in <module>
    print(distance_box_iou(boxes1, boxes2).shape)  # RuntimeError
  File ".../lib/python3.9/site-packages/torchvision/ops/boxes.py", line 361, in distance_box_iou
    diou, _ = _box_diou_iou(boxes1, boxes2)
  File ".../lib/python3.9/site-packages/torchvision/ops/boxes.py", line 378, in _box_diou_iou
    centers_distance_squared = (_upcast(x_p - x_g) ** 2) + (_upcast(y_p - y_g) ** 2)
RuntimeError: The size of tensor a (5) must match the size of tensor b (6) at non-singleton dimension 0

This is not caught by the unit tests, because none of the test cases uses a different number of boxes in the two sets.

The problem is in _box_diou_iou(). It looks like iou and diagonal_distance_squared are calculated for every possible pair (by adding an empty dimension), but centers_distance_squared is not.
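The shape mismatch can be reproduced without torchvision. The following is a minimal sketch, assuming the box centers are computed per box from (x1, y1, x2, y2) coordinates; the variable names x_p and x_g are taken from the traceback, everything else is illustrative and not the library's code.

```python
import torch

N, M = 5, 6
boxes1 = torch.rand((N, 4))
boxes2 = torch.rand((M, 4))

# Box centers along x, one value per box (shapes: [N] and [M]).
x_p = (boxes1[:, 0] + boxes1[:, 2]) / 2
x_g = (boxes2[:, 0] + boxes2[:, 2]) / 2

# Subtracting a [N] tensor from a [M] tensor cannot broadcast when N != M.
try:
    x_p - x_g
    elementwise_failed = False
except RuntimeError:
    elementwise_failed = True

# A trailing singleton dimension turns this into [N, 1] - [M] -> [N, M].
pairwise_dx = x_p[:, None] - x_g

print(elementwise_failed)  # True
print(pairwise_dx.shape)   # torch.Size([5, 6])
```

The same indexing applied to the y centers would make the center distance pairwise, matching the NxM shape of the other terms.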

As a side note, I personally feel it’s confusing that these functions produce the output for every possible pair. By convention, PyTorch functions produce element-wise results. For example, torch.add(boxes1, boxes2) only works if boxes1 and boxes2 contain the same number of boxes. If you want a pair-wise addition, you can easily call torch.add(boxes1[:, None, :], boxes2). The fact that *box_iou() functions produce pair-wise results makes the implementation complicated. And the only way to get element-wise results is calling box_iou(boxes1, boxes2).diagonal(), which is inefficient.
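To make the side note concrete, here is a sketch of what an element-wise IoU could look like. It assumes boxes in (x1, y1, x2, y2) format; elementwise_iou and random_boxes are hypothetical helpers written for this illustration, not part of torchvision. Because the function broadcasts over leading dimensions, the caller chooses between element-wise and pair-wise results.

```python
import torch

def random_boxes(n: int) -> torch.Tensor:
    """Random valid boxes (x1 < x2, y1 < y2); hypothetical helper."""
    xy = torch.rand((n, 2))
    wh = torch.rand((n, 2)) + 0.1
    return torch.cat([xy, xy + wh], dim=1)

def elementwise_iou(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """IoU that broadcasts over all dimensions except the last (size 4)."""
    lt = torch.maximum(a[..., :2], b[..., :2])
    rb = torch.minimum(a[..., 2:], b[..., 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area_a + area_b - inter)

torch.manual_seed(0)
boxes1 = random_boxes(5)
boxes2 = random_boxes(6)

# Element-wise: both inputs must hold the same number of boxes.
paired = elementwise_iou(boxes1, boxes2[:5])

# Pair-wise: add a singleton dimension, as with torch.add above.
pairwise = elementwise_iou(boxes1[:, None, :], boxes2)

print(paired.shape)    # torch.Size([5])
print(pairwise.shape)  # torch.Size([5, 6])

# The diagonal of the pair-wise result matches the element-wise result,
# but computing it this way does N*N work to obtain N values.
print(torch.allclose(pairwise[:, :5].diagonal(), paired))  # True
```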

Versions

PyTorch version: 1.12.0+cu113 Is debug build: False CUDA used to build PyTorch: 11.3 ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.4 LTS (x86_64) GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 Clang version: Could not collect CMake version: version 3.16.3 Libc version: glibc-2.31

Python version: 3.9.12 (main, Apr 5 2022, 06:56:58) [GCC 7.5.0] (64-bit runtime) Python platform: Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.31 Is CUDA available: True CUDA runtime version: Could not collect GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1650 Ti with Max-Q Design Nvidia driver version: 516.59 cuDNN version: Could not collect HIP runtime version: N/A MIOpen runtime version: N/A Is XNNPACK available: True

Versions of relevant libraries: [pip3] mypy==0.950 [pip3] mypy-extensions==0.4.3 [pip3] numpy==1.22.3 [pip3] pytorch-lightning==1.6.5 [pip3] pytorch-lightning-bolts==0.2.5 [pip3] pytorch-quantization==2.1.2 [pip3] torch==1.12.0+cu113 [pip3] torchmetrics==0.6.0 [pip3] torchtext==0.12.0 [pip3] torchvision==0.13.0+cu113 [conda] Could not collect

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 2
  • Comments: 17 (11 by maintainers)

Most upvoted comments

_box_inter_union() upcasts the box coordinates before calculating the box and intersection areas. _loss_inter_union() does not. Is this intentional?

We do upcast in the losses, but there we upcast non-float inputs, since integer dtypes are not currently supported for losses. This is intentional.

See

https://github.com/pytorch/vision/blob/ec0e9e127ab1d6f2c91eb07ad638fa359063d546/torchvision/ops/ciou_loss.py#L48

and

https://github.com/pytorch/vision/blob/ec0e9e127ab1d6f2c91eb07ad638fa359063d546/torchvision/ops/diou_loss.py#L48

_box_inter_union() uses clamp(min=0) to make sure that the intersection is zero if the boxes don’t intersect. _loss_inter_union() explicitly checks that the width and the height are positive. Does this make _loss_inter_union() more stable? (I don’t see how.)

Both are private (underscore-prefixed) functions that aren’t exposed for public use, so we don’t guarantee BC for either. All the added losses and ops are stable. (I hope I’m right @datumbox ?)

I also think that the loss shouldn’t be zero if the boxes don’t intersect. I think that’s correct, as a loss function should help to find the intersection, while plain IoU (box_iou) should of course be 0 when the boxes don’t intersect.
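The two ways of computing the intersection raised in the question above can be compared directly. This is only a sketch: inter_clamp and inter_mask are hypothetical stand-ins for the clamp(min=0) approach and the explicit positive-width-and-height check, written element-wise for brevity, and are not the torchvision helpers themselves.

```python
import torch

def inter_clamp(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Intersection area using clamp(min=0) on width and height."""
    lt = torch.maximum(a[:, :2], b[:, :2])
    rb = torch.minimum(a[:, 2:], b[:, 2:])
    wh = (rb - lt).clamp(min=0)
    return wh[:, 0] * wh[:, 1]

def inter_mask(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Intersection area using an explicit check for positive width and height."""
    x1 = torch.maximum(a[:, 0], b[:, 0])
    y1 = torch.maximum(a[:, 1], b[:, 1])
    x2 = torch.minimum(a[:, 2], b[:, 2])
    y2 = torch.minimum(a[:, 3], b[:, 3])
    inter = torch.zeros_like(x1)
    mask = (x2 > x1) & (y2 > y1)
    inter[mask] = (x2[mask] - x1[mask]) * (y2[mask] - y1[mask])
    return inter

# Three pairs: overlapping, fully disjoint, and touching along one edge.
a = torch.tensor([[0.0, 0.0, 2.0, 2.0], [0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]])
b = torch.tensor([[1.0, 1.0, 3.0, 3.0], [2.0, 2.0, 3.0, 3.0], [1.0, 0.0, 2.0, 1.0]])

clamped = inter_clamp(a, b)
masked = inter_mask(a, b)
print(clamped)  # tensor([1., 0., 0.])
print(masked)   # tensor([1., 0., 0.])
```

On these inputs both strategies give the same areas. Either guard is needed for the disjoint pair, where an unguarded product of two negative extents would yield a positive area.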

@oke-aditya Thanks! Please send a PR and update the tests accordingly to capture these issues going forwards.

I will patch this 😄

@senarvi Thanks for pointing this out. This is a bug.

To ensure BC and alignment with the existing *_box_iou() methods, we needed to maintain the NxM behaviour. Note that the equivalent loss methods should do element-wise calculations to be efficient, so any bug fix shouldn’t make the losses compute unnecessary extra information. This was discussed on the original PRs.

@oke-aditya @abhi-glitchhg @yassineAlouini Anyone interested in patching this?

AFAIK it’s intended. That’s the rationale behind generalized IoU and other IoU methods such as distance and complete.

Negative values give better feedback to the neural network (when used as a loss) and are more informative as a metric.

That’s the advantage over vanilla box iou.
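A small worked example of this, using the standard GIoU definition (IoU minus the fraction of the enclosing box not covered by the union). The helper names below are hypothetical and the code is plain Python for illustration, not torchvision's implementation.

```python
def box_area(box):
    """Area of an (x1, y1, x2, y2) box; zero if it is empty or inverted."""
    x1, y1, x2, y2 = box
    return max(x2 - x1, 0.0) * max(y2 - y1, 0.0)

def iou_and_giou(a, b):
    """Return (IoU, GIoU) for two boxes in (x1, y1, x2, y2) format."""
    inter = box_area((max(a[0], b[0]), max(a[1], b[1]),
                      min(a[2], b[2]), min(a[3], b[3])))
    union = box_area(a) + box_area(b) - inter
    enclosing = box_area((min(a[0], b[0]), min(a[1], b[1]),
                          max(a[2], b[2]), max(a[3], b[3])))
    iou = inter / union
    giou = iou - (enclosing - union) / enclosing
    return iou, giou

reference = (0.0, 0.0, 1.0, 1.0)
iou_near, giou_near = iou_and_giou(reference, (2.0, 2.0, 3.0, 3.0))
iou_far, giou_far = iou_and_giou(reference, (4.0, 4.0, 5.0, 5.0))

# Plain IoU is 0 for both disjoint pairs and cannot tell them apart.
print(iou_near, iou_far)    # 0.0 0.0
# GIoU is negative and decreases as the boxes move further apart.
print(round(giou_near, 4))  # -0.7778
print(round(giou_far, 4))   # -0.92
```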

I would think that the fix is:

centers_distance_squared = (_upcast(x_p[:, None] - x_g) ** 2) + (_upcast(y_p[:, None] - y_g) ** 2)
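To check that this kind of indexing yields an NxM result, here is a self-contained sketch of a pairwise distance IoU (IoU minus the squared center distance over the squared diagonal of the enclosing box). pairwise_diou and random_boxes are hypothetical names; the code is written from the DIoU definition and is not the torchvision source.

```python
import torch

def random_boxes(n: int) -> torch.Tensor:
    """Random valid boxes (x1 < x2, y1 < y2); hypothetical helper."""
    xy = torch.rand((n, 2))
    wh = torch.rand((n, 2)) + 0.1
    return torch.cat([xy, xy + wh], dim=1)

def pairwise_diou(boxes1: torch.Tensor, boxes2: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """Distance IoU for every pair of boxes; returns an [N, M] tensor."""
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])

    # Pairwise intersection: [N, 1, 2] against [M, 2] broadcasts to [N, M, 2].
    lt = torch.max(boxes1[:, None, :2], boxes2[:, :2])
    rb = torch.min(boxes1[:, None, 2:], boxes2[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    iou = inter / (area1[:, None] + area2 - inter)

    # Squared diagonal of the smallest box enclosing each pair.
    enc_lt = torch.min(boxes1[:, None, :2], boxes2[:, :2])
    enc_rb = torch.max(boxes1[:, None, 2:], boxes2[:, 2:])
    enc_wh = enc_rb - enc_lt
    diagonal_sq = enc_wh[..., 0] ** 2 + enc_wh[..., 1] ** 2 + eps

    # Pairwise squared center distance, using the [:, None] indexing above.
    x_p = (boxes1[:, 0] + boxes1[:, 2]) / 2
    y_p = (boxes1[:, 1] + boxes1[:, 3]) / 2
    x_g = (boxes2[:, 0] + boxes2[:, 2]) / 2
    y_g = (boxes2[:, 1] + boxes2[:, 3]) / 2
    centers_sq = (x_p[:, None] - x_g) ** 2 + (y_p[:, None] - y_g) ** 2

    return iou - centers_sq / diagonal_sq

torch.manual_seed(0)
boxes1 = random_boxes(5)
boxes2 = random_boxes(6)

full = pairwise_diou(boxes1, boxes2)
print(full.shape)  # torch.Size([5, 6])

# Each entry agrees with the same computation on that single pair.
single = pairwise_diou(boxes1[2:3], boxes2[4:5])
print(torch.allclose(full[2, 4], single[0, 0]))  # True
```

A unit test along these lines, with N different from M, would have caught the original failure.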