vision: FastRCNNPredictor doesn't return prediction in evaluation

πŸ› Bug

Dear all,

I am doing object detection on images with a single class. After training, the model (with FastRCNNPredictor) returns no predictions in evaluation mode. I followed the official tutorial: https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html.

Thanks.

To Reproduce

Steps to reproduce the behavior:

I have created a custom dataset; this is one of its samples:

tensor([[[0.0549, 0.0549, 0.0549,  ..., 0.1647, 0.1569, 0.1569],
          [0.0549, 0.0549, 0.0549,  ..., 0.1686, 0.1569, 0.1569],
          [0.0549, 0.0549, 0.0549,  ..., 0.1647, 0.1569, 0.1529],
          ...,
          [0.0471, 0.0471, 0.0471,  ..., 0.1490, 0.1490, 0.1490],
          [0.0471, 0.0471, 0.0471,  ..., 0.1490, 0.1490, 0.1490],
          [0.0471, 0.0471, 0.0471,  ..., 0.1490, 0.1490, 0.1490]],
 
         [[0.0471, 0.0471, 0.0471,  ..., 0.1255, 0.1176, 0.1176],
          [0.0471, 0.0471, 0.0471,  ..., 0.1294, 0.1176, 0.1176],
          [0.0471, 0.0471, 0.0471,  ..., 0.1255, 0.1176, 0.1137],
          ...,
          [0.0235, 0.0235, 0.0235,  ..., 0.1098, 0.1098, 0.1098],
          [0.0235, 0.0235, 0.0235,  ..., 0.1098, 0.1098, 0.1098],
          [0.0235, 0.0235, 0.0235,  ..., 0.1098, 0.1098, 0.1098]],
 
         [[0.0510, 0.0510, 0.0510,  ..., 0.1176, 0.1098, 0.1098],
          [0.0510, 0.0510, 0.0510,  ..., 0.1216, 0.1098, 0.1098],
          [0.0510, 0.0510, 0.0510,  ..., 0.1176, 0.1098, 0.1059],
          ...,
          [0.0314, 0.0314, 0.0314,  ..., 0.1059, 0.1059, 0.1059],
          [0.0314, 0.0314, 0.0314,  ..., 0.1059, 0.1059, 0.1059],
          [0.0314, 0.0314, 0.0314,  ..., 0.1059, 0.1059, 0.1059]]]),
 {'boxes': tensor([[315.0003, 213.5002, 626.0004, 329.5002]]),
  'labels': tensor([0]),
  'image_id': tensor([1]),
  'area': tensor([36503.9961]),
  'iscrowd': tensor([0])})

To prove its correctness I have also visualized the bbox on the image:

[image: the sample with its bounding box drawn]

Then I create a Dataloader:


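import torch
from torch.utils.data import DataLoader
from torchvision.models.detection import fasterrcnn_resnet50_fpn

device = torch.device('cuda')

# ds is the custom dataset shown above; the collate_fn keeps each batch as (images, targets) tuples
dl = DataLoader(ds, batch_size=8, num_workers=4, collate_fn=lambda x: tuple(zip(*x)))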

model = fasterrcnn_resnet50_fpn(num_classes=1).to(device)

params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005,
                            momentum=0.9, weight_decay=0.0005)

Training works:

model.train()
for i in range(5):

    for images, targets in dl:
        images = list(image.to(device) for image in images)
        targets = [{k: v.to(device) for k,v in t.items()} for t in targets]
        loss_dict = model(images, targets)
        losses = sum(loss for loss in loss_dict.values())
        optimizer.zero_grad()
        losses.backward()
        optimizer.step()

        print(losses)

Output:

tensor(0.6391, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.6329, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.6139, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.5965, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.5814, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.5468, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.5049, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.4502, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.3787, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.2502, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.1605, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0940, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0558, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0507, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0413, device='cuda:0', grad_fn=<AddBackward0>)

But, when I try to get a prediction I have no output:

model = model.eval()
with torch.no_grad():
    model = model.cuda()
    pred = model([ds[2][0].cuda()])

pred is

[{'boxes': tensor([], size=(0, 4)),
  'labels': tensor([], dtype=torch.int64),
  'scores': tensor([])}]

Thank you in advance

Expected behavior

The model should return a valid prediction.

Environment

PyTorch version: 1.4.0
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: Ubuntu 18.04.4 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: Could not collect

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1.243
GPU models and configuration: GPU 0: GeForce GTX 1080 Ti
Nvidia driver version: 430.50
cuDNN version: Could not collect

Versions of relevant libraries:
[pip] efficientnet-pytorch==0.5.1
[pip] msgpack-numpy==0.4.3.2
[pip] numpy==1.17.4
[pip] PytorchStorage==0.0.0
[pip] torch==1.4.0
[pip] torchbearer==0.5.3
[pip] torchlego==0.0.0
[pip] torchsummary==1.5.1
[pip] torchvision==0.5.0
[conda] _pytorch_select           0.2                       gpu_0  
[conda] blas                      1.0                         mkl  
[conda] efficientnet-pytorch      0.5.1                    pypi_0    pypi
[conda] libblas                   3.8.0                    14_mkl    conda-forge
[conda] libcblas                  3.8.0                    14_mkl    conda-forge
[conda] liblapack                 3.8.0                    14_mkl    conda-forge
[conda] liblapacke                3.8.0                    14_mkl    conda-forge
[conda] mkl                       2019.4                      243  
[conda] mkl-service               2.3.0            py37he904b0f_0  
[conda] mkl_fft                   1.0.15           py37ha843d7b_0  
[conda] mkl_random                1.1.0            py37hd6b4f25_0  
[conda] pytorchstorage            0.0.0                    pypi_0    pypi
[conda] torch                     1.4.0                    pypi_0    pypi
[conda] torchbearer               0.5.3                    pypi_0    pypi
[conda] torchlego                 0.0.0                    pypi_0    pypi
[conda] torchsummary              1.5.1                    pypi_0    pypi
[conda] torchvision               0.5.0                    pypi_0    pypi

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 17 (6 by maintainers)

Most upvoted comments

@FrancescoSaverioZuppichini I think I see the issue: the label for your object is 0, but Faster R-CNN considers value 0 as background. If you make the label be 1, it should work fine.

This is illustrated in the detection tutorial you mentioned, see the dataset line:

# there is only one class
labels = torch.ones((num_objs,), dtype=torch.int64)

But I agree it can be a bit tricky to spot this. I would happily accept a PR that improves the documentation by mentioning that the labels should start at 1 and that 0 is treated as background.
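For anyone hitting the same thing, here is a minimal sketch of the fix applied to the target from the report. The helper name `shift_labels_past_background` is hypothetical (it is not part of torchvision), and the sketch assumes the dataset's own class indices start at 0, as in the sample above.

```python
import torch


def shift_labels_past_background(target):
    """Return a copy of `target` with labels shifted by +1, leaving 0 free for background.

    Hypothetical helper for illustration; not a torchvision function.
    """
    shifted = dict(target)
    shifted["labels"] = target["labels"] + 1
    return shifted


# The target from the report: one box, labelled 0 (which Faster R-CNN reads as background).
target = {
    "boxes": torch.tensor([[315.0003, 213.5002, 626.0004, 329.5002]]),
    "labels": torch.tensor([0]),
}

fixed = shift_labels_past_background(target)

# One object class plus the implicit background class.
num_object_classes = 1
num_classes = num_object_classes + 1

print(fixed["labels"], num_classes)

# The model from the report would then be built with the larger class count, e.g.:
# model = fasterrcnn_resnet50_fpn(num_classes=num_classes)
```

Doing the shift inside the dataset's `__getitem__`, as the tutorial does with `torch.ones`, is probably the cleaner place for it; the standalone helper just makes the change easy to see.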

There is still an error in the documentation. If you have 3 classes in your dataset, and you have no background class in your dataset, you have to specify that num_classes=4 instead of num_classes=3. So, your labels would only contain 1, 2, and 3. However, you need to indicate that there is a non-existent class 0 by specifying the number of classes is equal to four.

If you don't, you will trigger an error: RuntimeError: CUDA error: device-side assert triggered
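Since the device-side assert says little about its cause, one option is to validate the labels on the CPU before training. The check below is only a sketch under the convention described in this thread (object labels run from 1 to num_classes - 1, and 0 is reserved for background); `check_detection_labels` is a hypothetical name, not a torchvision API.

```python
import torch


def check_detection_labels(targets, num_classes):
    """Raise ValueError if any object label falls outside 1..num_classes-1.

    Hypothetical helper for illustration; targets without objects are skipped.
    """
    max_label = num_classes - 1
    for index, target in enumerate(targets):
        labels = target["labels"]
        if labels.numel() == 0:
            continue
        lowest, highest = int(labels.min()), int(labels.max())
        if lowest < 1 or highest > max_label:
            raise ValueError(
                f"target {index}: labels span {lowest}..{highest}, "
                f"expected 1..{max_label} for num_classes={num_classes}"
            )


# Three object classes labelled 1, 2 and 3, so num_classes must be 4.
targets = [{"labels": torch.tensor([1, 3])}, {"labels": torch.tensor([2])}]
check_detection_labels(targets, num_classes=4)  # passes silently
```

Calling the same check with `num_classes=3` raises a `ValueError` that names the offending target, which is easier to act on than the CUDA assert.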

Thanks for the PR @FrancescoSaverioZuppichini !

@fmassa Thank you, it works! 🥳🥳

I will definitely create a PR and improve the doc over the weekend