retinanet-examples: Problem with the validation step during training: No detections!

Hi,

I’m getting the “No detections!” message during the validation step while training a RetinaNet model. This problem started after I updated to v0.2.5; it was working fine when I was using v0.2.3.

I’m training RetinaNet on my own custom dataset by fine-tuning the COCO checkpoints provided with the 19.04 release. The version I was using was v0.2.3. I later wanted to update to v0.2.5 and retrain the model to see if there would be any improvement, but with all of the parameters the same, I get the “No detections!” message. I tried both the ResNet50FPN and MobileNet backbones and nothing changed.

Other things I’ve tried:

  • I ran inference on the trained model and it did return detection results along with COCO scores for both v0.2.3 and v0.2.5, so there seems to be nothing wrong with the training itself.
  • I trained on the COCO dataset from the COCO checkpoint and got the same problem with v0.2.5, but it did return detections when I ran the code in inference mode.

Command I used to run training (the only difference is the train/val datasets):

odtk train retinanet_ResNet50FPN_COCO_test.pth --backbone ResNet50FPN --jitter 360 640 \
    --images /datasets/COCO/val2017 --annotations /datasets/COCO/annotations/instances_val2017.json \
    --val-images /datasets/COCO/val2017 --val-annotations /datasets/COCO/annotations/instances_val2017.json \
    --lr 0.0005 --classes 2 --batch 1 --iters 200000 --resize 640  \
    --fine-tune checkpoints/retinanet_rn50fpn.pth

Command I used for inference:

odtk infer retinanet_ResNet50FPN_COCO_test.pth --images=/datasets/COCO/val2017 \
    --annotations=/datasets/COCO/annotations/instances_val2017.json

For the Docker setup I simply followed the instructions in the README file. HW: NVIDIA GeForce RTX 2080.

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 26 (2 by maintainers)

Most upvoted comments

Hello, I can confirm that this issue also occurs with the 20.06 release.

It happens as soon as backpropagation starts. Maybe the model used for inference during validation has inaccurate weights.

But the model that gets saved after this call works fine with a separate infer run. I will look into this issue more; any ideas for a fix are welcome.
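
If the cause really is the state of the model at validation time rather than the weights themselves, a quick check is to validate a snapshot of the model the same way a separate infer run would see it. Below is a minimal PyTorch sketch of that idea; this is not odtk code, and model, val_loader, and run_validation are placeholder names:

import copy
import torch

def validate_like_infer(model, val_loader, run_validation, device="cuda"):
    # Deep-copy the in-training model so validation cannot disturb training state.
    snapshot = copy.deepcopy(model).to(device)
    # Switch to eval mode so BatchNorm uses running statistics and Dropout is
    # disabled, matching what a freshly loaded checkpoint does in a separate
    # infer run.
    snapshot.eval()
    with torch.no_grad():
        return run_validation(snapshot, val_loader)

If a snapshot validated this way produces detections while the in-place validation still prints “No detections!”, the problem is in how the training loop prepares the model for validation, not in the trained weights.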