vision: Problems training Faster-RCNN from pretrained backbone
Is there any recommendation for training Faster-RCNN starting from the pretrained backbone? I'm using the VOC 2007 dataset and I'm able to do transfer learning starting from:
```python
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = torchvision.models.detection.faster_rcnn.FastRCNNPredictor(in_features, num_classes=21)
```
Using the COCO-pretrained `fasterrcnn_resnet50_fpn` I'm able to obtain an mAP of 79% on the VOC 2007 test set. Problems arise when I try to train from scratch using only the pretrained backbone:
```python
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = torchvision.models.detection.faster_rcnn.FastRCNNPredictor(in_features, num_classes=21)
```
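(Note: in the torchvision releases from this period the constructor also takes a `pretrained_backbone` flag, which defaults to `True`, so the setup above can be spelled out explicitly:)

```python
# same call as above, with the backbone initialization made explicit
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    pretrained=False,          # no COCO detection weights
    pretrained_backbone=True,  # ImageNet weights for the ResNet-50 backbone (the default)
)
```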
I have been trying to train this model for weeks, but the highest mAP I got was 63% (again on the test set).
Now, I know that training from scratch is harder, but I would really like to know how to set the training parameters to obtain decent accuracy. In the future I may want to change the backbone, and chances are I won't be able to find a pretrained Faster-RCNN to do transfer learning from.
About this issue
- State: closed
- Created 5 years ago
- Reactions: 1
- Comments: 44 (16 by maintainers)
@hktxt FYI I can easily get 72% mAP using the example provided in the FasterRCNN source code with `mobilenet_v2` as the backbone: no need to modify the `BoxHead`.
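For reference, that example (from the `FasterRCNN` docstring) looks roughly like this, adapted here to VOC's 21 classes; note that older torchvision versions expect `featmap_names=[0]` rather than `['0']`:

```python
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator

# take a classification model and keep only its feature extractor
backbone = torchvision.models.mobilenet_v2(pretrained=True).features
# FasterRCNN needs to know the number of output channels of the backbone
backbone.out_channels = 1280

# 5 sizes x 3 aspect ratios per spatial location on the single feature map
anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))

roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0'],
                                                output_size=7,
                                                sampling_ratio=2)

model = FasterRCNN(backbone,
                   num_classes=21,  # 20 VOC classes + background
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler)
```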
@fmassa I found out what my main problem was: I was using the `val` set for validation only. However, to get good results on PASCAL VOC 2007 you are supposed to use `trainval` all together. Also, thanks to @hktxt's comment I got 66% accuracy training from scratch (just 3% less than expected). If anyone is interested, here are the highlights (a rough setup sketch follows below):

Backbone

Model

Dataset

The only augmentation I used was `RandomHorizontalFlip`.

Parameters
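A minimal sketch of such a setup, assuming the ResNet-50 FPN model from the question; the optimizer values shown are torchvision's detection-reference defaults, not necessarily the exact ones used in this comment:

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Backbone / Model: ImageNet-pretrained backbone, VOC's 20 classes + background
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    pretrained=False, pretrained_backbone=True)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=21)

# Parameters: SGD with step decay, as in torchvision's reference scripts
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.02, momentum=0.9, weight_decay=1e-4)
lr_scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[16, 22], gamma=0.1)
```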
Thanks @lpuglia for the PR!
I'll have a closer look at the PR (and get it merged) once I'm back from holidays.
@fmassa here is the pull request: https://github.com/pytorch/vision/pull/1216. It should work out of the box.
@lpuglia I think we should add an example with Pascal VOC somewhere. If you could send an initial PR, I could look into improving it and merging it into torchvision.
@fmassa I tried them both: the first actually decreases the accuracy for some reason, and the second makes no difference. I will train from scratch using COCO and then use transfer learning to see if I can get 70% on Pascal. Thanks for the help!
@hktxt my advice is to make sure you have the visibility checks enabled and use the following class for conversion:
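A rough sketch of what such a conversion class can look like, assuming the "visibility check" means skipping boxes flagged `difficult` in the VOC annotations; the class name and details here are illustrative, not the original code:

```python
import torch

class VOCTargetTransform:
    """Illustrative only: converts a torchvision VOCDetection sample
    (image, parsed XML dict) into the (image, target) format that
    Faster R-CNN expects."""

    # VOC class names, index 0 reserved for background
    CLASSES = ["__background__", "aeroplane", "bicycle", "bird", "boat",
               "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
               "dog", "horse", "motorbike", "person", "pottedplant",
               "sheep", "sofa", "train", "tvmonitor"]

    def __call__(self, image, annotation):
        boxes, labels = [], []
        objs = annotation["annotation"]["object"]
        if not isinstance(objs, list):  # a single object parses as a dict
            objs = [objs]
        for obj in objs:
            # "visibility check": skip objects marked as difficult
            if int(obj.get("difficult", 0)) == 1:
                continue
            bb = obj["bndbox"]
            boxes.append([float(bb["xmin"]), float(bb["ymin"]),
                          float(bb["xmax"]), float(bb["ymax"])])
            labels.append(self.CLASSES.index(obj["name"]))
        target = {"boxes": torch.as_tensor(boxes, dtype=torch.float32).reshape(-1, 4),
                  "labels": torch.as_tensor(labels, dtype=torch.int64)}
        return image, target
```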
Also (I don't know if this is useful yet), make sure to have a 10022-image dataset by flipping all the images. This is different from random flipping because you make sure that every image is shown to the network twice, in different orientations, per epoch. If you use this strategy you will need just 15 epochs to train the network. Here is my code:
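A minimal sketch of this trick (the wrapper name and details are illustrative): indices past the original length return deterministically flipped copies, so VOC07 `trainval`'s 5011 images become 10022 samples per epoch.

```python
import torch
import torchvision.transforms.functional as F

class FlipDoubledDataset(torch.utils.data.Dataset):
    """Illustrative wrapper: doubles a detection dataset so that every
    image appears once as-is and once horizontally flipped per epoch."""

    def __init__(self, dataset):
        self.dataset = dataset

    def __len__(self):
        return 2 * len(self.dataset)

    def __getitem__(self, idx):
        image, target = self.dataset[idx % len(self.dataset)]
        if idx >= len(self.dataset):
            width = image.width  # assuming PIL images
            image = F.hflip(image)
            boxes = target["boxes"].clone()
            # mirror box x-coordinates: (xmin, xmax) -> (w - xmax, w - xmin)
            boxes[:, [0, 2]] = width - boxes[:, [2, 0]]
            target = {**target, "boxes": boxes}
        return image, target
```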
@fmassa removing the visibility check decreases the accuracy from 66% to 64%.
@fmassa It was enabled the whole time; I don't know how much it influenced the training. I'm going to repeat the test with it commented out and let you know (my guess is that it doesn't change much).