yolact: Warning: Moving average ignored a value of inf

Hi, I'm trying to train YOLACT to detect cars using images from COCO. I took all of the images that contain cars and built a dataset from them. My config looks like this:

` only_cars_coco2017_dataset = dataset_base.copy({
'name': 'cars COCO 2017',

# Training images and annotations
'train_info': '/home/ws/data/COCO/only_cars_train.json',
'train_images':   '/home/ws/data/COCO/train/train2017/',

# Validation images and annotations.
'valid_info': '/home/ws/data/COCO/only_cars_val.json',
'valid_images':   '/home/ws/data/COCO/val/val2017/',

'class_names': ('car',),  # trailing comma needed: ('car') is the string 'car', not a tuple
'label_map': {1: 1}

})

yolact_im200_coco_cars_config = yolact_base_config.copy({
'name': 'yolact_im200_coco_cars',

# Dataset stuff
'dataset': only_cars_coco2017_dataset,
'num_classes': len(only_cars_coco2017_dataset.class_names) + 1,

'masks_to_train': 20,
'max_num_detections': 20,
'max_size': 200,
'backbone': yolact_base_config.backbone.copy({
    'pred_scales': [[int(x[0] / yolact_base_config.max_size * 200)] for x in yolact_base_config.backbone.pred_scales],
}),

}) `
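A side note on configs like the one above, offered as a minimal sketch rather than a diagnosis of the inf warning. In Python, `('car')` is just the string `'car'`, so a `num_classes` computed as `len(class_names) + 1` comes out as 4 instead of 2 unless the tuple has a trailing comma. The second half shows what the `pred_scales` rescaling evaluates to, assuming the base config uses `max_size = 550` and `pred_scales = [[24], [48], [96], [192], [384]]`. Those two values are assumptions about `yolact_base_config`, so check them against your own `data/config.py`.

```python
# Single-class tuple pitfall: parentheses alone do not create a tuple.
wrong_names = ('car')    # this is the str 'car'
right_names = ('car',)   # this is a 1-element tuple

print(len(wrong_names) + 1)  # 4: counts the characters c, a, r
print(len(right_names) + 1)  # 2: one class plus background

# pred_scales rescaling, mirroring the list comprehension in the config.
# ASSUMPTION: the base values below are taken to match yolact_base_config.
base_max_size = 550
base_pred_scales = [[24], [48], [96], [192], [384]]
new_max_size = 200

scaled = [[int(s[0] / base_max_size * new_max_size)] for s in base_pred_scales]
print(scaled)  # [[8], [17], [34], [69], [139]]
```

With a 200 px input the smallest anchor scale drops to 8 px, which may be worth keeping in mind when judging whether the targets are still learnable at that resolution.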

After a few iterations, my loss goes very high…

Can someone help me with this?

Update: Also, if I train with the full COCO dataset, I get the same error…
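For readers landing on this thread: the warning in the title says that a non-finite loss value reached the logging code and was left out of the running average, so the actual problem sits upstream, in whatever made the loss inf. Below is a minimal sketch of that behaviour. The class name `FiniteMovingAverage` is hypothetical and this is not YOLACT's actual implementation.

```python
import math
from collections import deque


class FiniteMovingAverage:
    """Windowed mean that skips inf/NaN values (hypothetical, illustrative only)."""

    def __init__(self, max_window_size=100):
        self.window = deque(maxlen=max_window_size)
        self.ignored = 0  # how many non-finite values were dropped

    def add(self, value):
        if not math.isfinite(value):
            # Keep the average usable, but tell the user something went wrong.
            print("Warning: Moving average ignored a value of %f" % value)
            self.ignored += 1
            return
        self.window.append(value)

    def get_avg(self):
        return sum(self.window) / max(len(self.window), 1)


avg = FiniteMovingAverage()
for loss in [2.0, 4.0, float("inf"), 6.0]:
    avg.add(loss)
print(avg.get_avg(), avg.ignored)  # 4.0 1
```

An occasional ignored value is survivable, but if the counter keeps growing, the reported average no longer reflects what training is doing.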

Most upvoted comments

@jasonkena, thanks, eval is now working with AMP.

Sorry @Auth0rM0rgan, I believe you were right. I did not initialize amp within eval.py, which is why the problem only showed up during inference.

@Rm1n90, to fix it I believe you have to add

if args.cuda:
    net = net.cuda()
if cfg.use_amp:
    from apex import amp

    if not args.cuda:
        raise ValueError("amp must be used with CUDA")
    net = amp.initialize(net, opt_level="O1")

before `net = CustomDataParallel(net).cuda()` (https://github.com/jasonkena/yolact/blob/e1a949445dc0c57eb7c8f10470630faff0ce22e2/eval.py#L913)
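The same snippet can be wrapped in a small helper, which makes the ordering explicit. This is only a sketch: `maybe_wrap_with_amp` is a hypothetical name, not part of YOLACT or Apex, and it assumes the caller has already moved `net` to the GPU. The CUDA check runs before the import so that the error appears even when Apex is not installed.

```python
def maybe_wrap_with_amp(net, use_amp, use_cuda):
    """Return net, initialized through Apex AMP when requested (hypothetical helper)."""
    if not use_amp:
        return net
    if not use_cuda:
        raise ValueError("amp must be used with CUDA")
    # Imported lazily so that non-AMP runs do not need Apex installed.
    from apex import amp
    # With no optimizer passed, amp.initialize returns only the patched model.
    return amp.initialize(net, opt_level="O1")


# Intended call order inside eval.py (shown as comments, untested):
#   net = net.cuda()
#   net = maybe_wrap_with_amp(net, cfg.use_amp, args.cuda)
#   net = CustomDataParallel(net).cuda()
```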

I haven’t tested it, can you tell me how it turns out?

Can you try cloning my branch into a completely new directory? @sdimantsd and I didn’t get any of your errors running it out of the box.

According to the YOLACT++ paper, the Mask-Rescoring loss improves the performance by 1 mAP.

Nice catch!

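The Gradient Overflow warning is OK, as long as the loss scale doesn’t shrink all the way to 0. The warning means that AMP found inf/NaN gradients in that iteration, skipped the optimizer step, and lowered the loss scale so that the scaled gradients stop overflowing.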

Yup, it’s perfectly normal, it’s Apex’s AMP’s Dynamic Loss Scaling doing its magic.
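To make the idea concrete, here is a toy, pure-Python sketch of dynamic loss scaling. It is not Apex's implementation. The class name `ToyLossScaler` is hypothetical, and the default values (initial scale, factor, window) are chosen for illustration.

```python
import math


class ToyLossScaler:
    """Toy dynamic loss scaler (illustrative only, not Apex's code)."""

    def __init__(self, init_scale=2.0 ** 16, factor=2.0, window=2000):
        self.scale = init_scale
        self.factor = factor
        self.window = window      # clean steps required before growing the scale
        self.good_steps = 0

    def update(self, grads):
        """Return True if the optimizer step should run for these scaled grads."""
        if any(not math.isfinite(g) for g in grads):
            # Overflow: skip this step and shrink the scale.
            self.scale /= self.factor
            self.good_steps = 0
            return False
        self.good_steps += 1
        if self.good_steps == self.window:
            # A long run without overflow: try a larger scale again.
            self.scale *= self.factor
            self.good_steps = 0
        return True


scaler = ToyLossScaler(window=3)
history = []
for grads in [[0.1], [float("inf")], [0.2], [0.3], [0.1]]:
    stepped = scaler.update(grads)
    history.append((stepped, scaler.scale))
print(history)
```

In this run the second iteration overflows, so its step is skipped and the scale halves. After three clean iterations the scale doubles again. An occasional skipped step is expected. A scale that keeps halving on every iteration is the sign that the loss itself is diverging.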

Hey @jasonkena,

I’m going to train the model with 16-bit precision and will let you know how it performs. I hope to see an improvement in inference time as well.

OK, thx

Thanks! I will try this next week 😃