yolact: Warning: Moving average ignored a value of inf

Hi, I'm trying to train YOLACT to detect cars using images from COCO. I took all of the images that contain cars and built a dataset from them. My config looks like this:

```python
only_cars_coco2017_dataset = dataset_base.copy({
    'name': 'cars COCO 2017',

    # Training images and annotations
    'train_info': '/home/ws/data/COCO/only_cars_train.json',
    'train_images': '/home/ws/data/COCO/train/train2017/',

    # Validation images and annotations.
    'valid_info': '/home/ws/data/COCO/only_cars_val.json',
    'valid_images': '/home/ws/data/COCO/val/val2017/',

    'class_names': ('car'),
    'label_map': {1: 1}
})

yolact_im200_coco_cars_config = yolact_base_config.copy({
    'name': 'yolact_im200_coco_cars',

    # Dataset stuff
    'dataset': only_cars_coco2017_dataset,
    'num_classes': len(only_cars_coco2017_dataset.class_names) + 1,

    'masks_to_train': 20,
    'max_num_detections': 20,
    'max_size': 200,
    'backbone': yolact_base_config.backbone.copy({
        'pred_scales': [[int(x[0] / yolact_base_config.max_size * 200)] for x in yolact_base_config.backbone.pred_scales],
    }),
})
```
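
One detail in the config above may be worth checking, although it may not be what makes the loss blow up: in Python, `('car')` is just the string `'car'`, not a one-element tuple, so `len(...)` counts characters instead of classes. A minimal sketch of the difference, using stand-in variable names rather than the real YOLACT config objects:

```python
# Stand-in values for illustration only; these are not the YOLACT config objects.
class_names_as_written = ('car')   # parentheses alone do not create a tuple
class_names_as_tuple = ('car',)    # the trailing comma does

# Same expression as the 'num_classes' entry in the config.
num_classes_as_written = len(class_names_as_written) + 1
num_classes_as_tuple = len(class_names_as_tuple) + 1

print(type(class_names_as_written).__name__, num_classes_as_written)
print(type(class_names_as_tuple).__name__, num_classes_as_tuple)
```

If this applies here, writing `'class_names': ('car',)` would make `num_classes` count one class plus the extra slot added by the `+ 1`, instead of one slot per character of the name.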

After a few iterations, my loss becomes very high…

Can someone help me with this?

Update: I get the same error when I train with the full COCO dataset…
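
For context on the warning in the title: it appears to come from a running average used for reporting, which received a non-finite value (here `inf`) and skipped it. That would make the warning a symptom of the loss overflowing, not the root problem. The following is a minimal sketch of that kind of guard; `FiniteMovingAverage` is a hypothetical name for illustration and is not YOLACT's actual code:

```python
import math
from collections import deque


class FiniteMovingAverage:
    """Windowed running average that skips inf/NaN values (illustrative only)."""

    def __init__(self, max_window_size=1000):
        self.window = deque(maxlen=max_window_size)
        self.total = 0.0

    def add(self, value):
        # Refuse non-finite values so one bad iteration does not poison the average.
        if not math.isfinite(value):
            print(f"Warning: moving average ignored a value of {value}")
            return False
        # When the window is full, the oldest value is about to be evicted.
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]
        self.window.append(value)
        self.total += value
        return True

    def average(self):
        return self.total / max(len(self.window), 1)


avg = FiniteMovingAverage(max_window_size=3)
accepted = [avg.add(v) for v in (1.0, float("inf"), 3.0)]
```

In this sketch the printed average stays finite even though the underlying loss is not, so it is the loss values themselves that need investigating.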

About this issue

State: open
Created 4 years ago
Comments: 60

Most upvoted comments

@jasonkena, thanks, eval is now working with AMP.

Rm1n90 on May 28, 2020

Sorry @Auth0rM0rgan, I believe you were right. I did not initialize amp within eval.py, which is why the problem only showed up during inference.

@Rm1n90, to fix it I believe you have to add

```python
# Move the network to the GPU first when CUDA is requested.
if args.cuda:
    net = net.cuda()
if cfg.use_amp:
    from apex import amp

    # Apex AMP requires CUDA.
    if not args.cuda:
        raise ValueError("amp must be used with CUDA")
    # Wrap the model for mixed-precision inference (opt level O1).
    net = amp.initialize(net, opt_level="O1")
```

before `net = CustomDataParallel(net).cuda()` (https://github.com/jasonkena/yolact/blob/e1a949445dc0c57eb7c8f10470630faff0ce22e2/eval.py#L913)

I haven’t tested it, can you tell me how it turns out?

jasonkena on May 28, 2020

Can you try cloning my branch into a completely new directory? @sdimantsd and I didn’t get any of your errors running it out of the box.

According to the YOLACT++ paper, the Mask-Rescoring loss improves the performance by 1 mAP.

jasonkena on Mar 29, 2020

Nice catch!

The Gradient Overflow warning is OK, as long as the loss scale doesn’t drop to 0. The warning means that AMP is adjusting the loss scale so that the scaled loss and gradients don’t become infinite.

Yup, it’s perfectly normal, it’s Apex’s AMP’s Dynamic Loss Scaling doing its magic.

jasonkena on Mar 23, 2020
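
The dynamic loss scaling mentioned above can be sketched roughly as follows. This is a simplified illustration of the general technique, not Apex's implementation; the class name and the default values (`init_scale`, `factor`, `window`) are assumptions chosen for the example. The idea is that a step with non-finite gradients is skipped and the scale is reduced, while a run of clean steps lets the scale grow again:

```python
import math


class DynamicLossScaler:
    """Toy dynamic loss scaler (illustrative; not Apex's actual code)."""

    def __init__(self, init_scale=2.0 ** 16, factor=2.0, window=2000):
        self.scale = init_scale   # the loss is multiplied by this before backward
        self.factor = factor      # shrink/grow multiplier
        self.window = window      # clean steps required before growing the scale
        self.clean_steps = 0

    def update(self, grads):
        """Return True if the optimizer step should be applied, False if skipped."""
        overflow = any(not math.isfinite(g) for g in grads)
        if overflow:
            # Gradient overflow: skip this step and reduce the loss scale.
            self.scale /= self.factor
            self.clean_steps = 0
            return False
        self.clean_steps += 1
        if self.clean_steps >= self.window:
            # Training has been stable for a while: try a larger scale again.
            self.scale *= self.factor
            self.clean_steps = 0
        return True


scaler = DynamicLossScaler(init_scale=8.0, factor=2.0, window=2)
step_applied = scaler.update([0.1, float("inf")])  # overflow on this step
scale_after_overflow = scaler.scale
```

Under this scheme an occasional overflow message is expected behaviour. It becomes a concern only if overflows happen on nearly every step and the scale keeps shrinking.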

Hey @jasonkena,

I’m going to train the model with 16-bit precision and will let you know how it performs. I hope to see an improvement in inference time as well.

Auth0rM0rgan on Mar 21, 2020

OK, thx

sdimantsd on Mar 3, 2020

Thanks! I will try this next week 😃

sdimantsd on Feb 27, 2020