mmaction2: Custom Training of SpatioTemporal Model SlowFast giving mAP 0.0

I tried to train the model with our custom data (200+ videos). After training for 50 epochs, the mAP was still 0.0 at every validation. Can you help me with this?

Note: For annotations, I’m using normalized x1,y1 (top-left corner) and x2,y2 (bottom-right corner). Is this the correct format, or do I need to change it?
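For example, one row of my train CSV looks like this (dummy values; AVA column order: video_id, timestamp, x1, y1, x2, y2, action_label, entity_id):

vid001,0902,0.227,0.146,0.568,0.839,3,0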

Below is my custom config file:


custom_classes = [1, 2, 3, 4, 5]
num_classes = 6
model = dict(
    type='FastRCNN',
    backbone=dict(
        type='ResNet3dSlowOnly',
        depth=50,
        pretrained=None,
        pretrained2d=False,
        lateral=False,
        num_stages=4,
        conv1_kernel=(1, 7, 7),
        conv1_stride_t=1,
        pool1_stride_t=1,
        spatial_strides=(1, 2, 2, 1)),
    roi_head=dict(
        type='AVARoIHead',
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor3D',
            roi_layer_type='RoIAlign',
            output_size=8,
            with_temporal_pool=True),
        bbox_head=dict(
            type='BBoxHeadAVA',
            in_channels=2048,
            num_classes=6,
            multilabel=True,
            topk=(2, 3),
            dropout_ratio=0.5)),
    train_cfg=dict(
        rcnn=dict(
            assigner=dict(
                type='MaxIoUAssignerAVA',
                pos_iou_thr=0.9,
                neg_iou_thr=0.9,
                min_pos_iou=0.9),
            sampler=dict(
                type='RandomSampler',
                num=32,
                pos_fraction=1,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=1.0,
            debug=False)),
    test_cfg=dict(rcnn=dict(action_thr=0.002)))
dataset_type = 'AVADataset'
data_root = 'tools/data/SAI/rawframes'
anno_root = 'tools/data/SAI/Annotations'
ann_file_train = 'tools/data/SAI/Annotations/ava_format_train.csv'
ann_file_val = 'tools/data/SAI/Annotations/ava_format_test.csv'
label_file = 'tools/data/SAI/Annotations/action_list.pbtxt'
proposal_file_train = 'tools/data/SAI/Annotations/proposals_train.pkl'
proposal_file_val = 'tools/data/SAI/Annotations/proposals_test.pkl'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
train_pipeline = [
    dict(type='SampleAVAFrames', clip_len=32, frame_interval=16),
    dict(type='RawFrameDecode'),
    dict(type='RandomRescale', scale_range=(256, 320)),
    dict(type='RandomCrop', size=256),
    dict(type='Flip', flip_ratio=0.5),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_bgr=False),
    dict(type='FormatShape', input_format='NCTHW', collapse=True),
    dict(type='Rename', mapping=dict(imgs='img')),
    dict(type='ToTensor', keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']),
    dict(
        type='ToDataContainer',
        fields=[
            dict(key=['proposals', 'gt_bboxes', 'gt_labels'], stack=False)
        ]),
    dict(
        type='Collect',
        keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'],
        meta_keys=['scores', 'entity_ids'])
]
val_pipeline = [
    dict(type='SampleAVAFrames', clip_len=32, frame_interval=16),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_bgr=False),
    dict(type='FormatShape', input_format='NCTHW', collapse=True),
    dict(type='Rename', mapping=dict(imgs='img')),
    dict(type='ToTensor', keys=['img', 'proposals']),
    dict(type='ToDataContainer', fields=[dict(key='proposals', stack=False)]),
    dict(
        type='Collect',
        keys=['img', 'proposals'],
        meta_keys=['scores', 'img_shape'],
        nested=True)
]
data = dict(
    videos_per_gpu=1,
    workers_per_gpu=4,
    val_dataloader=dict(
        videos_per_gpu=1, workers_per_gpu=4, persistent_workers=False),
    train_dataloader=dict(
        videos_per_gpu=1, workers_per_gpu=4, persistent_workers=False),
    test_dataloader=dict(
        videos_per_gpu=1, workers_per_gpu=4, persistent_workers=False),
    train=dict(
        type='AVADataset',
        ann_file='tools/data/SAI/Annotations/ava_format_train.csv',
        pipeline=[
            dict(type='SampleAVAFrames', clip_len=32, frame_interval=16),
            dict(type='RawFrameDecode'),
            dict(type='RandomRescale', scale_range=(256, 320)),
            dict(type='RandomCrop', size=256),
            dict(type='Flip', flip_ratio=0.5),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_bgr=False),
            dict(type='FormatShape', input_format='NCTHW', collapse=True),
            dict(type='Rename', mapping=dict(imgs='img')),
            dict(
                type='ToTensor',
                keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']),
            dict(
                type='ToDataContainer',
                fields=[
                    dict(
                        key=['proposals', 'gt_bboxes', 'gt_labels'],
                        stack=False)
                ]),
            dict(
                type='Collect',
                keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'],
                meta_keys=['scores', 'entity_ids'])
        ],
        label_file='tools/data/SAI/Annotations/action_list.pbtxt',
        proposal_file='tools/data/SAI/Annotations/proposals_train.pkl',
        person_det_score_thr=0.9,
        num_classes=6,
        custom_classes=[1, 2, 3, 4, 5],
        data_prefix='tools/data/SAI/rawframes'),
    val=dict(
        type='AVADataset',
        ann_file='tools/data/SAI/Annotations/ava_format_test.csv',
        pipeline=[
            dict(type='SampleAVAFrames', clip_len=32, frame_interval=16),
            dict(type='RawFrameDecode'),
            dict(type='Resize', scale=(-1, 256)),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_bgr=False),
            dict(type='FormatShape', input_format='NCTHW', collapse=True),
            dict(type='Rename', mapping=dict(imgs='img')),
            dict(type='ToTensor', keys=['img', 'proposals']),
            dict(
                type='ToDataContainer',
                fields=[dict(key='proposals', stack=False)]),
            dict(
                type='Collect',
                keys=['img', 'proposals'],
                meta_keys=['scores', 'img_shape'],
                nested=True)
        ],
        label_file='tools/data/SAI/Annotations/action_list.pbtxt',
        proposal_file='tools/data/SAI/Annotations/proposals_test.pkl',
        person_det_score_thr=0.9,
        num_classes=6,
        custom_classes=[1, 2, 3, 4, 5],
        data_prefix='tools/data/SAI/rawframes'),
    test=dict(
        type='AVADataset',
        ann_file='tools/data/SAI/Annotations/ava_format_test.csv',
        pipeline=[
            dict(type='SampleAVAFrames', clip_len=32, frame_interval=16),
            dict(type='RawFrameDecode'),
            dict(type='Resize', scale=(-1, 256)),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_bgr=False),
            dict(type='FormatShape', input_format='NCTHW', collapse=True),
            dict(type='Rename', mapping=dict(imgs='img')),
            dict(type='ToTensor', keys=['img', 'proposals']),
            dict(
                type='ToDataContainer',
                fields=[dict(key='proposals', stack=False)]),
            dict(
                type='Collect',
                keys=['img', 'proposals'],
                meta_keys=['scores', 'img_shape'],
                nested=True)
        ],
        label_file='tools/data/SAI/Annotations/action_list.pbtxt',
        proposal_file='tools/data/SAI/Annotations/proposals_test.pkl',
        person_det_score_thr=0.9,
        num_classes=6,
        custom_classes=[1, 2, 3, 4, 5],
        data_prefix='tools/data/SAI/rawframes'))
optimizer = dict(type='SGD', lr=0.025, momentum=0.9, weight_decay=1e-05)
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
lr_config = dict(
    policy='step',
    step=[10, 15],
    warmup='linear',
    warmup_by_epoch=True,
    warmup_iters=5,
    warmup_ratio=0.1)
total_epochs = 50
train_ratio = [1, 1]
checkpoint_config = dict(interval=1)
workflow = [('train', 1)]
evaluation = dict(interval=1, save_best='mAP@0.5IOU')
log_config = dict(interval=20, hooks=[dict(type='TextLoggerHook')])
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './SAI/slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb'
load_from = 'https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb/slowfast_r50_4x16x1_256e_kinetics400_rgb_20200704-bcde7ed7.pth'
resume_from = None
find_unused_parameters = False
omnisource = False
module_hooks = []
gpu_ids = range(0, 1)


About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 28 (6 by maintainers)

Most upvoted comments

It seems there are some problems with the proposal file you use for validation.

You can look here: https://github.com/open-mmlab/mmaction2/blob/d5ab34805fe6a02bb98e4af158626a21790b6974/mmaction/datasets/ava_dataset.py#L283

Since your annotation is invalid, the program directly uses [0, 0, 1, 1] as the proposal, which leads to bad predictions.
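Roughly, the lookup works like this (a paraphrased sketch, not the exact source code):

import numpy as np

# Proposals are stored in a dict keyed by 'video_id,timestamp', with the
# timestamp zero-padded to four digits. Toy entry for illustration:
proposals = {'10x10,0004': np.array([[0.12, 0.08, 0.47, 0.93, 0.98]])}

video_id, timestamp = '10x10', 4
img_key = f'{video_id},{timestamp:04d}'  # -> '10x10,0004'

if img_key in proposals:
    boxes = proposals[img_key]
else:
    # Missing key: the whole frame [0, 0, 1, 1] becomes the only proposal,
    # so every clip is scored against one giant box and mAP collapses to ~0.
    boxes = np.array([[0.0, 0.0, 1.0, 1.0]])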

For the correct format of the annotation files, you can download a proposal file following the guide in https://github.com/open-mmlab/mmaction2/tree/master/tools/data/ava and mimic its format.

Besides, it seems that your model does learn something: the scores corresponding to the GT labels are relatively large.

OK, it seems I found a potential problem: you set clip_len to 32 and frame_interval to 16, which covers a long temporal span (~17 s). That doesn’t fit the pretrained weights or the characteristics of this task. Maybe you can set frame_interval to 2.
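To put numbers on it (a quick check, assuming ~30 fps videos as in AVA):

fps = 30
clip_len, frame_interval = 32, 16
print(clip_len * frame_interval / fps)  # 512 frames at 30 fps ~= 17.1 s per sample
print(clip_len * 2 / fps)               # with frame_interval=2: 64 frames ~= 2.1 s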

@kennymckormick good catch, we were generating proposal keys like “10x10,4”, but the code expects the timestamp in the key to be zero-padded, like “10x10,0004”. Fixing that resolved the issue.
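In other words (a minimal illustration, not our actual generation script):

video_id, timestamp = '10x10', 4
wrong_key = f'{video_id},{timestamp}'      # '10x10,4'    -> never matches, falls back to [0, 0, 1, 1]
right_key = f'{video_id},{timestamp:04d}'  # '10x10,0004' -> matches the dataset lookup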

Thanks a lot for helping.

@memona008 I trained the model on 3 classes.

You can keep the exclude file blank; it’s not mandatory.
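For anyone hitting the same problem, here is a quick sanity check (paths are the ones from the config above; assumes the proposal pickle is a dict keyed by 'video_id,timestamp'):

import csv
import pickle

with open('tools/data/SAI/Annotations/proposals_test.pkl', 'rb') as f:
    proposals = pickle.load(f)

missing = set()
with open('tools/data/SAI/Annotations/ava_format_test.csv') as f:
    for row in csv.reader(f):
        key = f'{row[0]},{int(row[1]):04d}'  # zero-padded timestamp
        if key not in proposals:
            missing.add(key)

print(f'{len(missing)} annotation keys missing from the proposal file')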