mmaction2: Custom Training of SpatioTemporal Model SlowFast Gives mAP 0.0
I tried to train the model on our custom data (200+ videos). After training for 50 epochs, mAP was still 0.0 at every validation epoch. Can you help me with this?
Note: For annotations, I'm using normalized x1, y1 (top-left corner) and x2, y2 (bottom-right corner). Is this the correct format, or do I need to change it?
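For reference, here is a minimal sketch of converting a pixel-space box into one AVA-style CSV row. It assumes the AVA v2.x column order (video_id, timestamp in seconds, x1, y1, x2, y2, action_id, person_id) with corners normalized to [0, 1]. The helper name `to_ava_row`, the video id, and the frame size are hypothetical.

```python
import csv
import io


def to_ava_row(video_id, timestamp, box_px, frame_w, frame_h, action_id, person_id):
    """Build one AVA-style CSV row from a pixel-space box (hypothetical helper).

    box_px is (x1, y1, x2, y2) in pixels: top-left and bottom-right corners.
    Each coordinate is divided by the frame width/height so it falls in [0, 1].
    """
    x1, y1, x2, y2 = box_px
    norm = [x1 / frame_w, y1 / frame_h, x2 / frame_w, y2 / frame_h]
    if not all(0.0 <= v <= 1.0 for v in norm):
        raise ValueError(f"box {box_px} lies outside a {frame_w}x{frame_h} frame")
    if norm[0] >= norm[2] or norm[1] >= norm[3]:
        raise ValueError("expected (x1, y1) top-left and (x2, y2) bottom-right")
    # Three decimals, as in the released AVA CSV files.
    return [video_id, timestamp] + [f"{v:.3f}" for v in norm] + [action_id, person_id]


buf = io.StringIO()
csv.writer(buf).writerow(to_ava_row("video_001", 902, (320, 90, 640, 630), 1280, 720, 3, 0))
print(buf.getvalue().strip())  # video_001,902,0.250,0.125,0.500,0.875,3,0
```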
Below is my custom config file:
custom_classes = [1, 2, 3, 4, 5]
num_classes = 6
model = dict(
    type='FastRCNN',
    backbone=dict(
        type='ResNet3dSlowOnly',
        depth=50,
        pretrained=None,
        pretrained2d=False,
        lateral=False,
        num_stages=4,
        conv1_kernel=(1, 7, 7),
        conv1_stride_t=1,
        pool1_stride_t=1,
        spatial_strides=(1, 2, 2, 1)),
    roi_head=dict(
        type='AVARoIHead',
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor3D',
            roi_layer_type='RoIAlign',
            output_size=8,
            with_temporal_pool=True),
        bbox_head=dict(
            type='BBoxHeadAVA',
            in_channels=2048,
            num_classes=6,
            multilabel=True,
            topk=(2, 3),
            dropout_ratio=0.5)),
    train_cfg=dict(
        rcnn=dict(
            assigner=dict(
                type='MaxIoUAssignerAVA',
                pos_iou_thr=0.9,
                neg_iou_thr=0.9,
                min_pos_iou=0.9),
            sampler=dict(
                type='RandomSampler',
                num=32,
                pos_fraction=1,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=1.0,
            debug=False)),
    test_cfg=dict(rcnn=dict(action_thr=0.002)))
dataset_type = 'AVADataset'
data_root = 'tools/data/SAI/rawframes'
anno_root = 'tools/data/SAI/Annotations'
ann_file_train = 'tools/data/SAI/Annotations/ava_format_train.csv'
ann_file_val = 'tools/data/SAI/Annotations/ava_format_test.csv'
label_file = 'tools/data/SAI/Annotations/action_list.pbtxt'
proposal_file_train = 'tools/data/SAI/Annotations/proposals_train.pkl'
proposal_file_val = 'tools/data/SAI/Annotations/proposals_test.pkl'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
train_pipeline = [
    dict(type='SampleAVAFrames', clip_len=32, frame_interval=16),
    dict(type='RawFrameDecode'),
    dict(type='RandomRescale', scale_range=(256, 320)),
    dict(type='RandomCrop', size=256),
    dict(type='Flip', flip_ratio=0.5),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_bgr=False),
    dict(type='FormatShape', input_format='NCTHW', collapse=True),
    dict(type='Rename', mapping=dict(imgs='img')),
    dict(type='ToTensor', keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']),
    dict(
        type='ToDataContainer',
        fields=[
            dict(key=['proposals', 'gt_bboxes', 'gt_labels'], stack=False)
        ]),
    dict(
        type='Collect',
        keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'],
        meta_keys=['scores', 'entity_ids'])
]
val_pipeline = [
    dict(type='SampleAVAFrames', clip_len=32, frame_interval=16),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_bgr=False),
    dict(type='FormatShape', input_format='NCTHW', collapse=True),
    dict(type='Rename', mapping=dict(imgs='img')),
    dict(type='ToTensor', keys=['img', 'proposals']),
    dict(type='ToDataContainer', fields=[dict(key='proposals', stack=False)]),
    dict(
        type='Collect',
        keys=['img', 'proposals'],
        meta_keys=['scores', 'img_shape'],
        nested=True)
]
data = dict(
    videos_per_gpu=1,
    workers_per_gpu=4,
    val_dataloader=dict(
        videos_per_gpu=1, workers_per_gpu=4, persistent_workers=False),
    train_dataloader=dict(
        videos_per_gpu=1, workers_per_gpu=4, persistent_workers=False),
    test_dataloader=dict(
        videos_per_gpu=1, workers_per_gpu=4, persistent_workers=False),
    train=dict(
        type='AVADataset',
        ann_file='tools/data/SAI/Annotations/ava_format_train.csv',
        pipeline=[
            dict(type='SampleAVAFrames', clip_len=32, frame_interval=16),
            dict(type='RawFrameDecode'),
            dict(type='RandomRescale', scale_range=(256, 320)),
            dict(type='RandomCrop', size=256),
            dict(type='Flip', flip_ratio=0.5),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_bgr=False),
            dict(type='FormatShape', input_format='NCTHW', collapse=True),
            dict(type='Rename', mapping=dict(imgs='img')),
            dict(
                type='ToTensor',
                keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']),
            dict(
                type='ToDataContainer',
                fields=[
                    dict(
                        key=['proposals', 'gt_bboxes', 'gt_labels'],
                        stack=False)
                ]),
            dict(
                type='Collect',
                keys=['img', 'proposals', 'gt_bboxes', 'gt_labels'],
                meta_keys=['scores', 'entity_ids'])
        ],
        label_file='tools/data/SAI/Annotations/action_list.pbtxt',
        proposal_file='tools/data/SAI/Annotations/proposals_train.pkl',
        person_det_score_thr=0.9,
        num_classes=6,
        custom_classes=[1, 2, 3, 4, 5],
        data_prefix='tools/data/SAI/rawframes'),
    val=dict(
        type='AVADataset',
        ann_file='tools/data/SAI/Annotations/ava_format_test.csv',
        pipeline=[
            dict(type='SampleAVAFrames', clip_len=32, frame_interval=16),
            dict(type='RawFrameDecode'),
            dict(type='Resize', scale=(-1, 256)),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_bgr=False),
            dict(type='FormatShape', input_format='NCTHW', collapse=True),
            dict(type='Rename', mapping=dict(imgs='img')),
            dict(type='ToTensor', keys=['img', 'proposals']),
            dict(
                type='ToDataContainer',
                fields=[dict(key='proposals', stack=False)]),
            dict(
                type='Collect',
                keys=['img', 'proposals'],
                meta_keys=['scores', 'img_shape'],
                nested=True)
        ],
        label_file='tools/data/SAI/Annotations/action_list.pbtxt',
        proposal_file='tools/data/SAI/Annotations/proposals_test.pkl',
        person_det_score_thr=0.9,
        num_classes=6,
        custom_classes=[1, 2, 3, 4, 5],
        data_prefix='tools/data/SAI/rawframes'),
    test=dict(
        type='AVADataset',
        ann_file='tools/data/SAI/Annotations/ava_format_test.csv',
        pipeline=[
            dict(type='SampleAVAFrames', clip_len=32, frame_interval=16),
            dict(type='RawFrameDecode'),
            dict(type='Resize', scale=(-1, 256)),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_bgr=False),
            dict(type='FormatShape', input_format='NCTHW', collapse=True),
            dict(type='Rename', mapping=dict(imgs='img')),
            dict(type='ToTensor', keys=['img', 'proposals']),
            dict(
                type='ToDataContainer',
                fields=[dict(key='proposals', stack=False)]),
            dict(
                type='Collect',
                keys=['img', 'proposals'],
                meta_keys=['scores', 'img_shape'],
                nested=True)
        ],
        label_file='tools/data/SAI/Annotations/action_list.pbtxt',
        proposal_file='tools/data/SAI/Annotations/proposals_test.pkl',
        person_det_score_thr=0.9,
        num_classes=6,
        custom_classes=[1, 2, 3, 4, 5],
        data_prefix='tools/data/SAI/rawframes'))
optimizer = dict(type='SGD', lr=0.025, momentum=0.9, weight_decay=1e-05)
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
lr_config = dict(
    policy='step',
    step=[10, 15],
    warmup='linear',
    warmup_by_epoch=True,
    warmup_iters=5,
    warmup_ratio=0.1)
total_epochs = 50
train_ratio = [1, 1]
checkpoint_config = dict(interval=1)
workflow = [('train', 1)]
evaluation = dict(interval=1, save_best='mAP@0.5IOU')
log_config = dict(interval=20, hooks=[dict(type='TextLoggerHook')])
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './SAI/slowfast_context_kinetics_pretrained_r50_4x16x1_20e_ava_rgb'
load_from = 'https://download.openmmlab.com/mmaction/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb/slowfast_r50_4x16x1_256e_kinetics400_rgb_20200704-bcde7ed7.pth'
resume_from = None
find_unused_parameters = False
omnisource = False
module_hooks = []
gpu_ids = range(0, 1)
About this issue
- State: closed
- Created 3 years ago
- Comments: 28 (6 by maintainers)
There seem to be some problems with the proposal file you use for validation.
See the relevant code here: https://github.com/open-mmlab/mmaction2/blob/d5ab34805fe6a02bb98e4af158626a21790b6974/mmaction/datasets/ava_dataset.py#L283
Since the dataset cannot find valid proposals for your keyframes, the program directly uses [0, 0, 1, 1] (a box covering the whole frame) as the proposal, which leads to bad predictions.
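To make the failure mode concrete, here is a simplified, plain-Python illustration of the lookup described above. It is not mmaction2's actual code: the function name `lookup_proposals`, the list-based box rows `[x1, y1, x2, y2, score]`, and the example keys are assumptions for illustration, and the threshold default mirrors `person_det_score_thr=0.9` from the config.

```python
def lookup_proposals(proposals, img_key, person_det_score_thr=0.9):
    """Return (boxes, scores) for one keyframe (simplified illustration only).

    proposals maps "video_id,timestamp" keys to non-empty lists of
    [x1, y1, x2, y2, score] rows with normalized coordinates.
    """
    if img_key not in proposals:
        # Missing key: silently fall back to a single whole-frame box.
        return [[0.0, 0.0, 1.0, 1.0]], [1.0]
    dets = proposals[img_key]
    # Cap the threshold at the best score so at least one box survives.
    thr = min(person_det_score_thr, max(d[4] for d in dets))
    kept = [d for d in dets if d[4] >= thr]
    return [d[:4] for d in kept], [d[4] for d in kept]


proposals = {"clip01,0004": [[0.10, 0.20, 0.40, 0.90, 0.97],
                             [0.55, 0.15, 0.80, 0.95, 0.42]]}
print(lookup_proposals(proposals, "clip01,4"))     # ([[0.0, 0.0, 1.0, 1.0]], [1.0])
print(lookup_proposals(proposals, "clip01,0004"))  # ([[0.1, 0.2, 0.4, 0.9]], [0.97])
```

Because the fallback raises no error, a key mismatch shows up only as a near-zero mAP.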
For the correct format of the annotation files, you can download a proposal file by following the guide at https://github.com/open-mmlab/mmaction2/tree/master/tools/data/ava, and mimic its format.
That said, your model does seem to learn something: the scores corresponding to the GT labels are relatively large.
OK, I think I found a potential problem: you set clip_len to 32 and frame_interval to 16, which covers a long temporal span (32 × 16 = 512 frames, ~17 s). That fits neither the pretrained weights nor the characteristics of this task. Maybe you can set frame_interval to 2.
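One way to catch this kind of problem before training is to check that every annotated keyframe has an entry in the proposal file. The sketch below assumes the dataset builds keys as `f"{video_id},{timestamp:04d}"` from the first two CSV columns; the helper names and the inline sample data are hypothetical. In practice you would load the dict with `pickle.load` from the proposal file and read the CSV from disk.

```python
import csv
import io


def annotation_keys(csv_text):
    """Collect "video_id,timestamp" keys from AVA-style CSV rows.

    Assumes the zero-padded key format f"{video_id},{timestamp:04d}".
    """
    keys = set()
    for row in csv.reader(io.StringIO(csv_text)):
        if row:
            keys.add(f"{row[0]},{int(row[1]):04d}")
    return keys


def missing_proposal_keys(csv_text, proposals):
    """Keys that appear in the annotations but have no entry in the proposals."""
    return sorted(annotation_keys(csv_text) - set(proposals))


ann = ("clip01,4,0.100,0.200,0.400,0.900,2,0\n"
       "clip01,5,0.110,0.210,0.410,0.910,2,0\n")
proposals = {"clip01,4": [[0.1, 0.2, 0.4, 0.9, 0.97]],        # not zero-padded
             "clip01,0005": [[0.11, 0.21, 0.41, 0.91, 0.95]]}
print(missing_proposal_keys(ann, proposals))  # ['clip01,0004']
```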
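The span estimate is simple arithmetic: roughly clip_len × frame_interval frames divided by the frame rate. The sketch below assumes 30 fps rawframes (the usual rate for AVA-style extraction; custom videos may differ) and compares the current setting with the suggested frame_interval of 2.

```python
def temporal_span_seconds(clip_len, frame_interval, fps=30):
    """Approximate time covered by one sampled clip (assumes `fps` rawframes)."""
    return clip_len * frame_interval / fps


for interval in (16, 2):
    print(interval, round(temporal_span_seconds(32, interval), 2))
# 16 17.07
# 2 2.13
```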
@kennymckormick Good catch. We were generating proposal keys like “10x10,4”, but the code expects keys with a zero-padded timestamp like “10x10,0004”. Fixing this solved the issue.
Thanks a lot for helping.
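A minimal sketch of writing a proposal pickle with zero-padded keys follows. It assumes the format of the downloadable AVA proposal files: a dict mapping "video_id,timestamp" to a NumPy array of shape (N, 5) holding normalized [x1, y1, x2, y2, score] rows. The helper `build_proposals` and its input dict are hypothetical.

```python
import os
import pickle
import tempfile

import numpy as np


def build_proposals(detections):
    """Convert {(video_id, timestamp_seconds): [[x1, y1, x2, y2, score], ...]}
    into the zero-padded key format (hypothetical helper)."""
    proposals = {}
    for (video_id, timestamp), boxes in detections.items():
        # Zero-pad the timestamp: "10x10,0004", not "10x10,4".
        key = f"{video_id},{int(timestamp):04d}"
        proposals[key] = np.asarray(boxes, dtype=np.float32).reshape(-1, 5)
    return proposals


detections = {("10x10", 4): [[0.10, 0.20, 0.40, 0.90, 0.97]]}
proposals = build_proposals(detections)

# Round-trip through a temporary file to confirm what the dataset will read.
path = os.path.join(tempfile.mkdtemp(), "proposals_train.pkl")
with open(path, "wb") as f:
    pickle.dump(proposals, f)
with open(path, "rb") as f:
    reloaded = pickle.load(f)
print(sorted(reloaded))              # ['10x10,0004']
print(reloaded["10x10,0004"].shape)  # (1, 5)
```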
@memona008 I trained the model on 3 classes.
You can leave the exclude file blank; it's not mandatory.