SlowFast: Cannot reproduce the result on AVA

Hi, thank you for your great code base.

Now I’m trying to reproduce the result on AVA claimed in paper and tech report. However, I cannot reproduce the result. I chose configs/AVA/c2/SLOWFAST_32x2_R101_50_50.yaml configuration file for training. The modifications I did to this config file are: (Here, the commented are original settings.)

TRAIN:
  ENABLE: True #False
  DATASET: ava
  BATCH_SIZE: 80 #16
  EVAL_PERIOD: 2 # 1
  CHECKPOINT_PERIOD: 1
  AUTO_RESUME: True
  CHECKPOINT_FILE_PATH: path to pretrain model downloaded from the third entry in this table(https://github.com/facebookresearch/SlowFast/blob/master/MODEL_ZOO.md#ava)
  CHECKPOINT_TYPE: caffe2 # pytorch
DETECTION:
  ENABLE: True
  ALIGNED: True #False
#SOLVER:
#  MOMENTUM: 0.9
#  WEIGHT_DECAY: 1e-7
#  OPTIMIZING_METHOD: sgd
SOLVER:
  BASE_LR: 0.1
  LR_POLICY: steps_with_relative_lrs
  STEPS: [0, 20, 30]
  LRS: [1, 0.1, 0.01, 0.001]
  MAX_EPOCH: 40
  MOMENTUM: 0.9
  WEIGHT_DECAY: 1e-7
  WARMUP_EPOCHS: 5
  WARMUP_START_LR: 0.000125
  OPTIMIZING_METHOD: sgd

As you can see, I modified the learning policy from cosine to steps_with_relative_lrs, which is stated in paper, with reference to the settings in https://github.com/facebookresearch/SlowFast/blob/a8a47ced376a681e76d8b904e7be76d67fe999b3/configs/AVA/SLOWFAST_32x2_R50_SHORT.yaml#L52-L62 However, the result is not good.

Below is the picture of training loss and mAP on val set, 批注 2020-03-10 134205

From the picture, we can find that the mAP on Val set saturates around 26 .0 quickly and cannot reach the value (29.0) stated in paper.

How do you think about this, is this because of the STEPS parameter I set that cause overfitting?

Thank you!

About this issue

Original URL
State: open
Created 4 years ago
Comments: 22 (9 by maintainers)

Most upvoted comments

Hi, @takatosp1 @feichtenhofer.

I notice that in the paper of SlowFast, SlowFast-R101, 8x8, K600 achieves 29.0 on AVA-v2.2, and in the paper of X3D, the performance is reported as 27.4 for SlowFast-R101, 8x8, K600. What is the difference between their training and inference settings?

BoPang1996 on Nov 9, 2020

@chaoyuaw Training R101 model is time-consuming, I will report the performance of SLOWFAST_32x2_R101_50_50 once I finish the training. I have verified the configuration configs/AVA/SLOWFAST_32x2_R50_SHORT.yaml and reported the performance in https://github.com/facebookresearch/SlowFast/issues/112.

If some guys meet the similar problem, the training log and test log are listed here for reference.(I will update the performance for R101 later)

architecture	depth	Pretrain Model	frame length x sample rate	MAP	AVA version	Log	config
C2D	R50	C2D(8x8)-Kinetics 400	4 x 16	20.69	2.2	[train log] [test log]	workspace/ava/c2d.ava.8x8.res50.short
SlowFast	R50	SlowFast(8x8)-Kinetics 400	S:8 x 8;F:32x2	24.9	2.2	[train log][test log]	workspace/ava/slowfast.ava.32x2.res50.short

tonysy on Apr 1, 2020

Hi @takatosp1 , thank you for your reply.

I’m training my model on a machine with 8 Quadro RTX 6000 GPUs (24G memory).

I noticed you’ve change the batch size, as well as the ALIGN, they all affect the performance significantly.

I noticed that in https://github.com/facebookresearch/SlowFast/blob/a8a47ced376a681e76d8b904e7be76d67fe999b3/configs/AVA/SLOWFAST_32x2_R50_SHORT.yaml#L1-L9

that batchsize is set to 64, since my batchsize is 80, now I’m doing another training with

DETECTION:
  ENABLE: True
  ALIGNED: True
SOLVER:
  BASE_LR: 0.125 #0.1
  LR_POLICY: steps_with_relative_lrs
  STEPS: [0, 8, 12]
  LRS: [1, 0.1, 0.01, 0.001]
  MAX_EPOCH: 16
  MOMENTUM: 0.9
  WEIGHT_DECAY: 1e-7
  WARMUP_EPOCHS: 5
  WARMUP_START_LR: 0.000125
  OPTIMIZING_METHOD: sgd

with the

  BASE_LR: 0.125 #0.1
  STEPS: [0, 8, 12]
  LRS: [1, 0.1, 0.01, 0.001]
  MAX_EPOCH: 16

calculated according to https://github.com/facebookresearch/SlowFast/blob/a8a47ced376a681e76d8b904e7be76d67fe999b3/configs/AVA/SLOWFAST_32x2_R50_SHORT.yaml#L52-L62 are these settings correct?

Besides, how is ALIGN going to affect the performance? Are the models claimed in paper trained with ALIGN set to True or False?

btw, note that, for the scripts under /c2/, they are script designed to use for inference.

Thank you for reminding. However, if so, there are no scripts for training with R101 model, right? It it OK if I modify the inference script for training?

Best regards!

taosean on Mar 11, 2020

Hi Thanks for playing with PySlowFast! I noticed you’ve change the batch size, as well as the ALIGN, they all affect the performance significantly. Also I am curious that how could you run a SlowFast R101 model with batch size of 80? Could you share me what are the number of GPUs you use?

btw, note that, for the scripts under /c2/, they are script designed to use for inference.

haooooooqi on Mar 10, 2020

We haven’t release R101 training schedule and will do it later. Please not use Recipe under c2 for training since it is for inference only

haooooooqi on Apr 3, 2020

@chaoyuaw Thanks for your reply, I have resolved the performance issue on my side. (#112)

tonysy on Mar 31, 2020

We will release the R101 models later. (The link says coming soon) 📦

We were discussion the performance on SlowFast50 models.

Thanks

I’m a bit confused. Why do we depend on the “coming soon” models? To reproduce the AVA mAP (e.g., the “29.1” and “29.4” in this table), shouldn’t we just use the pre-trained models provided by the “Pretrain Model” column (red box on the attached image) in the table? Screen Shot 2020-03-31 at 10 47 40 AM

chaoyuaw on Mar 31, 2020

@takatosp1 Thanks. Also, can you explain the meaning of * in here? What’s the difference of used training region proposals?

leviswind on Mar 31, 2020