Co-DETR: The problems that arise with distributed training using the DINO model do not occur with Deformable DETR

RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the forward function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes. 2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple checkpoint functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases yet.
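Reason 2 in the error text points at reentrant gradient checkpointing: when the same module is wrapped by `torch.utils.checkpoint` more than once per forward pass, the reentrant backward marks its parameters "ready" multiple times, which DDP rejects. A commonly suggested workaround is to pass `use_reentrant=False`, which uses the non-reentrant implementation. This is a minimal sketch of that pattern with a toy shared block, not Co-DETR's actual code:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# Toy stand-in for a block whose parameters are reused (hypothetical,
# for illustration only; Co-DETR's heads are far larger).
shared = nn.Linear(8, 8)

def forward(x):
    # The same parameters pass through two checkpointed calls.
    # With the default reentrant checkpointing this pattern can trigger
    # "Expected to mark a variable ready only once" under DDP;
    # use_reentrant=False avoids the reentrant backward.
    h = checkpoint(shared, x, use_reentrant=False)
    h = checkpoint(shared, h, use_reentrant=False)
    return h.sum()

loss = forward(torch.randn(2, 8))
loss.backward()
print(shared.weight.grad is not None)
```

Under single-process autograd this runs either way; the distinction only matters once the model is wrapped in `DistributedDataParallel`.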

About this issue

  • Original URL
  • State: open
  • Created 9 months ago
  • Comments: 20

Most upvoted comments

@zimenglan-sysu-512 Hi, we may release the ViT model and config in the future. The model settings are presented in the appendix of our paper. We used 56 A100 80G GPUs with img_per_gpu set to 4 during pretraining. If you want to train this model on GPUs with less memory, you may need to use FSDP.