pytorch-image-models: CUDA out of memory when loading model

I trained mobilenetv3_large_100 on 8 2080Ti GPUs with a batch size of 128, i.e. 128 * 8 = 1024 images per step. When I resumed from a checkpoint, I got a "CUDA out of memory" error; however, when I trained again from scratch, there was no error. I noticed that the code in "helper.py" already loads the checkpoint on CPU, which should prevent this bug, so why does it still happen? checkpoint = torch.load(checkpoint_path, map_location='cpu')
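A minimal sketch of the loading behavior in question, assuming a plain PyTorch setup (the buffer below is a hypothetical stand-in for a real checkpoint file): `torch.load` without `map_location` restores each tensor onto the device it was saved from (e.g. `cuda:0`), so while the live model already occupies GPU memory, the deserialized checkpoint briefly doubles usage and can OOM on resume. Mapping to CPU first avoids that extra GPU allocation.

```python
import io
import torch

# Stand-in for a checkpoint written during training (hypothetical, in-memory).
buf = io.BytesIO()
torch.save({"weight": torch.randn(4, 4)}, buf)
buf.seek(0)

# Safe resume pattern: deserialize on CPU, then copy into the model with
# load_state_dict, which transfers data parameter by parameter.
checkpoint = torch.load(buf, map_location="cpu")
print(checkpoint["weight"].device)  # cpu
```

If the checkpoint was saved from GPU tensors, omitting `map_location` would instead try to place `checkpoint["weight"]` back on the original CUDA device during deserialization, which is one common way a resume can OOM even though fresh training fits.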

Another interesting problem: I find that acc@1 is very low in the first few epochs (near random accuracy), and the eval loss even rises. Why?

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 21 (9 by maintainers)

Most upvoted comments

I am also seeing a similar problem: training from scratch works, but resuming results in CUDA out of memory, so I am also dialing the batch size down by multiple(s) of 8.