pytorch-image-models: CUDA out of memory when loading a model
I have trained mobilenetv3_large_100 on 8 2080Ti GPUs with a batch size of 128 per GPU, i.e. 128 * 8 = 1024 images per step. When I resumed from a checkpoint, I got a "CUDA out of memory" error. However, when I trained again from scratch, there was no error.
I noticed that the code in "helper.py" loads the checkpoint on the CPU, which should prevent this bug, so why did it still happen?
checkpoint = torch.load(checkpoint_path, map_location='cpu')
Another interesting problem: in the first few epochs after resuming, acc@1 is very low (close to random), and the eval loss even rises. Why?
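For context, a minimal sketch of why `map_location='cpu'` matters on resume (the checkpoint path and dict keys here are illustrative, not necessarily the repo's exact format): without `map_location`, `torch.load` restores each tensor onto the GPU device recorded at save time, and that transient allocation lands on top of the already-resident model.

```python
import torch
import torch.nn as nn

# Illustrative model and checkpoint; real training code saves more state.
model = nn.Linear(10, 10)
torch.save({'state_dict': model.state_dict()}, 'checkpoint.pth.tar')

# Without map_location, tensors are deserialized onto the device they
# were saved from. If that was a GPU, the load can OOM on resume:
# checkpoint = torch.load('checkpoint.pth.tar')

# Loading onto the CPU first avoids the extra GPU allocation;
# load_state_dict then copies values into the model's existing tensors.
checkpoint = torch.load('checkpoint.pth.tar', map_location='cpu')
model.load_state_dict(checkpoint['state_dict'])
```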
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 1
- Comments: 21 (9 by maintainers)
Commits related to this issue
- Add map_location='cpu' to ModelEma resume, should improve #72 — committed to huggingface/pytorch-image-models by rwightman 4 years ago
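The commit above adds the same CPU-mapped load to the EMA resume path. A hypothetical sketch of that idea (not the repo's exact `ModelEma` implementation; the `state_dict_ema` key is an assumption):

```python
from copy import deepcopy

import torch
import torch.nn as nn

class ModelEma:
    """Minimal stand-in for an EMA wrapper to show the resume fix."""

    def __init__(self, model):
        self.ema = deepcopy(model)
        self.ema.eval()

    def resume(self, checkpoint_path):
        # map_location='cpu' keeps deserialization off the GPU, so
        # resuming the EMA weights does not duplicate GPU memory.
        checkpoint = torch.load(checkpoint_path, map_location='cpu')
        self.ema.load_state_dict(checkpoint['state_dict_ema'])
```

Both the main model and the EMA copy live on the GPU during training, so loading either one's checkpoint directly onto the GPU roughly doubles the transient memory needed at resume time.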
I am also seeing a similar problem: training from scratch works, but resuming results in CUDA out of memory, so I am also dialing down the batch size by multiples of 8.