pytorch-image-models: CUDA out of memory when loading a model
I have trained mobilenetv3_large_100 on 8 2080Ti GPUs with a batch size of 128 per GPU, i.e. 128 * 8 = 1024 images per step. When I resumed from a checkpoint, I got a "CUDA out of memory" error. However, when I trained again from scratch, there was no error.
I noticed that the code in "helper.py" loads the checkpoint on the CPU, which should prevent this bug, so why did it still happen?
checkpoint = torch.load(checkpoint_path, map_location='cpu')
Another interesting problem: in the first few epochs after resuming, acc@1 is very low (close to random), and the eval loss even rises. Why?
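For context, a minimal sketch of why `map_location='cpu'` matters on resume (the checkpoint path and dict keys here are illustrative, not necessarily the repo's exact format): without `map_location`, `torch.load` restores each tensor onto the GPU device recorded at save time, and that transient allocation lands on top of the already-resident model.

```python
import torch
import torch.nn as nn

# Illustrative model and checkpoint; real training code saves more state.
model = nn.Linear(10, 10)
torch.save({'state_dict': model.state_dict()}, 'checkpoint.pth.tar')

# Without map_location, tensors are deserialized onto the device they
# were saved from. If that was a GPU, the load can OOM on resume:
# checkpoint = torch.load('checkpoint.pth.tar')

# Loading onto the CPU first avoids the extra GPU allocation;
# load_state_dict then copies values into the model's existing tensors.
checkpoint = torch.load('checkpoint.pth.tar', map_location='cpu')
model.load_state_dict(checkpoint['state_dict'])
```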
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 1
- Comments: 21 (9 by maintainers)
Commits related to this issue
- Add map_location='cpu' to ModelEma resume, should improve #72 — committed to huggingface/pytorch-image-models by rwightman 4 years ago
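The commit above adds the same CPU-mapped load to the EMA resume path. A hypothetical sketch of that idea (not the repo's exact `ModelEma` implementation; the `state_dict_ema` key is an assumption):

```python
from copy import deepcopy

import torch
import torch.nn as nn

class ModelEma:
    """Minimal stand-in for an EMA wrapper to show the resume fix."""

    def __init__(self, model):
        self.ema = deepcopy(model)
        self.ema.eval()

    def resume(self, checkpoint_path):
        # map_location='cpu' keeps deserialization off the GPU, so
        # resuming the EMA weights does not duplicate GPU memory.
        checkpoint = torch.load(checkpoint_path, map_location='cpu')
        self.ema.load_state_dict(checkpoint['state_dict_ema'])
```

Both the main model and the EMA copy live on the GPU during training, so loading either one's checkpoint directly onto the GPU roughly doubles the transient memory needed at resume time.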
I am also seeing a similar problem: training from scratch works, but resuming results in CUDA out of memory, so I am also dialing down the batch size by multiples of 8.