demucs: CUDA out of memory while processing long tracks
Hello,
First of all, thank you for this great product. It works flawlessly on the CPU.
I’m trying to process material faster by using a GPU on an AWS EC2 instance. Unfortunately, it terminates with the following error:
$ demucs audio.mp3 -d cuda
Selected model is a bag of 4 models. You will see that many progress bars per track.
Separated tracks will be stored in /home/ec2-user/separated/mdx_extra_q
Separating track audio.mp3
100%|██████████████████████████████████████████████████████████████████████| 4356.0/4356.0 [01:35<00:00, 45.42seconds/s]
Traceback (most recent call last):
  File "/home/ec2-user/.local/bin/demucs", line 8, in <module>
    sys.exit(main())
  File "/home/ec2-user/.local/lib/python3.7/site-packages/demucs/separate.py", line 120, in main
    overlap=args.overlap, progress=True)[0]
  File "/home/ec2-user/.local/lib/python3.7/site-packages/demucs/apply.py", line 147, in apply_model
    estimates += out
RuntimeError: CUDA out of memory. Tried to allocate 5.71 GiB (GPU 0; 14.76 GiB total capacity; 9.12 GiB already allocated; 4.33 GiB free; 9.15 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
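As a side note, the error message itself suggests tuning max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF. If I read the PyTorch memory-management docs correctly, that is passed as an environment variable, e.g.:

$ PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512 demucs audio.mp3 -d cuda

However, since a single 5.71 GiB allocation fails with only 4.33 GiB free, I doubt fragmentation is the real problem here.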
Info about the running environment:
- Python 3.7.10 and PyTorch 1.10.0
- AWS EC2 instance type: g4dn.xlarge
- Operating system and version:
Deep Learning AMI GPU PyTorch 1.10.0 (Amazon Linux 2) 20211115
- GPU Hardware:
NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
Driver Version: 470.57.02 CUDA Version: 11.4
15109MiB memory
Do the models require a card with more than 16 GB of VRAM? If that's not your experience, could you share your hardware/software environment so that I can retry? Thank you.
This is AWESOME help @famzah, thanks for writing that PR! I would've helped, but I don't know Python super well, haha. I was reading your benchmarks, and those tradeoffs between memory and processing time are definitely ones I'd sign up for.
@adefossez, I've submitted PR #244, which uses much less GPU VRAM at the expense of more regular RAM. Please review it, and if there's anything to discuss, let's continue the conversation in the PR.
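In short: the traceback above dies on the GPU-side accumulation "estimates += out" in apply_model, so the change keeps that running sum in regular RAM instead. Roughly (a simplified sketch of the idea, not the actual diff; apply_single_model is a hypothetical stand-in for the real per-model inference call, and the real code also applies per-model weights):

    estimates = None
    for model in models:                      # the "bag of 4 models"
        out = apply_single_model(model, mix)  # inference itself still runs on the GPU
        out = out.cpu()                       # copy the estimate into system RAM...
        estimates = out if estimates is None else estimates + out  # ...and accumulate there
    estimates /= len(models)                  # average the bag, still on the CPU

The extra cost is the device-to-host copies and the system RAM holding the accumulated tensors, which is exactly the memory-vs-time tradeoff shown in the benchmarks.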