LongLoRA: supervised fine-tuning 7B GPU requirement - CUDA out of memory

Dear authors,

Thanks for sharing this amazing implementation for long-text inference. I am trying out supervised fine-tuning, and I hit a CUDA out-of-memory error running Llama-2-7b-chat-hf on a machine with 8 × A100 GPUs, 40GB each. I am already using per_device_train_batch_size=1, low_rank_training=True, and use_flash_attn=True. May I know what the GPU memory requirement is for 7B supervised fine-tuning? The regular fine-tuning runs without problems, which I suspect is because its max_seq_length is much smaller than in supervised fine-tuning. I cannot reduce per_device_train_batch_size any further, as it is already 1. The only thing I can think of would be quantization. Any suggestions?

```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.35 GiB (GPU 6; 39.59 GiB total capacity; 31.56 GiB already allocated; 3.09 GiB free; 33.93 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1519 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1520 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1521 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1522 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1523 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1524 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1526 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 6 (pid: 1525) of binary: /root/cheng/LongLoRA/.venv/bin/python
Traceback (most recent call last):
```
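For reference, the allocator hint mentioned in the error message can be tried before resorting to quantization. This is a minimal sketch, assuming the environment variable is set before PyTorch first touches CUDA; note that max_split_size_mb only mitigates fragmentation and does not reduce the total memory the run actually needs, so it tends to help only in borderline cases.

```python
# Minimal sketch: apply the allocator hint from the OOM message.
# PYTORCH_CUDA_ALLOC_CONF is read when the caching allocator initializes,
# so set it (here, or exported in the shell before torchrun) before any
# CUDA allocation happens. The 128 MiB split cap is illustrative.
import os

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after the env var so the allocator picks it up
```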

Thanks a lot, Cheng

About this issue

  • State: closed
  • Created 8 months ago
  • Comments: 22 (14 by maintainers)

Most upvoted comments

Hello @yukang2017, I made the pull request here: https://github.com/dvlab-research/LongLoRA/pull/101. Please review it to see whether it is the correct thing to do. Thanks.

@yukang2017 Hello Yukang, I will attach the relevant information when I open the pull request, and you can review whether it is right. By the way, supervised-fine-tune-qlora.py seems to be training fine so far. It does take a lot of time; it has not reached the halfway point yet on a machine with 8 × A100 GPUs, 40GB each. Thanks.
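For readers following along: the QLoRA route presumably works here because the base model is loaded 4-bit quantized while only the LoRA adapters are trained, which brings 7B within a 40GB-per-GPU budget. Below is a minimal sketch of that loading pattern with Hugging Face transformers/peft/bitsandbytes; the exact ranks, target modules, and flags used by supervised-fine-tune-qlora.py may differ.

```python
# Sketch of QLoRA-style model loading (illustrative hyperparameters, not
# necessarily those used by supervised-fine-tune-qlora.py).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # base weights stored 4-bit
    bnb_4bit_quant_type="nf4",              # NF4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # A100 supports bf16 compute
    bnb_4bit_use_double_quant=True,         # quantize the quantization constants
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    # device placement is left to the launcher (e.g. one rank per GPU)
)

# Enable gradient flow through the frozen quantized base model.
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8,                                    # illustrative rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapters should be trainable
```

Only the small adapter matrices carry optimizer state, which is why this fits where full-precision LoRA over a bf16 base model did not.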
