accelerate: Conflict between notebook_launcher function and bitsandbytes package.

While fine-tuning baichuan13b, I ran into a problem: if I import the bitsandbytes package, training with accelerate fails. Before it spawns worker processes, the notebook_launcher function runs sanity checks, including torch.cuda.is_initialized(). If a check fails (for example, if that call returns True), it raises an error:

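```python
import bitsandbytes as bnb
from torchkeras import KerasModel

# peft_model, dl_train and dl_val are defined in earlier notebook cells
ckpt_path = 'baichuan13b_ner'

# Paged AdamW from bitsandbytes ('paged_adamw')
optimizer = bnb.optim.adamw.AdamW(peft_model.parameters(), lr=6e-05, is_paged=True)

keras_model = KerasModel(peft_model, loss_fn=None, optimizer=optimizer)
keras_model.load_ckpt(ckpt_path)

# Train with multiple GPUs
keras_model.fit_ddp(num_processes=2, train_data=dl_train, val_data=dl_val,
                    epochs=100, patience=10, monitor='val_loss', mode='min',
                    ckpt_path=ckpt_path)
```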

```
ValueError                                Traceback (most recent call last)
Cell In[30], line 12
      9 keras_model.load_ckpt(ckpt_path)
     11 # Train with multiple GPUs
---> 12 keras_model.fit_ddp(num_processes=2,
     13                     train_data=dl_train,
     14                     val_data=dl_val,
     15                     epochs=100,
     16                     patience=10,
     17                     monitor='val_loss',
     18                     mode='min',
     19                     ckpt_path=ckpt_path)

File ~/anaconda3/envs/baichuan13b/lib/python3.9/site-packages/torchkeras/kerasmodel.py:282, in KerasModel.fit_ddp(self, num_processes, train_data, val_data, epochs, ckpt_path, patience, monitor, mode, callbacks, plot, wandb, quiet, mixed_precision, cpu, gradient_accumulation_steps)
    279 from accelerate import notebook_launcher
    280 args = (train_data,val_data,epochs,ckpt_path,patience,monitor,mode,
    281         callbacks,plot,wandb,quiet,mixed_precision,cpu,gradient_accumulation_steps)
--> 282 notebook_launcher(self.fit, args, num_processes=num_processes)

File ~/anaconda3/envs/baichuan13b/lib/python3.9/site-packages/accelerate/launchers.py:116, in notebook_launcher(function, args, num_processes, mixed_precision, use_port)
    113 from torch.multiprocessing.spawn import ProcessRaisedException
    115 if len(AcceleratorState._shared_state) > 0:
--> 116     raise ValueError(
    117         "To launch a multi-GPU training from your notebook, the Accelerator should only be initialized "
    118         "inside your training function. Restart your notebook and make sure no cells initializes an "
    119         "Accelerator."
    120     )
    122 if torch.cuda.is_initialized():
    123     raise ValueError(
    124         "To launch a multi-GPU training from your notebook, you need to avoid running any instruction "
    125         "using torch.cuda in any cell. Restart your notebook and make sure no cells use any CUDA "
    126         "function."
    127     )

ValueError: To launch a multi-GPU training from your notebook, the Accelerator should only be initialized inside your training function. Restart your notebook and make sure no cells initializes an Accelerator.
```

But as soon as bitsandbytes is imported, torch.cuda.is_initialized() returns True, so it may not be possible to train on multiple GPUs from a notebook at all. I also tried moving the code into a .py file, but because the model is quantized, I then get an error saying that an 8-bit model cannot run on multiple GPUs.
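To find out which import is responsible, one option is to import each candidate module in a fresh Python interpreter and then ask PyTorch whether CUDA is already initialized. A fresh interpreter matters because once CUDA has been initialized in the notebook kernel it stays initialized, so testing inside the kernel only reveals the first culprit. The sketch below assumes torch is installed in the same environment; the helper name cuda_initialized_after_import is hypothetical.

```python
import subprocess
import sys
from typing import Optional

# Child script: import the candidate module first, then ask torch whether
# that import already initialized CUDA (importing torch alone does not).
_PROBE = (
    "import importlib, sys\n"
    "importlib.import_module(sys.argv[1])\n"
    "import torch\n"
    "print(torch.cuda.is_initialized())\n"
)


def cuda_initialized_after_import(module: str) -> Optional[bool]:
    """Report whether importing `module` initializes CUDA.

    Runs the probe in a fresh interpreter so imports from the current
    process cannot affect the result. Returns None if the probe itself
    fails, e.g. because the module or torch is not installed.
    """
    result = subprocess.run(
        [sys.executable, "-c", _PROBE, module],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    lines = result.stdout.strip().splitlines()
    if not lines:
        return None
    # The probe's own print is the last stdout line; earlier lines may be
    # output emitted by the imported module.
    return lines[-1] == "True"
```

Calling it in turn with "bitsandbytes", "accelerate", "peft" and "torchkeras" should narrow down which package touches CUDA at import time.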

How should this be resolved?
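notebook_launcher forks its worker processes, and CUDA cannot be used in a forked child once the parent process has initialized it. A common way around this is to keep everything that may touch CUDA (the bitsandbytes import, loading the quantized model, building the optimizer) inside the function handed to notebook_launcher. The sketch below assumes that pattern; build_model and already_imported are hypothetical names. Because torchkeras' fit_ddp passes notebook_launcher a KerasModel whose model was already built in the notebook, this presumably means calling notebook_launcher directly instead of going through fit_ddp. It only addresses the CUDA-initialization check, not whether an 8-bit model can be trained across several processes.

```python
import sys


def training_function(build_model, lr=6e-05):
    """Worker entry point to hand to accelerate's notebook_launcher.

    Anything that may touch CUDA happens here, so it runs only inside the
    forked worker processes and never in the notebook kernel itself.
    """
    import bitsandbytes as bnb  # deferred: importing bnb may initialize CUDA

    model = build_model()  # hypothetical factory that loads the PEFT model
    optimizer = bnb.optim.adamw.AdamW(model.parameters(), lr=lr, is_paged=True)
    # ... wrap model and optimizer (e.g. in torchkeras' KerasModel) and train ...


def already_imported(*names):
    """Return which of the given modules are already imported in this process."""
    return [name for name in names if name in sys.modules]
```

In the notebook, one would first confirm that already_imported("bitsandbytes") is empty and then call notebook_launcher(training_function, args=(build_peft_model,), num_processes=2), where build_peft_model is your own function that loads baichuan13b and applies PEFT.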

About this issue

State: closed
Created a year ago
Comments: 19

Most upvoted comments

This will be solved via https://github.com/huggingface/accelerate/pull/1833. @AisingioroHao0, until it's merged you can install the fix branch with pip install git+https://github.com/huggingface/accelerate@fix-cuda, and afterwards switch back to pip install git+https://github.com/huggingface/accelerate.

muellerzr on Aug 10, 2023

I can't reproduce this when only importing notebook_launcher; it stems from bnb, and a PR is up now. To solve this, you can try installing accelerate via pip install git+https://github.com/huggingface/accelerate@bnb-import

muellerzr on Aug 4, 2023

@muellerzr I tried it: after uninstalling bitsandbytes, importing from accelerate at least no longer initializes CUDA. But it looks like other packages also initialize it, because I now get: "RuntimeError: CUDA has been initialized before the notebook_launcher could create a forked subprocess. This likely stems from an outside import causing issues once the notebook_launcher() is called. Please review your imports and test them when running the 'notebook_launcher()' to identify which one is problematic." Let me check.

aihao2000 on Aug 4, 2023

@muellerzr Hello, I have a similar problem. I found that importing anything from accelerate, even just from accelerate import notebook_launcher, makes torch.cuda.is_initialized() return True. Is this a bug?

aihao2000 on Aug 4, 2023