accelerate: Conflict between notebook_launcher function and bitsandbytes package.
While fine-tuning baichuan13b, I ran into a problem: if the bitsandbytes package is imported, training with accelerate reports an error.
This is because the notebook_launcher function checks torch.cuda.is_initialized(). If it returns True, an error is raised:
import bitsandbytes as bnb
from torchkeras import KerasModel

ckpt_path = 'baichuan13b_ner'
# Paged AdamW from bitsandbytes ('paged_adamw')
optimizer = bnb.optim.adamw.AdamW(peft_model.parameters(), lr=6e-05, is_paged=True)
keras_model = KerasModel(peft_model, loss_fn=None, optimizer=optimizer)
keras_model.load_ckpt(ckpt_path)
keras_model.fit_ddp(num_processes=2, train_data=dl_train, val_data=dl_val, epochs=100, patience=10, monitor='val_loss', mode='min', ckpt_path=ckpt_path)
ValueError                                Traceback (most recent call last)
Cell In[30], line 12
      9 keras_model.load_ckpt(ckpt_path)
     11 # Train with multiple GPUs
---> 12 keras_model.fit_ddp(num_processes=2,
     13                     train_data=dl_train,
     14                     val_data=dl_val,
     15                     epochs=100,
     16                     patience=10,
     17                     monitor='val_loss',
     18                     mode='min',
     19                     ckpt_path=ckpt_path)

File ~/anaconda3/envs/baichuan13b/lib/python3.9/site-packages/torchkeras/kerasmodel.py:282, in KerasModel.fit_ddp(self, num_processes, train_data, val_data, epochs, ckpt_path, patience, monitor, mode, callbacks, plot, wandb, quiet, mixed_precision, cpu, gradient_accumulation_steps)
    279 from accelerate import notebook_launcher
    280 args = (train_data,val_data,epochs,ckpt_path,patience,monitor,mode,
    281         callbacks,plot,wandb,quiet,mixed_precision,cpu,gradient_accumulation_steps)
--> 282 notebook_launcher(self.fit, args, num_processes=num_processes)

File ~/anaconda3/envs/baichuan13b/lib/python3.9/site-packages/accelerate/launchers.py:116, in notebook_launcher(function, args, num_processes, mixed_precision, use_port)
    113 from torch.multiprocessing.spawn import ProcessRaisedException
    115 if len(AcceleratorState._shared_state) > 0:
--> 116     raise ValueError(
    117         "To launch a multi-GPU training from your notebook, the `Accelerator` should only be initialized "
    118         "inside your training function. Restart your notebook and make sure no cells initializes an "
    119         "`Accelerator`."
    120     )
    122 if torch.cuda.is_initialized():
    123     raise ValueError(
    124         "To launch a multi-GPU training from your notebook, you need to avoid running any instruction "
    125         "using `torch.cuda` in any cell. Restart your notebook and make sure no cells use any CUDA "
    126         "function."
    127     )

ValueError: To launch a multi-GPU training from your notebook, the `Accelerator` should only be initialized inside your training function. Restart your notebook and make sure no cells initializes an `Accelerator`.
But as soon as bitsandbytes or a related package is imported, torch.cuda.is_initialized() returns True, so it may not be possible to train on multiple GPUs from a notebook. I also tried moving the code into a .py file, but because the model is a quantized version, it reports an error that the 8-bit model cannot run on multiple GPUs.
How should this be resolved?
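One way to check which import is responsible is to test each package in a fresh interpreter, so that earlier cells cannot influence the result. The following is only a minimal sketch: the helper names build_probe and import_initializes_cuda are hypothetical, and it assumes torch and the package under test are installed in the current environment.

```python
import subprocess
import sys

# Script run in a clean interpreter: import torch, import the module under
# test, then report whether CUDA has been initialized as a side effect.
_PROBE = (
    "import importlib, torch\n"
    "importlib.import_module({name!r})\n"
    "print(torch.cuda.is_initialized())\n"
)


def build_probe(module_name: str) -> str:
    """Return the probe script for one module (hypothetical helper)."""
    return _PROBE.format(name=module_name)


def import_initializes_cuda(module_name: str) -> bool:
    """Return True if importing `module_name` alone initializes CUDA.

    Raises subprocess.CalledProcessError if the import itself fails.
    """
    result = subprocess.run(
        [sys.executable, "-c", build_probe(module_name)],
        capture_output=True,
        text=True,
        check=True,
    )
    # Some packages print a banner on import, so only the last line counts.
    return result.stdout.strip().splitlines()[-1] == "True"


# Example usage (requires torch and the listed packages):
#     for name in ("accelerate", "bitsandbytes", "peft"):
#         print(name, import_initializes_cuda(name))
```

Running each probe in its own subprocess matters here, because once CUDA is initialized in the notebook kernel it stays initialized until the kernel is restarted.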
About this issue
- State: closed
- Created a year ago
- Comments: 19
Will be solved via https://github.com/huggingface/accelerate/pull/1833. @AisingioroHao0, until it's merged you can do
pip install git+https://github.com/huggingface/accelerate@fix-cuda
and afterwards
pip install git+https://github.com/huggingface/accelerate

I can't reproduce this happening upon importing notebook_launcher; it stems from bnb, and a PR is up now. To solve this you can try installing accelerate via
pip install git+https://github.com/huggingface/accelerate@bnb-import

@muellerzr I tried. If I uninstall bitsandbytes, importing from accelerate no longer initializes CUDA. But it looks like other packages are doing it too:
"RuntimeError: CUDA has been initialized before the `notebook_launcher` could create a forked subprocess. This likely stems from an outside import causing issues once the `notebook_launcher()` is called. Please review your imports and test them when running the `notebook_launcher()` to identify which one is problematic."
Let me check.

@muellerzr Hello, I have a similar problem. I found that torch.cuda.is_initialized() becomes True when I import anything from accelerate, even just "from accelerate import notebook_launcher". Is this a bug?
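Until a fixed accelerate build is installed, one possible workaround is to keep every CUDA-touching import out of the notebook cells and perform it inside the function handed to notebook_launcher. The sketch below is an assumption-laden illustration, not the maintainers' fix: training_function and launch are hypothetical names, the Linear layer stands in for the real PEFT model, and it assumes an accelerate version whose own import no longer initializes CUDA. It also does not address the separate error about 8-bit models on multiple GPUs.

```python
def training_function(learning_rate: float = 6e-5) -> None:
    # Deferred imports: these run in each worker process, so the notebook
    # kernel itself never imports bitsandbytes or touches CUDA.
    import bitsandbytes as bnb
    import torch
    from accelerate import Accelerator

    # The Accelerator is created inside the training function, as the
    # error message from notebook_launcher requires.
    accelerator = Accelerator()

    # Hypothetical stand-in for the real model; build or load it here too.
    model = torch.nn.Linear(8, 2)
    optimizer = bnb.optim.adamw.AdamW(
        model.parameters(), lr=learning_rate, is_paged=True
    )
    model, optimizer = accelerator.prepare(model, optimizer)
    # ... training loop goes here ...


def launch(num_processes: int = 2) -> None:
    # Imported lazily so that defining this cell has no side effects.
    from accelerate import notebook_launcher

    notebook_launcher(training_function, args=(6e-5,), num_processes=num_processes)
```

Calling launch() from a freshly restarted kernel, before any other cell has imported bitsandbytes, is what this arrangement relies on.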