bitsandbytes: [Bug] Exception: cublasLt ran into an error! while fine-tuning an LLM in 8-bit mode
Problem
Hello, I’m getting this cublasLt error on a Lambda Labs H100 (CUDA 11.8, PyTorch 2.0.1, Python 3.10 via Miniconda) while trying to fine-tune a 3B-parameter OpenLLaMA model using LoRA with 8-bit loading. The error only occurs when 8-bit loading is enabled; LoRA alone and 4-bit loading (QLoRA) both work.
The same commands worked two weeks ago and stopped working about a week ago.
I’ve tried bitsandbytes versions 0.39.0 and 0.39.1, since earlier versions don’t support the H100. Building from source gives a different error, as noted in the Env section.
Expected
No error
Reproduce
Set up Miniconda, then follow the Lambda Labs instructions in the README at https://github.com/OpenAccess-AI-Collective/axolotl and run the default OpenLLaMA LoRA config.
Trace
0.39.0
File "/home/ubuntu/axolotl/scripts/finetune.py", line 352, in <module>
fire.Fire(train)
File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/ubuntu/axolotl/scripts/finetune.py", line 337, in train
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/trainer.py", line 1531, in train
return inner_training_loop(
File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/trainer.py", line 1795, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/trainer.py", line 2640, in training_step
loss = self.compute_loss(model, inputs)
File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/trainer.py", line 2665, in compute_loss
outputs = model(**inputs)
File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/accelerate/utils/operations.py", line 553, in forward
return model_forward(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/accelerate/utils/operations.py", line 541, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
return func(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/peft/peft_model.py", line 827, in forward
return self.base_model(
File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 691, in forward
outputs = self.model(
File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 579, in forward
layer_outputs = decoder_layer(
File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 293, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 195, in forward
query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/peft/tuners/lora.py", line 942, in forward
result = super().forward(x)
File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 402, in forward
out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 562, in matmul
return MatMul8bitLt.apply(A, B, out, bias, state)
File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 400, in forward
out32, Sout32 = F.igemmlt(C32A, state.CxB, SA, state.SB)
File "/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/functional.py", line 1781, in igemmlt
raise Exception('cublasLt ran into an error!')
Exception: cublasLt ran into an error!
Env
python -m bitsandbytes
- on the main branch: I get the same error as in https://github.com/TimDettmers/bitsandbytes/issues/382
- on 0.39.0:
bin /home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/home/ubuntu/miniconda3/envs/py310/lib/libcudart.so.11.0'), PosixPath('/home/ubuntu/miniconda3/envs/py310/lib/libcudart.so')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
warn(msg)
CUDA SETUP: CUDA runtime path found: /home/ubuntu/miniconda3/envs/py310/lib/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 9.0
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so...
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++ ANACONDA CUDA PATHS ++++++++++++++++++++
/home/ubuntu/miniconda3/envs/py310/lib/libcudart.so
/home/ubuntu/miniconda3/envs/py310/lib/stubs/libcuda.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/torch/lib/libtorch_cuda_linalg.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/torch/lib/libc10_cuda.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda110.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda114.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda120.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda111_nocublaslt.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda112_nocublaslt.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117_nocublaslt.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda116.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda120_nocublaslt.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118_nocublaslt.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda114_nocublaslt.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda112.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda111.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda110_nocublaslt.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda115_nocublaslt.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda116_nocublaslt.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda121_nocublaslt.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda121.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda115.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113_nocublaslt.so
/home/ubuntu/miniconda3/envs/py310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so
/home/ubuntu/miniconda3/envs/py310/nsight-compute/2023.1.1/target/linux-desktop-glibc_2_19_0-ppc64le/libcuda-injection.so
/home/ubuntu/miniconda3/envs/py310/nsight-compute/2023.1.1/target/linux-desktop-glibc_2_11_3-x64/libcuda-injection.so
/home/ubuntu/miniconda3/envs/py310/nsight-compute/2023.1.1/target/linux-desktop-t210-a64/libcuda-injection.so
++++++++++++++++++ /usr/local CUDA PATHS +++++++++++++++++++
+++++++++++++++ WORKING DIRECTORY CUDA PATHS +++++++++++++++
++++++++++++++++++ LD_LIBRARY CUDA PATHS +++++++++++++++++++
+++++++++++ /usr/lib/x86_64-linux-gnu CUDA PATHS +++++++++++
/usr/lib/x86_64-linux-gnu/libcudart.so
/usr/lib/x86_64-linux-gnu/stubs/libcuda.so
/usr/lib/x86_64-linux-gnu/libcuda.so
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
COMPILED_WITH_CUDA = True
COMPUTE_CAPABILITIES_PER_GPU = ['9.0']
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Running a quick check that:
+ library is importable
+ CUDA function is callable
WARNING: Please be sure to sanitize sensible info from any such env vars!
SUCCESS!
Installation was successful!
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0
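The duplicate-libcudart warning above can be reproduced independently of bitsandbytes by scanning the conda env the same way its setup code does. A small sketch (the `find_cudart` helper is mine, not part of bitsandbytes):

```python
import os
from pathlib import Path

def find_cudart(prefix: str) -> list[str]:
    """Return libcudart* files directly under <prefix>/lib (hypothetical helper,
    mimicking the scan that produces bitsandbytes' duplicate warning)."""
    lib = Path(prefix) / "lib"
    if not lib.is_dir():
        return []
    return sorted(str(p) for p in lib.glob("libcudart.so*"))

# Scan the active conda env, if any; more than one hit means the loader
# has multiple CUDA runtimes to choose from.
env = os.environ.get("CONDA_PREFIX", "")
print(find_cudart(env) if env else "no conda env active")
```

On the machine above this should list both `libcudart.so` and `libcudart.so.11.0`, matching the warning.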
Misc
All related issues:
- Lambda Labs H100: https://github.com/tloen/alpaca-lora/issues/174
- older GPU: https://github.com/oobabooga/text-generation-webui/issues/379
- attributed to a card issue: https://github.com/Facico/Chinese-Vicuna/issues/3
- 3070: https://github.com/Facico/Chinese-Vicuna/issues/41
- 3060: https://github.com/oobabooga/text-generation-webui/issues/569
- no resolution: https://github.com/Lightning-AI/lit-llama/issues/315
- low VRAM: https://github.com/deep-diver/LLM-As-Chatbot/issues/16
- no resolution: https://github.com/huggingface/transformers/issues/21371
I also tried installing cudatoolkit via conda.
About this issue
- Original URL
- State: closed
- Created a year ago
- Reactions: 10
- Comments: 20 (2 by maintainers)
This error is specific to the H100: loading the model on an H100 reproduces it, while the same load_in_8bit setup works fine on an A100.
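The H100-vs-A100 split above can be confirmed by checking the compute capability PyTorch reports: the H100 is sm_90, the A100 is sm_80. A quick check (not from the original thread):

```python
import torch

# bitsandbytes selects its int8 matmul path per GPU generation, so knowing
# which sm_XX your card reports helps correlate with these reports.
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"compute capability: sm_{major}{minor}")  # H100 -> sm_90, A100 -> sm_80
else:
    print("no CUDA device visible")
```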
@basteran Did you find the fix? @TimDettmers Any updates?