unsloth: ImportError: Unsloth: CUDA is not linked properly.

I followed the conda installation instructions in the README:

conda create --name unsloth_env python=3.10
conda activate unsloth_env

conda install pytorch cudatoolkit torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

conda install xformers -c xformers

pip install bitsandbytes

pip install "unsloth[conda] @ git+https://github.com/unslothai/unsloth.git"

There were no issues during the install. However, when I try to import from unsloth, I get an error.

from unsloth import FastLanguageModel

This results in the following error:

/home/ubuntu/miniconda/envs/unsloth_env/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
/home/ubuntu/miniconda/envs/unsloth_env/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
/home/ubuntu/miniconda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/__init__.py:71: UserWarning: Unsloth: Running `ldconfig /usr/lib64-nvidia` to link CUDA.
  warnings.warn(
/sbin/ldconfig.real: Can't create temporary cache file /etc/ld.so.cache~: Permission denied
Traceback (most recent call last):
  File "/home/ubuntu/miniconda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/__init__.py", line 68, in <module>
    cdequantize_blockwise_fp32 = bnb.functional.lib.cdequantize_blockwise_fp32
  File "/home/ubuntu/miniconda/envs/unsloth_env/lib/python3.10/ctypes/__init__.py", line 387, in __getattr__
    func = self.__getitem__(name)
  File "/home/ubuntu/miniconda/envs/unsloth_env/lib/python3.10/ctypes/__init__.py", line 392, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /home/ubuntu/miniconda/envs/unsloth_env/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cdequantize_blockwise_fp32

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/miniconda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/__init__.py", line 99, in <module>
    cdequantize_blockwise_fp32 = bnb.functional.lib.cdequantize_blockwise_fp32
  File "/home/ubuntu/miniconda/envs/unsloth_env/lib/python3.10/ctypes/__init__.py", line 387, in __getattr__
    func = self.__getitem__(name)
  File "/home/ubuntu/miniconda/envs/unsloth_env/lib/python3.10/ctypes/__init__.py", line 392, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /home/ubuntu/miniconda/envs/unsloth_env/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cdequantize_blockwise_fp32

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/miniconda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/__init__.py", line 102, in <module>
    raise ImportError("Unsloth: CUDA is not linked properly.\n"\
ImportError: Unsloth: CUDA is not linked properly.
We tried running `ldconfig /usr/lib64-nvidia` ourselves, but it didn't work.
You need to run in your terminal `sudo ldconfig /usr/lib64-nvidia` yourself, then import Unsloth.
Also try `sudo ldconfig /usr/local/cuda-xx.x` - find the latest cuda version.

I looked for /usr/lib64-nvidia and for anything matching /usr/local/cuda-*, but neither exists on this machine.
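Since the CUDA runtime here comes from the conda pytorch-cuda package rather than a system install, it should live inside the environment itself (under $CONDA_PREFIX/lib) rather than under /usr/local. Here is a quick sketch of my own (assuming a standard conda layout, nothing Unsloth-specific) to list whatever is there:

# List CUDA-related shared libraries shipped inside the active conda env
import os
from pathlib import Path

prefix = Path(os.environ["CONDA_PREFIX"])  # e.g. /home/ubuntu/miniconda/envs/unsloth_env
for lib in sorted((prefix / "lib").glob("libcud*.so*")):
    print(lib)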

The NVIDIA driver is installed and working, and it reports CUDA 12.2. Here is the output of nvidia-smi:

Tue Mar  5 01:23:41 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.07             Driver Version: 535.161.07   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:1B.0 Off |                    0 |
| N/A   27C    P0              24W /  70W |      2MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla T4                       Off | 00000000:00:1C.0 Off |                    0 |
| N/A   27C    P0              25W /  70W |      2MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  Tesla T4                       Off | 00000000:00:1D.0 Off |                    0 |
| N/A   27C    P0              24W /  70W |      2MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  Tesla T4                       Off | 00000000:00:1E.0 Off |                    0 |
| N/A   28C    P0              25W /  70W |      2MiB / 15360MiB |      6%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

I’m running Ubuntu 22.04.3 on an AWS EC2 g4dn.12xlarge, which has four T4 GPUs.
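For what it’s worth, a plain PyTorch check from inside the env (my own sketch, nothing Unsloth-specific) confirms whether torch itself sees the driver and the four T4s:

# Report the torch build, the CUDA version it was compiled against,
# and whether the GPUs are visible from inside the conda env
import torch

print(torch.__version__)          # installed torch build
print(torch.version.cuda)         # CUDA version torch was built against (12.1 from pytorch-cuda=12.1)
print(torch.cuda.is_available())  # whether the driver/runtime pair is usable
print(torch.cuda.device_count())  # number of visible GPUs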

About this issue

  • State: open
  • Created 4 months ago
  • Comments: 25 (12 by maintainers)

Most upvoted comments

@danielhanchen The following worked as you suggested:

conda create --name unsloth_env python=3.10
conda activate unsloth_env

conda install pytorch==2.2.0 cudatoolkit torchvision torchaudio pytorch-cuda=<12.1/11.8> -c pytorch -c nvidia

conda install xformers -c xformers

pip install bitsandbytes

pip install "unsloth[conda] @ git+https://github.com/unslothai/unsloth.git"

Or at least, the import now runs without any error:

from unsloth import FastLanguageModel

FYI, the import is quite slow (10-15 seconds); I’m not sure whether that’s expected.
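For anyone who wants a smoke test beyond the bare import, something like the following (adapted from the usage example in the Unsloth README; the model name is just an example) exercises the 4-bit loading path that was previously failing:

# Load a 4-bit quantized model to exercise the bitsandbytes CUDA path
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-bnb-4bit",  # example model; any 4-bit Unsloth model works
    max_seq_length = 2048,
    dtype = None,          # auto-detect (float16 on T4s, bfloat16 on newer GPUs)
    load_in_4bit = True,
)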

Thanks for your help.