axolotl: Error running in Lambda Labs VM using instructions in docs
After upgrading to Python 3.9 and setting everything up, PyTorch becomes unusable, so the training script fails:
ubuntu@209-20-159-38:~/axolotl$ python
Python 3.9.17 (main, Jun 6 2023, 20:11:04)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3/dist-packages/torch/__init__.py", line 443, in <module>
raise ImportError(textwrap.dedent('''
ImportError: Failed to load PyTorch C extensions:
It appears that PyTorch has loaded the `torch/_C` folder
of the PyTorch repository rather than the C extensions which
are expected in the `torch._C` namespace. This can occur when
using the `install` workflow. e.g.
$ python setup.py install && python -c "import torch"
This error can generally be solved using the `develop` workflow
$ python setup.py develop && python -c "import torch" # This should succeed
or by running Python from a different directory.
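A note on reading the traceback: torch is being loaded from /usr/lib/python3/dist-packages, which is the distro-managed package directory, so that copy was likely built for the system's default Python rather than 3.9. As a rough sketch (the helper names locate_module and from_system_dist_packages are made up for illustration), this checks where a module would be loaded from without actually importing it:

```python
import importlib.util


def locate_module(name):
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def from_system_dist_packages(path):
    """True if the path is under the distro-managed directory seen in the traceback."""
    # Assumption: this prefix is taken from the traceback above; adjust for other systems.
    return path is not None and path.startswith("/usr/lib/python3/dist-packages/")


if __name__ == "__main__":
    origin = locate_module("torch")
    print("torch would load from:", origin)
    if from_system_dist_packages(origin):
        print("This looks like the system copy, not one installed by pip for this interpreter.")
```

If the reported path is still under /usr/lib/python3/dist-packages after the pip install step, the new interpreter is probably picking up the system copy first.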
I condensed everything into one script. If we can get it working, we could add it under scripts/ for ease of access:
#!/bin/bash
set -e
# Function to gracefully exit if a command fails
abort() {
    echo >&2 '
***************
*** ABORTED ***
***************
'
    echo "An error occurred. Exiting..." >&2
    exit 1
}
trap 'abort' 0
# Update system
sudo apt update
# Install python3.9
sudo apt install -y python3.9
sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.9 1
sudo update-alternatives --config python # user must pick 3.9 if given option
# Verify python version
version=$(python -V 2>&1 | grep -Po '(?<=Python )(.+)')
if [[ -z "$version" ]]
then
    echo "Failed to detect python version"
    exit 1
fi
echo "Python version $version installed."
# Install pip
wget https://bootstrap.pypa.io/get-pip.py
python get-pip.py
rm get-pip.py
# Install torch
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Ensure setuptools installed
pip3 install -U setuptools
# Install Axolotl
pip3 install -e .
# Install Axolotl dependencies
pip3 install protobuf==3.20.3
pip3 install -U requests scipy
pip3 install --ignore-installed psutil
pip3 install git+https://github.com/huggingface/peft.git # not for gptq
# Set path (this export only applies within this script's process; it does not persist in the calling shell)
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
# If we've reached this point, all commands were successful
trap : 0
echo >&2 '
************
*** DONE ***
************
'
About this issue
- State: closed
- Created a year ago
- Comments: 24 (1 by maintainers)
I use an older version of transformers, 4.29.2. However, you'll need to modify the source code to comment out the 4bit parts.
You can pass -U to upgrade, or --ignore-installed to force the install. I was planning to add that flag but didn't get to it.
Hey, seems like the permissions somehow got stuck? I would recommend just booting a new instance to start from scratch. I would then recommend Miniconda to prevent this issue. I don't think you need to use sudo at all.
Lastly, just an FYI: I also have issues with H100 on Lambda Labs, particularly with bitsandbytes and xformers, in case you're using those!
Hey, is this on Lambda Labs?
The error sounds like torch was not installed correctly. Maybe try uninstalling and then reinstalling it. You can try Miniconda next as a second option. If all else fails, you can use the Docker image.
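After a reinstall, a quick check along these lines may help confirm whether torch actually loads under the new interpreter. This is only a sketch: check_torch is a hypothetical helper name, and it only uses torch.__version__, torch.__file__ and torch.cuda.is_available().

```python
import sys


def check_torch():
    """Try importing torch and report what was found instead of raising."""
    report = {"python": sys.version.split()[0], "importable": False}
    try:
        import torch
    except Exception as exc:  # a broken install may raise ImportError or OSError
        report["error"] = f"{type(exc).__name__}: {exc}"
        return report
    report["importable"] = True
    report["version"] = torch.__version__
    report["location"] = torch.__file__  # shows which copy was picked up
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_torch().items():
        print(f"{key}: {value}")
```

Run it with the same python that will launch training, so the report reflects the interpreter that matters.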