GPTQ-for-LLaMa: Multiple errors while compiling the kernel
Hello, while trying to run python setup_cuda.py install, I get this error:
(venv) C:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa>python setup_cuda.py install
running install
C:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
C:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\lib\site-packages\setuptools\command\easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running bdist_egg
running egg_info
writing quant_cuda.egg-info\PKG-INFO
writing dependency_links to quant_cuda.egg-info\dependency_links.txt
writing top-level names to quant_cuda.egg-info\top_level.txt
C:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\lib\site-packages\torch\utils\cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
reading manifest file 'quant_cuda.egg-info\SOURCES.txt'
writing manifest file 'quant_cuda.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
running build_ext
C:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\lib\site-packages\torch\utils\cpp_extension.py:358: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified
warnings.warn(f'Error checking compiler version for {compiler}: {error}')
C:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\lib\site-packages\torch\utils\cpp_extension.py:387: UserWarning: The detected CUDA version (11.4) has a minor version mismatch with the version that was used to compile PyTorch (11.7). Most likely this shouldn't be a problem.
warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
building 'quant_cuda' extension
"C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30133\bin\HostX86\x64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -IC:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\lib\site-packages\torch\include -IC:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\lib\site-packages\torch\include\torch\csrc\api\include -IC:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\lib\site-packages\torch\include\TH -IC:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4\include" -IC:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\include "-IC:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0\include" "-IC:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0\Include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30133\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\cppwinrt" /EHsc /Tpquant_cuda.cpp /Fobuild\temp.win-amd64-cpython-310\Release\quant_cuda.obj /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=quant_cuda -D_GLIBCXX_USE_CXX11_ABI=0
quant_cuda.cpp
C:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\lib\site-packages\torch\include\c10/macros/Macros.h(138): warning C4067: unexpected tokens following preprocessor directive - expected a newline
Then after a long list of errors, I get this at the end:
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4\bin\nvcc" -c quant_cuda_kernel.cu -o build\temp.win-amd64-cpython-310\Release\quant_cuda_kernel.obj -IC:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\lib\site-packages\torch\include -IC:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\lib\site-packages\torch\include\torch\csrc\api\include -IC:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\lib\site-packages\torch\include\TH -IC:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4\include" -IC:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\include "-IC:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0\include" "-IC:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0\Include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30133\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\cppwinrt" -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=quant_cuda -D_GLIBCXX_USE_CXX11_ABI=0 
-gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 --use-local-env
quant_cuda_kernel.cu
C:/Users/Username/Documents/GitHub/GPTQ-for-LLaMa/venv/lib/site-packages/torch/include\c10/macros/Macros.h(138): warning C4067: unexpected tokens following preprocessor directive - expected a newline
C:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\lib\site-packages\torch\include\pybind11\cast.h(624): error: too few arguments for template template parameter "Tuple"
detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]"
(721): here
C:\Users\Username\Documents\GitHub\GPTQ-for-LLaMa\venv\lib\site-packages\torch\include\pybind11\cast.h(717): error: too few arguments for template template parameter "Tuple"
detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]"
(721): here
2 errors detected in the compilation of "quant_cuda_kernel.cu".
error: command 'C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.4\\bin\\nvcc.exe' failed with exit code 1
Any idea what could be causing this? I’ve tried installing CUDA Toolkit 11.3 and Torch 1.12.1, but they too give the same error.
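The log above warns that `cl` and `ninja` could not be found and that the detected CUDA toolkit (11.4) differs from the one PyTorch was built with (11.7). A quick way to see what the build will actually pick up is to check the PATH from inside the same venv. The following is only a minimal sketch: the helper names `find_tools` and `parse_nvcc_release` are made up for illustration, and the tool list is an assumption taken from the warnings in the log. It does not diagnose the pybind11 `cast.h` error itself.

```python
import re
import shutil
import subprocess
from typing import Dict, Optional, Tuple

# Tools mentioned in the build log above; the list itself is an assumption.
TOOLS = ("cl", "nvcc", "ninja")


def find_tools(names=TOOLS) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def parse_nvcc_release(text: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from `nvcc --version` output, e.g. 'release 11.4'."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


if __name__ == "__main__":
    found = find_tools()
    for name, path in found.items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
    if found["nvcc"]:
        # Ask the nvcc that is first on PATH which toolkit release it belongs to.
        result = subprocess.run(
            [found["nvcc"], "--version"], capture_output=True, text=True
        )
        print("nvcc release:", parse_nvcc_release(result.stdout))
```

If `cl` shows up as not found, the shell was probably not started from a Visual Studio developer prompt, which would match the `[WinError 2]` warning in the log.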
About this issue
- State: closed
- Created a year ago
- Comments: 34 (1 by maintainers)
Thank you! This actually worked; I'm now loading the 13B model at around 9 GB of VRAM. I noticed, though, that Linux is ridiculously faster than Windows: even 4-bit 13B on Windows runs at about half the speed of a normal 13B run on Linux… 😮
@lxe
This repo only contains a readme.md:
Should we use the one mentioned in the readme.md, which is also from March 2019? I doubt it. If not, which transformers repo should we install? The live one, or the one with the llama push, installed via the following command?
git clone --branch llama_push https://github.com/zphang/transformers.git
EDIT: This comment is linked from elsewhere. Here’s a more coherent guide: https://gist.github.com/lxe/82eb87db25fdb75b92fa18a6d494ee3c
I had to downgrade CUDA and torch before I was able to compile. Here’s my full process on Windows:
powershell -ExecutionPolicy ByPass -NoExit -Command "& 'C:\Users\lxe\miniconda3\shell\condabin\conda-hook.ps1' ; conda activate 'C:\Users\lxe\miniconda3' "
conda create -n gptq
conda activate gptq
conda install cuda -c nvidia/label/cuda-11.3.0 -c nvidia/label/cuda-11.3.1
conda install pip
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa.git
git clone https://github.com/zphang/transformers.git
pip install ./transformers
pip install torch==1.12+cu113 -f https://download.pytorch.org/whl/torch_stable.html
cd GPTQ-for-LLaMa
$env:DISTUTILS_USE_SDK=1
python setup_cuda.py install
When using the webui, make sure it runs in the same env. If it overwrites torch, you’ll have to reinstall it manually.
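To tell whether something has overwritten torch in the env, you can look at the local build tag in the installed version string. This is a rough sketch, not part of the original steps: `local_build_tag` and `check_torch` are hypothetical helper names, and the expected tag `cu113` is an assumption based on the `torch==1.12+cu113` install above.

```python
import importlib

# Assumption: the env should hold the +cu113 wheel installed in the steps above.
EXPECTED_TAG = "cu113"


def local_build_tag(version: str) -> str:
    """Return the part after '+' in a version like '1.12.0+cu113', or '' if absent."""
    _, _, tag = version.partition("+")
    return tag


def check_torch(expected_tag: str = EXPECTED_TAG) -> str:
    """Describe whether the torch in the active env carries the expected build tag."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return "torch is not installed in this environment"
    tag = local_build_tag(str(torch.__version__))
    if tag != expected_tag:
        return (
            f"torch {torch.__version__} found, expected a +{expected_tag} build; "
            "it may have been overwritten"
        )
    return f"torch {torch.__version__} looks right (built for CUDA {torch.version.cuda})"


if __name__ == "__main__":
    print(check_torch())
```

Run it with the gptq env activated, after installing or updating the webui's requirements.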