pdm: Cannot install PyTorch 1.13.x with PDM
- I have searched the issue tracker and believe that this is not a duplicate.
Steps to reproduce
- Install PyTorch 1.13.x by running `pdm add torch` (1.13.1 is the latest version currently).
- Try to import PyTorch in the interpreter: `python -c 'import torch'`.
Expected behavior
PyTorch should be imported without any errors.
Actual behavior
```
❯ python -c 'import torch'
Traceback (most recent call last):
  File ".../.venv/lib/python3.10/site-packages/torch/__init__.py", line 172, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcublas.so.11: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File ".../.venv/lib/python3.10/site-packages/torch/__init__.py", line 217, in <module>
    _load_global_deps()
  File ".../.venv/lib/python3.10/site-packages/torch/__init__.py", line 178, in _load_global_deps
    _preload_cuda_deps()
  File ".../.venv/lib/python3.10/site-packages/torch/__init__.py", line 158, in _preload_cuda_deps
    ctypes.CDLL(cublas_path)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: .../.venv/lib/python3.10/site-packages/nvidia/cublas/lib/libcublas.so.11: cannot open shared object file: No such file or directory
```
Environment Information
PDM version:
2.4.6
Python Interpreter:
.../.venv/bin/python (3.10)
Project Root:
...
Project Packages:
None
```json
{
  "implementation_name": "cpython",
  "implementation_version": "3.10.10",
  "os_name": "posix",
  "platform_machine": "x86_64",
  "platform_release": "5.4.0-121-generic",
  "platform_system": "Linux",
  "platform_version": "#137-Ubuntu SMP Wed Jun 15 13:33:07 UTC 2022",
  "python_full_version": "3.10.10",
  "platform_python_implementation": "CPython",
  "python_version": "3.10",
  "sys_platform": "linux"
}
```
I think this is related to the fact that PyTorch 1.13.x introduced a new set of dependencies around CUDA (https://github.com/pytorch/pytorch/pull/85097). Poetry had issues because of this as well (https://github.com/pytorch/pytorch/issues/88049); those have since been resolved, but not for PDM. My guess is that PDM installs the CUDA dependencies separately from PyTorch, and because of that the PyTorch installation doesn’t know about them. It’s a bummer, because I wanted to give PDM a spin for a new project; for now I’m going to have to stick with Poetry. 😕
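For context, here is a paraphrased sketch (not the actual torch source; the helper names are invented for illustration) of how the 1.13-era preload roughly behaves:

```python
import ctypes
import os
import sys


def find_first_nvidia_lib(search_paths, relpath="nvidia/cublas/lib/libcublas.so.11"):
    """Return the first matching library path. This mimics (in paraphrase) how
    torch 1.13's _preload_cuda_deps picks a single sys.path entry containing
    an `nvidia` folder; the function name is invented for this sketch."""
    for base in search_paths:
        candidate = os.path.join(base, *relpath.split("/"))
        if os.path.exists(candidate):
            return candidate
    return None


def preload_cublas_sketch():
    """Load libcublas roughly the way torch 1.13 does. If pdm placed the
    nvidia packages somewhere this lookup does not expect (e.g. in its
    package cache), the load fails with the OSError shown above."""
    cublas = find_first_nvidia_lib(sys.path)
    if cublas is None:
        raise OSError("libcublas.so.11: cannot open shared object file")
    ctypes.CDLL(cublas, mode=ctypes.RTLD_GLOBAL)
```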
About this issue
- Original URL
- State: closed
- Created a year ago
- Reactions: 1
- Comments: 19
For anyone coming here off search engines… I wiped my lock file and .venv, and the following worked for me (thanks to #2425!):
It doesn’t work with the latest pdm or pytorch. If there is actually a problem with the nvidia packages, pytorch users will be happier if there is some way to compromise.
I’m compelled to create a script that runs like this and copies the files directly to the cache. It works, but the user should not have to ask for it.
For example, would it be possible to have a workaround that downloads only the nvidia libraries (explicitly named, like in pdm.toml) directly, instead of using a symlink (cache_method)?
By the way, in my environment, `pdm config install.cache_method pth` did not work.

I’m having a similar (probably even the same) problem, and I suspect the `install.cache` setting is the culprit here (I assume @yukw777 also has this set to `true`). I discovered the following issue with the nvidia libraries (nvidia_cublas_cu11, nvidia_cuda_nvrtc_cu11, etc.):
With `install.cache` turned off, the directory structure is as follows:

As soon as you activate `install.cache`, the directory structure changes:

The content of `/root/.cache/pdm/packages/nvidia_cudnn_cu11-8.5.0.96-2-py3-none-manylinux1_x86_64/lib/nvidia` is obviously only

I hope that this issue can be fixed somehow (I don’t know how standards-compliant it is for several packages to install into a common package folder), because the nvidia packages are the primary reason I activated `install.cache` in the first place.

The main cause is that `nvidia` is a normal package with a blank `__init__.py`, in which case PDM will create a single symlink for the whole directory. Maybe we can implement a different link strategy to force PDM to create a symlink for each individual file.

I tested your suggestion but the problem still persists.
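A per-file link strategy like the one discussed above could look roughly like this; `link_tree_per_file` is a hypothetical helper sketched for illustration, not PDM’s actual implementation:

```python
import os


def link_tree_per_file(src_root, dst_root):
    """Hypothetical per-file link strategy (not PDM's actual code): instead of
    symlinking the whole `nvidia` directory to one cached package, mirror the
    directory tree with real directories and symlink each file individually,
    so several cached packages can merge into one `nvidia` namespace folder."""
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = dst_root if rel == "." else os.path.join(dst_root, rel)
        os.makedirs(target_dir, exist_ok=True)  # real dirs, shared by packages
        for name in filenames:
            link = os.path.join(target_dir, name)
            if not os.path.exists(link):
                os.symlink(os.path.join(dirpath, name), link)
```

Linking two cached nvidia packages this way would leave `nvidia/cublas/lib` and `nvidia/cudnn/lib` side by side under one folder, which is the layout the torch 1.13 loader expects.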
Looking at the PyTorch source code (https://github.com/pytorch/pytorch/blob/v1.13.1/torch/__init__.py#L144) reveals the underlying problem: PyTorch searches for the `nvidia` folder in all elements of `sys.path`, but stops at the first match.

So the problem here is, I think, that `install.cache_method pth` is also not a possibility. If they would resolve the path to the libraries for each package while iterating `sys.path`, it could work…

From looking at the code, PyTorch 2.0.0 might actually work with PDM and `install.cache_method pth`, as the code that loads the CUDA libraries iterates all elements of `sys.path` and looks for the nvidia subfolder and the library in each element individually.
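Sketched for contrast (a paraphrase, not the actual torch 2.0.0 source; the function name is invented), the 2.0-style lookup resolves each library against every `sys.path` entry individually, so packages split across separate cache directories still resolve:

```python
import os


def resolve_libs_per_path(search_paths, lib_relpaths):
    """Paraphrase of the 2.0-style lookup: resolve EACH library against EVERY
    search path, instead of pinning all libraries to the first path that
    contains any `nvidia` folder (the 1.13-era behaviour)."""
    resolved = {}
    for rel in lib_relpaths:
        for base in search_paths:
            candidate = os.path.join(base, *rel.split("/"))
            if os.path.exists(candidate):
                resolved[rel] = candidate
                break
    return resolved
```

Under this scheme it no longer matters that pdm’s cache puts `nvidia/cublas` and `nvidia/cudnn` in different directories, as long as each one appears somewhere on `sys.path`.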