pytorch_scatter: Simply importing torch_scatter causes CUDA error with PyTorch 1.9 and CUDA 11.1
Hello,
Here is a minimal example reproducing the error:

```python
import torch
z = torch.zeros(10, device=torch.device('cuda:0'))  # works

import torch_scatter
z = torch.zeros(10, device=torch.device('cuda:0'))  # Error
# RuntimeError: CUDA error: the provided PTX was compiled with an unsupported toolchain.

z = torch.zeros(5, device=torch.device('cpu'))  # works
```
Here are my installation and system details:
- OS: Ubuntu 20.04.1 LTS
- GPU: Quadro GP100
- torch version: 1.9.0
- torch_scatter version: 2.0.7
- CUDA version (as given by `torch.version.cuda`): 11.1
Other details:
- I installed PyTorch via conda. The installation works fine and I can train models as per usual.
- The same error persists for `torch_scatter` installed via conda or via the pip command
  `pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.9.0+cu111.html`
Any pointers on how to fix this will be appreciated. Thanks!
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 22 (12 by maintainers)
Downgrading CUDA from 11.1 to 11.0 and installing `pytorch_scatter` from source has solved the issue for me. I can now copy tensors to CUDA without the previous CUDA error.

However, this is not ideal, and it is still worth fixing the current issue with CUDA 11.1, since PyTorch does not provide pre-built binaries for CUDA 11.0 in its latest releases (v1.8.0+). Downgrading to CUDA 11.0 means that I need to use PyTorch v1.7.1 for now.
Wheels have been updated 😃
Yes, will probably release new versions early September.
I have figured out the problem, and a possible solution (thanks to @ntselepidis for helping me out). The problem lies in a compatibility issue with newer Nvidia GPU architectures: by default, `setup.py` and `CMakeLists.txt` demand compute capability 3.5, which is quite old and not compatible with newer architectures. Note also that the `nvcc` compiler warns you that `-arch=sm_35` will soon no longer be supported.

In my case, I was trying to run the code on a GTX 1080 Ti (Pascal architecture, compute capability 6.1) and a V100 (Volta architecture, compute capability 7.0).

**Solving the problem just for your machine**
If you are just interested in getting the code to run on your specific machine, it is sufficient to clone the GitHub repository and remove the `-arch=sm_35` flags from `setup.py` and `CMakeLists.txt`. Note that you have to do this both for `pytorch_scatter` and `pytorch_sparse`. Then, when running `python setup.py install` for each of them, CUDA will automatically detect your graphics card and compile for that architecture.

**Compiling for different architectures**
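As a sketch, the per-machine fix could look like the following; the repository URL and the `sed` pattern are illustrative, and editing the two files by hand works just as well:

```shell
# Illustrative sketch: build pytorch_scatter for the local GPU only.
# Repeat the same steps for pytorch_sparse.
git clone https://github.com/rusty1s/pytorch_scatter.git
cd pytorch_scatter
# Strip the hard-coded compute capability 3.5 flag.
sed -i 's/-arch=sm_35//g' setup.py CMakeLists.txt
python setup.py install  # nvcc now targets the detected architecture
```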
As far as I understand, the Nvidia compiler `nvcc` is able to compile directly for certain architectures, as well as generate *virtual* code (called PTX) that can be used for forwards compatibility. For example, the Nvidia Ampere compatibility guide suggests compiling like this:
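A sketch of such an invocation, following the `-gencode` pattern from the Ampere compatibility guide (the exact flag set is a reconstruction, not a verbatim quote):

```shell
# Emit explicit SASS for several architectures, plus PTX for
# compute capability 8.0 for forwards compatibility.
nvcc sample.cu \
  -gencode arch=compute_52,code=sm_52 \
  -gencode arch=compute_60,code=sm_60 \
  -gencode arch=compute_61,code=sm_61 \
  -gencode arch=compute_70,code=sm_70 \
  -gencode arch=compute_75,code=sm_75 \
  -gencode arch=compute_80,code=sm_80 \
  -gencode arch=compute_80,code=compute_80
```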
These flags could be added to `setup.py`. This would generate explicit code for compute capabilities 5.2, 6.0, ..., 8.0, as well as PTX code (for forwards compatibility) for 8.0.

@rusty1s wrote
This is the same idea, and it is parsed by the `./cmake/public/utils.cmake` file in the official PyTorch code. But I believe this is not parsed in this repository, so setting it will have no effect.

**Possible PR**

- Remove `-arch=sm_35` from `setup.py` and `CMakeLists.txt`, both for `pytorch_sparse` and `pytorch_scatter`.
- Respect `TORCH_CUDA_ARCH_LIST` by using the `utils.cmake` script mentioned above.

@rusty1s Let me know if you have any thoughts on this - I am happy to set up a pull request that incorporates these ideas.
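For reference, the kind of parsing that `utils.cmake` performs on `TORCH_CUDA_ARCH_LIST` can be sketched in Python. This is an illustrative simplification I wrote, not the actual implementation:

```python
def arch_list_to_gencode(arch_list):
    """Turn a TORCH_CUDA_ARCH_LIST-style string (e.g. "6.1;7.0+PTX")
    into nvcc -gencode flags. Illustrative sketch only; the real
    parsing lives in PyTorch's cmake scripts."""
    flags = []
    for arch in arch_list.replace(";", " ").split():
        emit_ptx = arch.endswith("+PTX")
        num = arch[:-4] if emit_ptx else arch   # strip the "+PTX" suffix
        num = num.replace(".", "")              # "6.1" -> "61"
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
        if emit_ptx:
            # Also embed PTX for forwards compatibility.
            flags.append(f"-gencode=arch=compute_{num},code=compute_{num}")
    return flags

print(arch_list_to_gencode("6.1;7.0+PTX"))
```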
Thanks for letting me know. Can you try to run via: