pytorch_scatter: Simply importing torch_scatter causes CUDA error with PyTorch 1.9 and cuda 11.1

Hello,

Here is the minimal working example of my error:

import torch
z = torch.zeros(10, device=torch.device('cuda:0'))  # works
import torch_scatter
z = torch.zeros(10, device=torch.device('cuda:0'))  # Error
# RuntimeError: CUDA error: the provided PTX was compiled with an unsupported toolchain.
z = torch.zeros(5, device=torch.device('cpu')) # works

Here are my installation and system details:

  • OS: Ubuntu 20.04.1 LTS
  • GPU: Quadro GP100
  • torch version: 1.9.0
  • torch_scatter version: 2.0.7
  • CUDA version (as given by torch.version.cuda): 11.1

Other details:

  • I installed PyTorch via conda. The installation works fine and I can train models as per usual.
  • The same error persists for torch_scatter installed via conda or via the pip command pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.9.0+cu111.html

Any pointers on how to fix this will be appreciated. Thanks!

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 22 (12 by maintainers)

Most upvoted comments

Downgrading CUDA from 11.1 to 11.0 and installing pytorch-scatter from source has solved the issue for me. I can now copy tensors to CUDA without the previous CUDA error.

However, this is not ideal and it is still worth fixing the current issue with CUDA 11.1, since PyTorch does not provide pre-built binaries for CUDA 11.0 in their latest releases (v1.8.0+). Downgrading to CUDA 11.0 means that I need to use PyTorch v1.7.1 for now.

Wheels have been updated 😃

Yes, will probably release new versions early September.

I have figured out the problem, and a possible solution (thanks to @ntselepidis for helping me out). The problem is a compatibility issue with newer NVIDIA GPU architectures: the default setup.py and CMakeLists.txt hard-code compute capability 3.5 (-arch=sm_35), which is quite old and not compatible with newer architectures. Note also that the nvcc compiler warns that -arch=sm_35 will soon no longer be supported:

$ nvcc --help  # version 11.1
--gpu-architecture <arch> (-arch)
  Note: the values compute_30, compute_32, compute_35, compute_37, sm_30,
  sm_32, sm_35, sm_37 and sm_50 are deprecated and may be removed in a
  future release.

In my case, I was trying to run the code on a GTX 1080 Ti (Pascal architecture, compute capability 6.1) and V100 (Volta architecture, compute capability 7.0).
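To see which architecture your own GPUs report, PyTorch exposes the compute capability directly; a quick diagnostic sketch (pure PyTorch, no torch_scatter needed, helper name is my own):

```python
import torch

# Collect the compute capability of each visible GPU, so you can compare it
# against the architectures a binary was compiled for (e.g. sm_35 = 3.5).
def device_capabilities():
    caps = []
    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            major, minor = torch.cuda.get_device_capability(i)
            caps.append((torch.cuda.get_device_name(i), f"{major}.{minor}"))
    return caps

for name, cap in device_capabilities():
    print(f"{name}: compute capability {cap}")
```

On the machines above this would report 6.1 for the GTX 1080 Ti and 7.0 for the V100.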

Solving the problem just for your machine

If you are just interested in getting the code to run on your specific machine, it is sufficient to clone the GitHub repository and remove the -arch=sm_35 flags from setup.py and CMakeLists.txt. Note that you have to do this for both pytorch_scatter and pytorch_sparse. Then, when you run python setup.py install for each of them, nvcc will automatically detect your graphics card and compile for its architecture.
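The edit itself can be scripted with sed; the sketch below operates on a stand-in file so it is self-contained (in practice, point sed at the real setup.py and CMakeLists.txt inside your clones of pytorch_scatter and pytorch_sparse):

```shell
# Stand-in for the line in setup.py that hard-codes the old architecture.
echo 'nvcc_flags = ["-O2", "-arch=sm_35"]' > setup_snippet.py

# Strip the -arch=sm_35 flag; nvcc then targets the locally detected GPU.
sed -i 's/, *"-arch=sm_35"//g; s/"-arch=sm_35", *//g' setup_snippet.py

cat setup_snippet.py   # prints: nvcc_flags = ["-O2"]
```

After the flag is removed, python setup.py install proceeds as described above.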

Compiling for different architectures

As far as I understand, the NVIDIA compiler nvcc can compile directly for specific architectures, and can also emit a virtual instruction set (called PTX) that the driver JIT-compiles at load time, which provides forward compatibility with newer GPUs.

For example, the Nvidia Ampere compatibility guide suggests compiling like this:

/usr/local/cuda/bin/nvcc \
  -gencode=arch=compute_52,code=sm_52 \
  -gencode=arch=compute_60,code=sm_60 \
  -gencode=arch=compute_61,code=sm_61 \
  -gencode=arch=compute_70,code=sm_70 \
  -gencode=arch=compute_75,code=sm_75 \
  -gencode=arch=compute_80,code=sm_80 \
  -gencode=arch=compute_80,code=compute_80 \
  -O2 -o mykernel.o -c mykernel.cu

These flags could be added to setup.py. This would generate explicit code for compute capabilities 5.2, 6.0, ..., 8.0, as well as PTX code (for forward compatibility) from 8.0.
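As a sketch, the gencode list above can be generated programmatically in setup.py and appended to the nvcc arguments; the helper name is my own, and the architecture list mirrors the nvcc invocation above:

```python
# Build the -gencode flag list that setup.py could pass to nvcc instead of
# the hard-coded -arch=sm_35.
def gencode_flags(sm_archs, ptx_arch):
    flags = [f"-gencode=arch=compute_{cc},code=sm_{cc}" for cc in sm_archs]
    # Also embed PTX for the newest architecture so future GPUs can JIT it.
    flags.append(f"-gencode=arch=compute_{ptx_arch},code=compute_{ptx_arch}")
    return flags

flags = gencode_flags(["52", "60", "61", "70", "75", "80"], "80")
print("\n".join(flags))
```

The resulting list matches the seven -gencode lines of the Ampere-guide command above.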

@rusty1s wrote

Thanks for letting me know. Can you try to run via:

export TORCH_CUDA_ARCH_LIST="3.5;5.0+PTX;6.0;7.0;7.5;8.0;8.6"
rm -rf build/ && python setup.py install

This is the same idea: TORCH_CUDA_ARCH_LIST is parsed by the ./cmake/public/utils.cmake file in the official PyTorch code. But I believe it is not parsed in this repository, so setting it will have no effect.
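For illustration, here is roughly what parsing a TORCH_CUDA_ARCH_LIST value into -gencode flags involves; this is a sketch of the idea only, not the actual logic in utils.cmake:

```python
# Turn an arch list like "3.5;5.0+PTX;6.0" into nvcc -gencode flags (illustrative).
def parse_arch_list(arch_list):
    flags = []
    for entry in arch_list.split(";"):
        ptx = entry.endswith("+PTX")
        cc = (entry[:-4] if ptx else entry).replace(".", "")
        flags.append(f"-gencode=arch=compute_{cc},code=sm_{cc}")
        if ptx:
            # "+PTX" additionally embeds virtual code so newer GPUs can JIT it.
            flags.append(f"-gencode=arch=compute_{cc},code=compute_{cc}")
    return flags

print(parse_arch_list("3.5;5.0+PTX;6.0"))
```

A build system that honored the variable would feed these flags to nvcc, which is exactly what removing the hard-coded -arch=sm_35 makes room for.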

Possible PR

  • Remove -arch=sm_35 from setup.py and CMakeLists.txt both for pytorch_sparse and pytorch_scatter
  • When building binaries for PyPI, compile with the flags listed above to support all (?) GPU architectures (possibly even add some older architectures to the compile flags).
  • (optional) include support for TORCH_CUDA_ARCH_LIST by using the utils.cmake script mentioned above.

@rusty1s Let me know if you have any thoughts on this - I am happy to set up a pull request that incorporates these ideas.
