pytorch_scatter: Simply importing torch_scatter causes CUDA error with PyTorch 1.9 and cuda 11.1

Hello,

Here is the minimal working example of my error:

import torch
z = torch.zeros(10, device=torch.device('cuda:0'))  # works
import torch_scatter
z = torch.zeros(10, device=torch.device('cuda:0'))  # Error
# RuntimeError: CUDA error: the provided PTX was compiled with an unsupported toolchain.
z = torch.zeros(5, device=torch.device('cpu')) # works

Here are my installation and system details:

  • OS: Ubuntu 20.04.1 LTS
  • GPU: Quadro GP100
  • torch version: 1.9.0
  • torch_scatter version: 2.0.7
  • CUDA version (as given by torch.version.cuda): 11.1

Other details:

  • I installed PyTorch via conda. The installation works fine and I can train models as per usual.
  • The same error persists for torch_scatter installed via conda or via the pip command pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.9.0+cu111.html

Any pointers on how to fix this will be appreciated. Thanks!

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 22 (12 by maintainers)

Most upvoted comments

Downgrading CUDA from 11.1 to 11.0 and installing pytorch-scatter from source has solved the issue for me. I can now copy tensors to CUDA without the previous CUDA error.

However, this is not ideal and it is still worth fixing the current issue with CUDA 11.1, since PyTorch does not provide pre-built binaries for CUDA 11.0 in their latest releases (v1.8.0+). Downgrading to CUDA 11.0 means that I need to use PyTorch v1.7.1 for now.

Wheels have been updated 😃

Yes, will probably release new versions early September.

I have figured out the problem, and a possible solution (thanks to @ntselepidis for helping me out). The problem is a compatibility issue with newer NVIDIA GPU architectures: the default setup.py and CMakeLists.txt hard-code compute capability 3.5 (-arch=sm_35), which is quite old and not compatible with newer architectures. Note also that the nvcc compiler warns that -arch=sm_35 will soon no longer be supported:

$ nvcc --help  # version 11.1
--gpu-architecture <arch> (-arch)
  Note: the values compute_30, compute_32, compute_35, compute_37, sm_30,
  sm_32, sm_35, sm_37 and sm_50 are deprecated and may be removed in a
  future release.

In my case, I was trying to run the code on a GTX 1080 Ti (Pascal architecture, compute capability 6.1) and V100 (Volta architecture, compute capability 7.0).
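To see which architecture your own GPUs report, PyTorch exposes the compute capability directly; a quick diagnostic sketch (pure PyTorch, no torch_scatter needed, helper name is my own):

```python
import torch

# Collect the compute capability of each visible GPU, so you can compare it
# against the architectures a binary was compiled for (e.g. sm_35 = 3.5).
def device_capabilities():
    caps = []
    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            major, minor = torch.cuda.get_device_capability(i)
            caps.append((torch.cuda.get_device_name(i), f"{major}.{minor}"))
    return caps

for name, cap in device_capabilities():
    print(f"{name}: compute capability {cap}")
```

On the machines above this would report 6.1 for the GTX 1080 Ti and 7.0 for the V100.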

Solving the problem just for your machine

If you are just interested in getting the code to run on your specific machine, it is sufficient to clone the GitHub repository and remove the -arch=sm_35 flags from setup.py and CMakeLists.txt. Note that you have to do this for both pytorch_scatter and pytorch_sparse. Then, when you run python setup.py install for each of them, nvcc will automatically detect your graphics card and compile for its architecture.
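The edit itself can be scripted with sed; the sketch below operates on a stand-in file so it is self-contained (in practice, point sed at the real setup.py and CMakeLists.txt inside your clones of pytorch_scatter and pytorch_sparse):

```shell
# Stand-in for the line in setup.py that hard-codes the old architecture.
echo 'nvcc_flags = ["-O2", "-arch=sm_35"]' > setup_snippet.py

# Strip the -arch=sm_35 flag; nvcc then targets the locally detected GPU.
sed -i 's/, *"-arch=sm_35"//g; s/"-arch=sm_35", *//g' setup_snippet.py

cat setup_snippet.py   # prints: nvcc_flags = ["-O2"]
```

After the flag is removed, python setup.py install proceeds as described above.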

Compiling for different architectures

As far as I understand, the NVIDIA compiler nvcc can compile directly for specific architectures, and can also emit a virtual instruction set (called PTX) that the driver JIT-compiles at load time, which provides forward compatibility with newer GPUs.

For example, the Nvidia Ampere compatibility guide suggests compiling like this:

/usr/local/cuda/bin/nvcc \
  -gencode=arch=compute_52,code=sm_52 \
  -gencode=arch=compute_60,code=sm_60 \
  -gencode=arch=compute_61,code=sm_61 \
  -gencode=arch=compute_70,code=sm_70 \
  -gencode=arch=compute_75,code=sm_75 \
  -gencode=arch=compute_80,code=sm_80 \
  -gencode=arch=compute_80,code=compute_80 \
  -O2 -o mykernel.o -c mykernel.cu

These flags could be added to setup.py. This would generate explicit code for compute capabilities 5.2, 6.0, ..., 8.0, as well as PTX code (for forward compatibility) from 8.0.
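As a sketch, the gencode list above can be generated programmatically in setup.py and appended to the nvcc arguments; the helper name is my own, and the architecture list mirrors the nvcc invocation above:

```python
# Build the -gencode flag list that setup.py could pass to nvcc instead of
# the hard-coded -arch=sm_35.
def gencode_flags(sm_archs, ptx_arch):
    flags = [f"-gencode=arch=compute_{cc},code=sm_{cc}" for cc in sm_archs]
    # Also embed PTX for the newest architecture so future GPUs can JIT it.
    flags.append(f"-gencode=arch=compute_{ptx_arch},code=compute_{ptx_arch}")
    return flags

flags = gencode_flags(["52", "60", "61", "70", "75", "80"], "80")
print("\n".join(flags))
```

The resulting list matches the seven -gencode lines of the Ampere-guide command above.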

@rusty1s wrote

Thanks for letting me know. Can you try to run via:

export TORCH_CUDA_ARCH_LIST="3.5;5.0+PTX;6.0;7.0;7.5;8.0;8.6"
rm -rf build/ && python setup.py install

This is the same idea: TORCH_CUDA_ARCH_LIST is parsed by the ./cmake/public/utils.cmake file in the official PyTorch code. But I believe it is not parsed in this repository, so setting it will have no effect.
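For illustration, here is roughly what parsing a TORCH_CUDA_ARCH_LIST value into -gencode flags involves; this is a sketch of the idea only, not the actual logic in utils.cmake:

```python
# Turn an arch list like "3.5;5.0+PTX;6.0" into nvcc -gencode flags (illustrative).
def parse_arch_list(arch_list):
    flags = []
    for entry in arch_list.split(";"):
        ptx = entry.endswith("+PTX")
        cc = (entry[:-4] if ptx else entry).replace(".", "")
        flags.append(f"-gencode=arch=compute_{cc},code=sm_{cc}")
        if ptx:
            # "+PTX" additionally embeds virtual code so newer GPUs can JIT it.
            flags.append(f"-gencode=arch=compute_{cc},code=compute_{cc}")
    return flags

print(parse_arch_list("3.5;5.0+PTX;6.0"))
```

A build system that honored the variable would feed these flags to nvcc, which is exactly what removing the hard-coded -arch=sm_35 makes room for.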

Possible PR

  • Remove -arch=sm_35 from setup.py and CMakeLists.txt both for pytorch_sparse and pytorch_scatter
  • When building binaries for PyPI, compile with the flags listed above to support all (?) GPU architectures (possibly even add some older architectures to the compile flags).
  • (optional) include support for TORCH_CUDA_ARCH_LIST by using the utils.cmake script mentioned above.

@rusty1s Let me know if you have any thoughts on this - I am happy to set up a pull request that incorporates these ideas.
