pytorch_geometric: RuntimeError: scatter_add_cuda_kernel does not have a deterministic implementation

I am trying to use GCN and GAT from the library and I am getting this error:

RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch_scatter/scatter.py", line 21, in softmax
            size[dim] = int(index.max()) + 1
        out = torch.zeros(size, dtype=src.dtype, device=src.device)
        return out.scatter_add_(dim, index, src)
               ~~~~~~~~~~~~~~~~ <--- HERE
    else:
        return out.scatter_add_(dim, index, src)
RuntimeError: scatter_add_cuda_kernel does not have a deterministic implementation, but you set 'torch.use_deterministic_algorithms(True)'. You can turn off determinism just for this operation if that's acceptable for your application. You can also file an issue at https://github.com/pytorch/pytorch/issues to help us prioritize adding deterministic support for this operation.

I tried adding torch.use_deterministic_algorithms(True), but it is not working. The same code works fine on CPU. How can I avoid this error?

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 25 (15 by maintainers)

Most upvoted comments

Sure. The difference between those two approaches is that, for scatter, the order of aggregation is not deterministic, since internally scatter is implemented using atomic operations. This may lead to slightly different outputs caused by floating point precision, e.g., 3 + 2 + 1 = 6.000001 while 1 + 2 + 3 = 5.9999999. In contrast, the order of aggregation in SparseTensor is always deterministic and is performed based on the ordering of node indices. In practice, either operation works fine, in particular because graphs do not usually obey a fixed node ordering. The final accuracy is usually the same.
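To make that non-associativity concrete, here is a tiny illustration (the float32 values are toy numbers chosen only to trigger rounding, not taken from the thread):

import torch

# Floating point addition is not associative: regrouping the same three
# numbers changes the rounded result.
a, b, c = torch.tensor([1e8, -1e8, 1.0], dtype=torch.float32)
print((a + b) + c)  # tensor(1.)
print(a + (b + c))  # tensor(0.)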

scatter_* is not a deterministic operation, and using torch.use_deterministic_algorithms(True) will result in an error. If you want to make use of deterministic operations in PyG, you have to use the SparseTensor class as an alternative to edge_index, see here.
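A minimal sketch of what that switch looks like in practice, using the standard ToSparseTensor transform (the dataset and layer sizes below are just placeholders):

import torch_geometric.transforms as T
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv

# ToSparseTensor() replaces data.edge_index with a SparseTensor in data.adj_t.
dataset = Planetoid(root='/tmp/Cora', name='Cora', transform=T.ToSparseTensor())
data = dataset[0]

conv = GCNConv(dataset.num_features, 16)
out = conv(data.x, data.adj_t)  # pass adj_t where edge_index used to go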

Hi @rusty1s, I am trying to make mutag_gin.py output a deterministic result. Following the suggestions above, I have converted the dataset to SparseTensor, changed edge_index to adj_t, and changed line 53 to x = global_max_pool(x, batch), but I still get random results. I set the seed as follows:

import torch

seed = 2
torch.manual_seed(seed)
# np.random.seed(seed)
# random.seed(seed)
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
torch.use_deterministic_algorithms(True)

Any help would be appreciated. Thanks.
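For completeness, a full deterministic setup on CUDA usually also needs the cuBLAS workspace variable set before any CUDA work starts; a sketch of a more complete seeding block (the CUBLAS_WORKSPACE_CONFIG value follows the PyTorch reproducibility notes, and the seed value is arbitrary):

import os
import random

import numpy as np
import torch

# Required by torch.use_deterministic_algorithms(True) for some cuBLAS
# routines on CUDA >= 10.2; must be set before CUDA is initialized.
os.environ['CUBLAS_WORKSPACE_CONFIG'] = ':4096:8'

seed = 2
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)

torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
torch.use_deterministic_algorithms(True)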

Well, from my understanding, GINEConv requires the edge_attr to have a dimensionality of in_channels. GATConv, on the other hand, requires edge_attr to have a dimensionality of heads * out_channels. I don’t see why one wouldn’t be able to create their own projection to ensure edge_attr has the correct size in each of these cases. I agree, however, that not enforcing edge_dim may be confusing to some (and the documentation is already explicit about this internal Linear layer), so I don’t really have a problem with it as long as add_self_loops works fine.
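As a rough sketch of such a user-side projection (the layer sizes, the extra Linear, and the toy tensors below are assumptions for illustration, not part of the library’s API):

import torch
from torch.nn import Linear, ReLU, Sequential
from torch_geometric.nn import GINEConv

raw_edge_dim, in_channels, out_channels = 4, 32, 64

# User-defined projection that brings edge_attr up to in_channels,
# which is what GINEConv expects when edge_dim is not set.
edge_proj = Linear(raw_edge_dim, in_channels)
mlp = Sequential(Linear(in_channels, out_channels), ReLU(),
                 Linear(out_channels, out_channels))
conv = GINEConv(mlp)

x = torch.randn(10, in_channels)
edge_index = torch.randint(0, 10, (2, 40))
edge_attr = torch.randn(40, raw_edge_dim)

out = conv(x, edge_index, edge_proj(edge_attr))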

And yep, the final solution should be what you said in the second paragraph! However, as per your previous comment, a temporary workaround could be to just add the self loops as part of the Transform or directly to the original edge_index & edge_attr tensors before creating the SparseTensor. Thanks!
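A rough sketch of that temporary workaround, adding the self-loops by hand before building the SparseTensor (the node/edge counts and the zero fill value are assumptions for illustration):

import torch
from torch_sparse import SparseTensor

num_nodes, edge_dim = 10, 4
edge_index = torch.randint(0, num_nodes, (2, 40))
edge_attr = torch.randn(40, edge_dim)

# Append one self-loop per node, with zero-valued edge features
# (pick whatever fill value suits the model).
loop_index = torch.arange(num_nodes).unsqueeze(0).repeat(2, 1)
loop_attr = torch.zeros(num_nodes, edge_dim)

edge_index = torch.cat([edge_index, loop_index], dim=1)
edge_attr = torch.cat([edge_attr, loop_attr], dim=0)

adj_t = SparseTensor.from_edge_index(edge_index, edge_attr,
                                     sparse_sizes=(num_nodes, num_nodes)).t()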

You have the assertion below, which fails if one does not set edge_dim but passes in edge attributes (self.lin_edge is initialized only if edge_dim is not None).

I think this is definitely intended: if you pass in edge_attr, it shouldn’t be silently ignored just because edge_dim wasn’t set, so raising an error is the safer behavior. The GINEConv layer does not have this constraint since it requires the same input dimensionality across node and edge features (which comes with other disadvantages).

I guess the easiest workaround for your problem would be the addition of self-loops in SparseTensor for multi-dimensional edge features as well, right? We would need to add support for this in add_self_loops.