pytorch_geometric: RuntimeError: scatter_add_cuda_kernel does not have a deterministic implementation
I am trying to use GCN and GAT from the library and am getting this error:
```
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch_scatter/scatter.py", line 21, in softmax
        size[dim] = int(index.max()) + 1
        out = torch.zeros(size, dtype=src.dtype, device=src.device)
        return out.scatter_add_(dim, index, src)
               ~~~~~~~~~~~~~~~~ <--- HERE
    else:
        return out.scatter_add_(dim, index, src)
RuntimeError: scatter_add_cuda_kernel does not have a deterministic implementation, but you set 'torch.use_deterministic_algorithms(True)'. You can turn off determinism just for this operation if that's acceptable for your application. You can also file an issue at https://github.com/pytorch/pytorch/issues to help us prioritize adding deterministic support for this operation.
```
I tried adding `torch.use_deterministic_algorithms(True)`, but it is not working. The same code works fine on the CPU. How can I avoid this error?
About this issue
- State: closed
- Created 3 years ago
- Comments: 25 (15 by maintainers)
Sure. The difference between those two approaches is that, for `scatter`, the order of aggregation is not deterministic, since `scatter` is internally implemented using atomic operations. This may lead to slightly different outputs induced by floating-point precision, e.g., `3 + 2 + 1 = 5.000001` while `1 + 2 + 3 = 4.9999999`. In contrast, the order of aggregation in `SparseTensor` is always deterministic and is performed based on the ordering of node indices. In practice, either operation works fine, in particular because graphs do not usually obey a fixed node ordering, and the final accuracy is usually the same. However, `scatter_*` is not a deterministic operation, and using `torch.use_deterministic_algorithms(True)` will result in an error. If you want to make use of deterministic operations in PyG, you have to use the `SparseTensor` class as an alternative to `edge_index`, see here.
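A minimal sketch of that switch, assuming the `ToSparseTensor` transform from `torch_geometric.transforms` (the Planetoid/Cora dataset here is purely an illustrative choice):

```python
import torch
import torch_geometric.transforms as T
from torch_geometric.datasets import Planetoid

torch.use_deterministic_algorithms(True)

# `ToSparseTensor` removes `data.edge_index` and stores the (transposed)
# adjacency matrix as `data.adj_t` instead.
dataset = Planetoid(root='/tmp/Cora', name='Cora',
                    transform=T.ToSparseTensor())
data = dataset[0]

# Message-passing layers such as GCNConv accept `adj_t` in place of
# `edge_index`, i.e. `conv(x, data.adj_t)` rather than `conv(x, data.edge_index)`.
print(data.adj_t)
```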
Hi @rusty1s, I am trying to make `mutag_gin.py` output a deterministic result. Following the above suggestions, I have changed the dataset to `SparseTensor`, renamed `edge_index` to `adj_t`, changed line 53 to `x = global_max_pool(x, batch)`, and set the seed as shown below, but I still get random results. Any help would be appreciated. Thanks.
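The original seed-setting snippet was not preserved in the thread; a typical PyTorch reproducibility setup looks roughly like this sketch (not the commenter's exact code):

```python
import random
import numpy as np
import torch

seed = 12345  # arbitrary value
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```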
Well, from my understanding, `GINEConv` requires `edge_attr` to have a dimensionality of `in_channels`. `GATConv`, on the other hand, requires `edge_attr` to have a dimensionality of `heads * out_channels`. I don't see why one wouldn't be able to create their own projections in order to ensure `edge_attr` has the correct size for each of these cases (see the sketch below). I agree, however, that not enforcing `edge_dim` may be confusing to some (and the documentation is already explicit about this internal `Linear` layer), so I don't really have a problem with it as long as `add_self_loops` works fine.
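A sketch of that "own projection" idea, shown here for `GINEConv`, which (without `edge_dim`) expects `edge_attr` to match the node feature dimensionality; all sizes are illustrative assumptions:

```python
import torch
from torch.nn import Linear, ReLU, Sequential
from torch_geometric.nn import GINEConv

in_channels, hidden_channels, edge_feat_dim = 16, 32, 3

# User-defined projection: map raw edge features to the node feature
# dimensionality that GINEConv expects when `edge_dim` is not set.
edge_proj = Linear(edge_feat_dim, in_channels)
mlp = Sequential(Linear(in_channels, hidden_channels), ReLU(),
                 Linear(hidden_channels, hidden_channels))
conv = GINEConv(mlp)

x = torch.randn(10, in_channels)            # 10 nodes
edge_index = torch.randint(0, 10, (2, 40))  # 40 random edges
edge_attr = torch.randn(40, edge_feat_dim)  # raw edge features

out = conv(x, edge_index, edge_attr=edge_proj(edge_attr))
```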
And yep, the final solution should be what you said in the second paragraph! However, as per your previous comment, a temporary workaround could be to just add the self-loops as part of the transform or directly to the original `edge_index` & `edge_attr` tensors before creating the `SparseTensor`. Thanks!
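A rough sketch of that temporary workaround, building the `SparseTensor` by hand from `torch_sparse`; the zero fill value for the self-loop features is an arbitrary choice:

```python
import torch
from torch_sparse import SparseTensor

num_nodes, edge_feat_dim = 10, 3
edge_index = torch.randint(0, num_nodes, (2, 40))
edge_attr = torch.randn(40, edge_feat_dim)

# Append one self-loop per node, with zero-valued edge features.
loop_index = torch.arange(num_nodes).repeat(2, 1)
loop_attr = edge_attr.new_zeros(num_nodes, edge_feat_dim)
edge_index = torch.cat([edge_index, loop_index], dim=1)
edge_attr = torch.cat([edge_attr, loop_attr], dim=0)

# Build the (transposed) adjacency with multi-dimensional edge values,
# matching PyG's `adj_t` convention.
adj_t = SparseTensor(row=edge_index[0], col=edge_index[1], value=edge_attr,
                     sparse_sizes=(num_nodes, num_nodes)).t()
```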
I think this is definitely intended. If you pass in `edge_attr`, it shouldn't be ignored just because one didn't set `edge_dim`. The `GINEConv` layer does not have this constraint since it requires the same input dimensionality across node and edge features (which comes with other disadvantages). I guess the easiest workaround for your problem would be the addition of self-loops in `SparseTensor` for multi-dimensional edge features as well, right? We would need to add support for this in `add_self_loops`.