pytorch_sparse: RuntimeError: CUDA error: an illegal memory access was encountered
File "examples/sem_seg_sparse/train.py", line 142, in <module>
main()
File "examples/sem_seg_sparse/train.py", line 61, in main
train(model, train_loader, optimizer, scheduler, criterion, opt)
File "examples/sem_seg_sparse/train.py", line 79, in train
out = model(data)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/content/drive/My Drive/deep_gcns_torch/examples/sem_seg_sparse/architecture.py", line 69, in forward
feats.append(self.gunet(feats[-1],edge_index=edge_index ,batch=batch))
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch_geometric/nn/models/graph_unet.py", line 83, in forward
x.size(0))
File "/usr/local/lib/python3.6/dist-packages/torch_geometric/nn/models/graph_unet.py", line 120, in augment_adj
num_nodes)
File "/usr/local/lib/python3.6/dist-packages/torch_sparse/spspmm.py", line 30, in spspmm
C = matmul(A, B)
File "/usr/local/lib/python3.6/dist-packages/torch_sparse/matmul.py", line 107, in matmul
return spspmm(src, other, reduce)
File "/usr/local/lib/python3.6/dist-packages/torch_sparse/matmul.py", line 95, in spspmm
return spspmm_sum(src, other)
File "/usr/local/lib/python3.6/dist-packages/torch_sparse/matmul.py", line 83, in spspmm_sum
rowptrA, colA, valueA, rowptrB, colB, valueB, K)
RuntimeError: CUDA error: an illegal memory access was encountered (launch_kernel at /pytorch/aten/src/ATen/native/cuda/Loops.cuh:103)
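As a general debugging hint (not from this thread): CUDA kernel launches are asynchronous, so the Python frame reported in a trace like the one above may be far from the kernel that actually faulted. Forcing synchronous launches via the `CUDA_LAUNCH_BLOCKING` environment variable makes the traceback point at the real offender:

```python
import os

# Must be set before the first CUDA operation (ideally before importing
# torch). Makes every kernel launch synchronous, so the Python traceback
# points at the op that actually triggered the illegal memory access.
# Debug only: it slows training down considerably.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```

Equivalently, `CUDA_LAUNCH_BLOCKING=1 python train.py` on the command line.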
Hi, I'm integrating the Graph U-Net and other models on Google Colab, but there are some bugs. Could you help me? Thanks.
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 20 (8 by maintainers)
Commits related to this issue
- `SentenceTransformer` encoder (#50) * add sentence transformer * optional dep * typo * add docstring and graceful failure — committed to RexYing/pytorch_sparse by rusty1s 3 years ago
The error seems to stem from the fact that cuSPARSE cannot handle duplicated edges in `edge_index`: it fails to compute the correct number of output edges in that case. In your case, it might well be that you have some initial self-loop edges in your graph, which should be removed before calling `add_self_loops`. I think your fix for `augment_adj` is correct, and I added it to the `GraphUNet` model in PyG.

@vthost @rusty1s Hi, I also met this error when using my own dataset to train Graph U-Net. The error occurred randomly on GPU but never on CPU. I changed the `augment_adj` function to call `remove_self_loops` first, and the problem was solved, but I don't know why.

I don't think that's related to the above issue. You may have a memory leak somewhere, or one of the graphs in your dataset is too large to be handled in a full-batch fashion.
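The workaround described above — stripping existing self-loops so that `add_self_loops` cannot create duplicates — can be illustrated with a small pure-Python sketch. The helper names are hypothetical, and plain edge lists stand in for the 2 × E tensors that the real `torch_geometric.utils` functions operate on:

```python
def remove_self_loops_py(edges, weights):
    # Drop every (i, i) entry, so that a later add_self_loops call
    # cannot produce duplicate self-loop edges (which cuSPARSE
    # reportedly cannot handle).
    kept = [(e, w) for e, w in zip(edges, weights) if e[0] != e[1]]
    return [e for e, _ in kept], [w for _, w in kept]

def add_self_loops_py(edges, weights, num_nodes, fill=1.0):
    # Append exactly one (i, i) edge per node with weight `fill`.
    loops = [(i, i) for i in range(num_nodes)]
    return edges + loops, weights + [fill] * num_nodes

# A graph that already contains the self-loop (1, 1):
edges, weights = [(0, 1), (1, 1), (1, 0)], [1.0, 2.0, 1.0]
edges, weights = remove_self_loops_py(edges, weights)   # (1, 1) dropped
edges, weights = add_self_loops_py(edges, weights, num_nodes=2)
# Every self-loop now appears exactly once.
```

Without the removal step, node 1 would end up with two (1, 1) edges, i.e. exactly the duplicated-edge situation blamed above.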
I now have this also with ASAPool 😦
I am using `ogbg-code`. The example code for that dataset adds two types of edges to the graph in `utils.augment_edge`, so we may have several edges between two nodes. I tried passing `coalesced=True` to `spspmm` in `graph_unet.augment_adj`, but the error stays the same; it seems that `spspmm` interprets the `coalesced` argument as "sorted". After I added the following at the beginning of `graph_unet.forward` (after the initialization of the edge weights), it runs for 74/143 epochs, and then the error comes again:

`edge_index, edge_weight = coalesce(edge_index, edge_weight, x.shape[0], x.shape[0])`

If I add it in `graph_unet.augment_adj` instead, the training runs through, but I get the same error in `remove_self_loops` during evaluation, because the mask does not fit `edge_attr[mask]`. Just as an update…

Can you show me an example code? My Graph U-Net script runs just fine. Note that you need to pass `coalesced=True` if your `edge_index` is not sorted.
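The `coalesce` call quoted above merges parallel edges (such as the two edge types that `utils.augment_edge` adds) into a single sorted edge per node pair, reducing their weights. A pure-Python sketch of that behaviour, with a hypothetical helper and lists in place of tensors, assuming sum-reduction:

```python
from collections import defaultdict

def coalesce_py(edges, weights):
    """Merge duplicate (row, col) edges by summing their weights and
    return them in sorted order -- the form cuSPARSE expects."""
    acc = defaultdict(float)
    for (r, c), w in zip(edges, weights):
        acc[(r, c)] += w
    pairs = sorted(acc)
    return pairs, [acc[p] for p in pairs]

# Two parallel (0, 1) edges, as produced by two edge types:
edges, weights = [(0, 1), (0, 1), (1, 0)], [1.0, 2.0, 1.0]
dedup, w = coalesce_py(edges, weights)
```

This also shows why merely sorting is not enough: sorting leaves both (0, 1) entries in place, while coalescing collapses them into one edge of weight 3.0.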