gcn: Can't run on Nell dataset
You reported the performance of GCN on Nell, and I notice that you used the data provided by Yang. I downloaded Nell from Yang’s GitHub repository, https://github.com/kimiyoung/planetoid, but when I run your program on Nell, it hits a runtime error:
"utils.py", line 51, in load_data
features[test_idx_reorder, :] = features[test_idx_range, :]
ValueError: row index 9897 out of bounds
It seems that the code is reordering the test data points to keep them consistent with the adjacency matrix, but some of the indices are out of bounds.
The full stacktrace:
$ python train.py --dataset nell.0.01
Traceback (most recent call last):
  File "train.py", line 29, in <module>
    adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask = load_data(FLAGS.dataset)
  File "/Users/liqimai/anaconda3/lib/python3.5/site-packages/gcn-1.0-py3.5.egg/gcn/utils.py", line 51, in load_data
    features[test_idx_reorder, :] = features[test_idx_range, :]
  File "/Users/liqimai/anaconda3/lib/python3.5/site-packages/scipy/sparse/lil.py", line 289, in __getitem__
    return self._get_row_ranges(i, j)
  File "/Users/liqimai/anaconda3/lib/python3.5/site-packages/scipy/sparse/lil.py", line 329, in _get_row_ranges
    j_start, j_stop, j_stride, nj)
  File "scipy/sparse/_csparsetools.pyx", line 787, in scipy.sparse._csparsetools.lil_get_row_ranges (scipy/sparse/_csparsetools.c:11978)
ValueError: row index 9897 out of bounds
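To illustrate what I think is going wrong, here is a minimal sketch of the reordering step with zero-padding added. It is only a toy: it uses dense NumPy arrays instead of the scipy.sparse LIL matrix from the traceback, the function name pad_and_reorder is hypothetical, and I am assuming the out-of-bounds indices belong to nodes that have no feature row in the data files. I have not confirmed that this matches how the Nell data was actually preprocessed.

```python
import numpy as np


def pad_and_reorder(allx, tx, test_idx_reorder):
    """Place test feature rows at their graph indices, zero-padding any gaps.

    allx: (n_train, d) features of the non-test nodes, in graph order.
    tx:   (n_test, d) test features, in the order of test_idx_reorder.
    test_idx_reorder: graph index of each row of tx.

    Hypothetical helper for illustration; not part of the gcn package.
    """
    test_idx_reorder = np.asarray(test_idx_reorder)
    n_train = allx.shape[0]

    # Stacking allx and tx gives n_train + n_test rows. If the largest test
    # index is beyond that, indexing fails with "row index ... out of bounds",
    # so allocate enough rows to cover every index instead.
    n_rows = max(n_train + tx.shape[0], int(test_idx_reorder.max()) + 1)

    features = np.zeros((n_rows, allx.shape[1]), dtype=float)
    features[:n_train] = allx
    # Row i of tx belongs to graph node test_idx_reorder[i]; indices that are
    # never assigned (assumed to be nodes without features) stay all-zero.
    features[test_idx_reorder] = tx
    return features


if __name__ == "__main__":
    allx = np.ones((3, 2))
    tx = np.array([[10.0, 10.0], [20.0, 20.0]])
    # Index 4 is missing from the test set, and index 5 is past allx + tx.
    print(pad_and_reorder(allx, tx, [5, 3]))
```

With the padding in place the assignment no longer indexes past the end of the matrix, but the labels and masks would presumably need the same treatment so that their row counts still match the adjacency matrix.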
About this issue
- Original URL
- State: open
- Created 7 years ago
- Comments: 39 (11 by maintainers)
I forked your code and modified it a lot. My modified version only reaches 45% accuracy. When I used your original program with the code snippet above, it reached about 60% accuracy, very close to the accuracy you reported in the paper under a random split. I think there must be something wrong in my modified version. Thanks a lot!
Here is another code snippet, which I think you used to preprocess Nell, in case anyone needs it:
Thanks for testing. I’ll have a look at it as soon as I find time for it.
For now, I would recommend having a look at a better-suited model for relational datasets like this. We recently had a paper on this: https://arxiv.org/abs/1703.06103. This should give you better and more consistent results for directed graphs with different relation types. The NELL dataset as you’re using it now is preprocessed to be an undirected graph without edge types, so that the GCN model can be trained on it. You can find the original NELL dataset here: http://rtw.ml.cmu.edu/rtw/resources
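As a rough illustration of what "undirected graph without edge types" means, the sketch below collapses (head, relation, tail) triples into a symmetric adjacency matrix. This is only a toy under stated assumptions: the function name triples_to_undirected_adj and the integer node ids are hypothetical, and the actual preprocessing behind the NELL files may differ (for instance in how relations themselves are represented).

```python
import numpy as np


def triples_to_undirected_adj(triples, num_nodes):
    """Build a symmetric 0/1 adjacency matrix from (head, relation, tail) triples.

    Hypothetical helper for illustration: the relation label is discarded and
    every edge is added in both directions, so direction and type are lost.
    """
    adj = np.zeros((num_nodes, num_nodes), dtype=np.int8)
    for head, _relation, tail in triples:
        adj[head, tail] = 1
        adj[tail, head] = 1  # symmetrize: drop edge direction
    return adj


if __name__ == "__main__":
    triples = [(0, "born_in", 1), (2, "located_in", 1)]
    print(triples_to_undirected_adj(triples, num_nodes=3))
```

After this step, two triples with different relations between the same pair of nodes become indistinguishable, which is the information a relational model can keep.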