gcn: Can't run on Nell dataset

You reported the performance of GCN on Nell, and I noticed that you used the data provided by Yang. I downloaded Nell from Yang's GitHub repository, https://github.com/kimiyoung/planetoid, but when I run your program on it, I get a runtime error:

"utils.py", line 51, in load_data 
    features[test_idx_reorder, :] = features[test_idx_range, :]
ValueError: row index 9897 out of bounds

It seems that load_data reorders the test data points to keep them consistent with the adjacency matrix, but some of the indices are out of bounds.

The full stack trace:

$ python train.py --dataset nell.0.01
Traceback (most recent call last):
  File "train.py", line 29, in <module>
    adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask = load_data(FLAGS.dataset)
  File "/Users/liqimai/anaconda3/lib/python3.5/site-packages/gcn-1.0-py3.5.egg/gcn/utils.py", line 51, in load_data
    features[test_idx_reorder, :] = features[test_idx_range, :]
  File "/Users/liqimai/anaconda3/lib/python3.5/site-packages/scipy/sparse/lil.py", line 289, in __getitem__
    return self._get_row_ranges(i, j)
  File "/Users/liqimai/anaconda3/lib/python3.5/site-packages/scipy/sparse/lil.py", line 329, in _get_row_ranges
    j_start, j_stop, j_stride, nj)
  File "scipy/sparse/_csparsetools.pyx", line 787, in scipy.sparse._csparsetools.lil_get_row_ranges (scipy/sparse/_csparsetools.c:11978)
ValueError: row index 9897 out of bounds
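
One possible cause, stated as an assumption rather than a confirmed diagnosis: the test indices are not contiguous, so the largest index exceeds the number of rows obtained by stacking the training and test features. The sketch below shows one way to pad the test block with zero rows before reordering. The function name `stack_features` and the toy matrices are hypothetical and are not taken from the repository.

```python
import numpy as np
import scipy.sparse as sp


def stack_features(allx, tx, test_idx_reorder):
    """Stack train/test features, padding gaps in the test indices with zero rows.

    Hypothetical helper: assumes tx[k] belongs to node test_idx_reorder[k]
    and that every test index is >= allx.shape[0].
    """
    test_idx_reorder = np.asarray(test_idx_reorder)
    test_idx_range = np.sort(test_idx_reorder)
    offset = allx.shape[0]

    # One row for every index from `offset` up to the largest test index.
    n_rows = int(test_idx_range.max()) + 1 - offset
    tx_extended = sp.lil_matrix((n_rows, tx.shape[1]))
    tx_extended[test_idx_range - offset, :] = tx

    features = sp.vstack((allx, tx_extended)).tolil()
    # Same reordering step as in the failing line, now with valid indices.
    features[test_idx_reorder, :] = features[test_idx_range, :]
    return features


# Toy data: 3 training rows, 2 test rows, and no test row for index 4.
allx = sp.lil_matrix(np.ones((3, 2)))
tx = sp.lil_matrix(np.array([[1.0, 2.0], [3.0, 4.0]]))
features = stack_features(allx, tx, [5, 3])
```

In the toy example, index 4 has no test row, so it ends up as an all-zero feature vector instead of triggering an out-of-bounds error.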

About this issue

  • State: open
  • Created 7 years ago
  • Comments: 39 (11 by maintainers)

Most upvoted comments

I forked your code and modified it heavily, but my modified version only reaches 45% accuracy. When I ran your original program with the code snippet above, it reached about 60% accuracy, very close to the accuracy you reported in the paper for random splits. I think there must be something wrong in my modified version. Thanks a lot!

Here is another code snippet, which I think you used when preprocessing Nell, in case anyone needs it:

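import numpy as np
from scipy.sparse import csr_matrix

def save_sparse_csr(filename, array):
    # Store the three CSR arrays plus the shape; np.savez appends ".npz" if it is missing.
    np.savez(filename, data=array.data, indices=array.indices,
             indptr=array.indptr, shape=array.shape)

def load_sparse_csr(filename):
    # Unlike np.savez, np.load needs the full file name, including ".npz".
    loader = np.load(filename)
    return csr_matrix((loader['data'], loader['indices'], loader['indptr']),
                      shape=loader['shape'])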
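
For anyone on a recent SciPy, the built-in `scipy.sparse.save_npz` and `scipy.sparse.load_npz` cover the same use case. Note that their file layout may not match the files written by the helpers above, so this is an alternative and not a drop-in reader for existing files. A minimal round trip, with a made-up matrix and file name:

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sp

# Small made-up CSR matrix, used only to illustrate the round trip.
matrix = sp.csr_matrix(np.array([[0.0, 1.0, 0.0], [2.0, 0.0, 3.0]]))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "features.npz")
    sp.save_npz(path, matrix)    # writes data, indices, indptr, shape, format
    restored = sp.load_npz(path)

# True when the two matrices have no differing entries.
same = (matrix != restored).nnz == 0
```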

Thanks for testing. I’ll have a look at it as soon as I find time for it.

For now, I would recommend having a look at a better-suited model for relational datasets like this. We recently had a paper on this: https://arxiv.org/abs/1703.06103. This should give you better and more consistent results for directed graphs with different relation types. The NELL dataset as you’re using it now is preprocessed to be an undirected graph without edge types, so that the GCN model can be trained on it. You can find the original NELL dataset here: http://rtw.ml.cmu.edu/rtw/resources
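
To illustrate what "undirected graph without edge types" means here, below is a minimal sketch of collapsing typed, directed triples into a symmetric, unweighted adjacency matrix. It is only an illustration under simplifying assumptions, not the actual preprocessing used for the NELL files; the function name `to_undirected` and the toy triples are made up.

```python
import numpy as np
import scipy.sparse as sp


def to_undirected(triples, num_nodes):
    """Build a symmetric 0/1 adjacency matrix from (subject, relation, object) triples.

    Hypothetical helper: the relation label is discarded on purpose.
    """
    rows = np.array([s for s, _, _ in triples], dtype=np.int64)
    cols = np.array([o for _, _, o in triples], dtype=np.int64)
    data = np.ones(len(triples))
    adj = sp.coo_matrix((data, (rows, cols)), shape=(num_nodes, num_nodes)).tocsr()

    adj = adj + adj.T    # add the reverse direction of every edge
    adj.data[:] = 1.0    # collapse repeated edges to weight 1
    return adj


# Toy triples: two nodes linked in both directions by different relations.
triples = [(0, "r1", 1), (1, "r2", 0), (1, "r1", 2)]
adj = to_undirected(triples, num_nodes=3)
```

Both relation types between nodes 0 and 1 end up as a single undirected edge, which is exactly the information the relational model linked above is designed to keep.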