spaCy: IndexError: list index out of range while training parser
Training pipeline: ['parser']
Starting with blank model 'ko'
Counting training words (limit=0)
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ksjae/.local/lib/python3.7/site-packages/spacy/__main__.py", line 35, in <module>
    plac.call(commands[command], sys.argv[1:])
  File "/home/ksjae/.local/lib/python3.7/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/home/ksjae/.local/lib/python3.7/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/home/ksjae/.local/lib/python3.7/site-packages/spacy/cli/train.py", line 213, in train
    optimizer = nlp.begin_training(lambda: corpus.train_tuples, device=use_gpu)
  File "/home/ksjae/.local/lib/python3.7/site-packages/spacy/language.py", line 583, in begin_training
    **kwargs
  File "nn_parser.pyx", line 576, in spacy.syntax.nn_parser.Parser.begin_training
  File "arc_eager.pyx", line 346, in spacy.syntax.arc_eager.ArcEager.get_actions
  File "nonproj.pyx", line 123, in spacy.syntax.nonproj.projectivize
  File "nonproj.pyx", line 172, in spacy.syntax.nonproj._get_smallest_nonproj_arc
  File "nonproj.pyx", line 58, in spacy.syntax.nonproj.is_nonproj_arc
  File "nonproj.pyx", line 26, in ancestors
IndexError: list index out of range
How to reproduce the behaviour
Training command, used as-is from the documentation:
python3 -m spacy train ko model KNLI-spacy.json KNLI-spacy-dev.json -p parser
Use these JSON files: https://1drv.ms/u/s!Aq0-1ykl7mZBqWCbqo6cq4X1amma?e=1eBExo for KNLI-spacy.json and https://1drv.ms/u/s!Aq0-1ykl7mZBqWGLptXC0Ba5nGFK?e=suX2RJ for KNLI-spacy-dev.json. EDIT: These files are faulty but have been left in place; use Corpus.zip for the newest ones.
Your Environment
- spaCy version: 2.1.9
- Platform: Linux-4.4.0-178-generic-x86_64-with-debian-stretch-sid
- Python version: 3.7.7
About this issue
- State: closed
- Created 4 years ago
- Comments: 20 (2 by maintainers)
Hi, the problem is that the heads haven’t been converted correctly for spaCy’s training format. The heads should be relative to the current token, not absolute IDs. The root should have head 0, and all other tokens should have heads relative to their position: a head of -2 means the head is two words to the left, 1 means one word to the right, and so on. The data loader should fail with a more useful error in this case, though. I’ll take a look to see how this could be improved.
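A minimal sketch of the conversion described in the comment. It assumes the source corpus stores 1-based absolute head IDs with 0 marking the root, as CoNLL-U does. The comment does not say which scheme the KNLI data actually used. `to_relative_heads` and `find_bad_heads` are hypothetical helpers and are not part of spaCy's API. The second helper flags tokens whose head would fall outside the sentence. Values like these are the kind that can surface as the IndexError in the traceback above.

```python
from typing import List, Sequence


def to_relative_heads(abs_heads: Sequence[int]) -> List[int]:
    """Convert 1-based absolute head IDs (assumed CoNLL-U style, 0 = root)
    into relative offsets: the root gets 0, every other token gets
    (0-based head position) - (own position)."""
    n = len(abs_heads)
    rel = []
    for i, head in enumerate(abs_heads):
        if not 0 <= head <= n:
            raise ValueError(f"token {i}: head {head} is outside 0..{n}")
        if head == 0:
            rel.append(0)  # root: relative offset 0
        else:
            rel.append((head - 1) - i)  # negative = left, positive = right
    return rel


def find_bad_heads(rel_heads: Sequence[int]) -> List[int]:
    """Return positions whose relative head points outside the sentence."""
    n = len(rel_heads)
    return [i for i, offset in enumerate(rel_heads) if not 0 <= i + offset < n]


if __name__ == "__main__":
    # Hypothetical parse of "나는 밥을 먹었다": both words attach to the verb (root).
    abs_heads = [3, 3, 0]
    print(to_relative_heads(abs_heads))  # [2, 1, 0]
    # Absolute IDs mistakenly written as if they were relative offsets:
    print(find_bad_heads(abs_heads))     # [0, 1]
```

A check like `find_bad_heads`, run over every sentence before `spacy train`, should point to the offending tokens directly instead of failing deep inside the parser's projectivization step.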