spaCy: Noun phrase merge is failing
This is now failing:
>>> doc = nlp('The cat sat on the mat')
>>> for np in doc.noun_chunks:
...     np.merge(np.root.tag_, np.text, np.root.ent_type_)
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-409-f6294d1a1cf8> in <module>()
      1 doc = nlp('The cat sat on the mat')
----> 2 for np in doc.noun_chunks:
      3     np.merge(np.root.tag_, np.text, np.root.ent_type_)
/Users/yaser/miniconda3/envs/spacy/lib/python3.5/site-packages/spacy/tokens/doc.pyx in noun_chunks (spacy/tokens/doc.cpp:7745)()
/Users/yaser/miniconda3/envs/spacy/lib/python3.5/site-packages/spacy/syntax/iterators.pyx in english_noun_chunks (spacy/syntax/iterators.cpp:1559)()
/Users/yaser/miniconda3/envs/spacy/lib/python3.5/site-packages/spacy/tokens/doc.pyx in spacy.tokens.doc.Doc.__getitem__ (spacy/tokens/doc.cpp:4853)()
IndexError: list index out of range
About this issue
- State: closed
- Created 8 years ago
- Reactions: 1
- Comments: 15 (6 by maintainers)
Ah, this was dumb, sorry. I didn't have time to really look at this; now that I have, it's obvious there's a problem. Actually, I'm not sure how the code was working before. I think there was always a bug here.
Please work around this for now by doing `for np in list(doc.noun_chunks)`. The problem is that we're changing the tokenisation out from underneath the iterator we're yielding from, and this is causing problems. I think this is always going to be hard to get right, so I'm going to change the noun chunks code to accumulate the spans before it yields them.
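In concrete terms, the workaround just exhausts the generator before any merge happens. A minimal sketch, assuming a spaCy 1.x install with the English model, and reusing the same `Span.merge` call signature as in the report above:

```python
import spacy

nlp = spacy.load('en')  # assumes the spaCy 1.x English model is installed
doc = nlp('The cat sat on the mat')

# Materialise the noun chunks first, so every span is computed before
# any merge changes the tokenisation underneath the iterator.
for np in list(doc.noun_chunks):
    np.merge(np.root.tag_, np.text, np.root.ent_type_)
```

The fix described above does the same thing inside the library: the noun chunks iterator accumulates all of its spans into a list before yielding any of them, so it never observes a half-merged Doc.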
This issue should not have been closed, because it is still present in the spaCy 2.0 alpha. Merging tokens (compounds, entities, matches, etc.) often results in this IndexError: `Error calculating span: Can't find start.`

This is still failing for me on 0.101.0. The workaround with `for np in list(doc.noun_chunks)` works.
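The same stale-offset problem comes up when merging Matcher results, since those are plain (start, end) token offsets that shift as soon as a merge shortens the Doc. A hedged sketch using the spaCy 2.x-style Matcher API; the model name, pattern, and example sentence are illustrative, and the matches are assumed not to overlap:

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.load('en_core_web_sm')  # assumed 2.x model name
matcher = Matcher(nlp.vocab)
# Illustrative pattern: a lowercase-'the' token followed by a noun.
matcher.add('DET_NOUN', None, [{'LOWER': 'the'}, {'POS': 'NOUN'}])

doc = nlp('The cat sat on the mat')

# Merge from right to left: merging shortens the Doc, so processing the
# last match first keeps the earlier (start, end) offsets valid.
for match_id, start, end in reversed(matcher(doc)):
    span = doc[start:end]
    span.merge(tag=span.root.tag_, lemma=span.text,
               ent_type=span.root.ent_type_)
```

Merging in reverse order sidesteps the invalidation for non-overlapping matches; for noun chunks or entities, freezing the spans with list() as above is the simpler route.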