spaCy: Noun phrase merge is failing
This is now failing:
>>> doc = nlp('The cat sat on the mat')
>>> for np in doc.noun_chunks:
...     np.merge(np.root.tag_, np.text, np.root.ent_type_)
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-409-f6294d1a1cf8> in <module>()
      1 doc = nlp('The cat sat on the mat')
----> 2 for np in doc.noun_chunks:
      3     np.merge(np.root.tag_, np.text, np.root.ent_type_)
/Users/yaser/miniconda3/envs/spacy/lib/python3.5/site-packages/spacy/tokens/doc.pyx in noun_chunks (spacy/tokens/doc.cpp:7745)()
/Users/yaser/miniconda3/envs/spacy/lib/python3.5/site-packages/spacy/syntax/iterators.pyx in english_noun_chunks (spacy/syntax/iterators.cpp:1559)()
/Users/yaser/miniconda3/envs/spacy/lib/python3.5/site-packages/spacy/tokens/doc.pyx in spacy.tokens.doc.Doc.__getitem__ (spacy/tokens/doc.cpp:4853)()
IndexError: list index out of range
About this issue
- State: closed
- Created 8 years ago
- Reactions: 1
- Comments: 15 (6 by maintainers)
Ah, this was dumb, sorry. I didn't have time to really look at this; now that I have, it's obvious there's a problem. Actually, I'm not sure how the code was working before. I think there was always a bug here.
Please work around this for now by doing `for np in list(doc.noun_chunks)`. The problem is that we're changing the tokenisation out from underneath the iterator we're yielding from, and this is causing problems. I think this is always going to be hard to get right, so I'm going to change the noun chunks code to accumulate the spans before it yields them.
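In concrete terms, the workaround just exhausts the generator before any merge happens. A minimal sketch, assuming a spaCy 1.x install with the English model, and reusing the same `Span.merge` call signature as in the report above:

```python
import spacy

nlp = spacy.load('en')  # assumes the spaCy 1.x English model is installed
doc = nlp('The cat sat on the mat')

# Materialise the noun chunks first, so every span is computed before
# any merge changes the tokenisation underneath the iterator.
for np in list(doc.noun_chunks):
    np.merge(np.root.tag_, np.text, np.root.ent_type_)
```

The fix described above does the same thing inside the library: the noun chunks iterator accumulates all of its spans into a list before yielding any of them, so it never observes a half-merged Doc.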
This issue should not have been closed, because it is still present in the spaCy 2.0 alpha. Merging tokens (compounds, entities, matches, etc.) often results in this IndexError: `Error calculating span: Can't find start.`

This is still failing for me on 0.101.0. The workaround with `for np in list(doc.noun_chunks)` works.
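The same stale-offset problem comes up when merging Matcher results, since those are plain (start, end) token offsets that shift as soon as a merge shortens the Doc. A hedged sketch using the spaCy 2.x-style Matcher API; the model name, pattern, and example sentence are illustrative, and the matches are assumed not to overlap:

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.load('en_core_web_sm')  # assumed 2.x model name
matcher = Matcher(nlp.vocab)
# Illustrative pattern: a lowercase-'the' token followed by a noun.
matcher.add('DET_NOUN', None, [{'LOWER': 'the'}, {'POS': 'NOUN'}])

doc = nlp('The cat sat on the mat')

# Merge from right to left: merging shortens the Doc, so processing the
# last match first keeps the earlier (start, end) offsets valid.
for match_id, start, end in reversed(matcher(doc)):
    span = doc[start:end]
    span.merge(tag=span.root.tag_, lemma=span.text,
               ent_type=span.root.ent_type_)
```

Merging in reverse order sidesteps the invalidation for non-overlapping matches; for noun chunks or entities, freezing the spans with list() as above is the simpler route.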