spaCy: pipe(): ValueError Error parsing doc

I found strange behaviour when using the pipe() method (only verified with the German model):

If you parse a document using pipe(), you can get a ValueError, whereas nlp(text) works fine. I boiled it down to single words: German words work, but English words like "Windows" do not.

Steps to reproduce:

import spacy
nlp = spacy.load('de')
def texts():
    yield "Windows"
for doc in nlp.pipe(texts(), n_threads=16, batch_size=1000):
    print(len(doc))  # doc access -> ValueError

Trace

ValueError                                Traceback (most recent call last)
<ipython-input-2-9a095ec5505b> in <module>()
      8 def texts():
      9     yield "Windows"
---> 10 for doc in nlp.pipe(texts(), n_threads=16, batch_size=1000):
     11     print(len(doc))

.../venv/lib/python3.4/site-packages/spacy/language.py in pipe(self, texts, tag, parse, entity, n_threads, batch_size)
    254             stream = self.entity.pipe(stream,
    255                 n_threads=1, batch_size=batch_size)
--> 256         for doc in stream:
    257             yield doc
    258 
ValueError: Error parsing doc: Windows

Calling nlp("Windows") directly works fine. Also, if you call nlp("Windows") before the same pipe() call, pipe() does not raise an exception (perhaps because a dictionary entry gets built?).
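Until the underlying bug is fixed, the observation above suggests a stopgap: try pipe() first and fall back to plain nlp(text) calls if it raises. The following is only a sketch of that idea, not a fix. The helper name parse_all is hypothetical (it is not part of spaCy), and it assumes that nlp(text) succeeds wherever pipe() fails, which is what this report observed for "Windows" but has not been verified in general.

```python
def parse_all(nlp, texts, **pipe_kwargs):
    """Parse texts with nlp.pipe(); fall back to nlp(text) on ValueError.

    Hypothetical helper, not part of spaCy. `nlp` is any object that is
    callable on a single text and has a pipe() method over many texts.
    """
    # Materialize the input so it can be replayed if pipe() fails partway;
    # a generator that raised cannot be resumed.
    texts = list(texts)
    try:
        return list(nlp.pipe(texts, **pipe_kwargs))
    except ValueError:
        # Assumption from this report: the single-text path does not
        # hit the "Error parsing doc" failure.
        return [nlp(text) for text in texts]
```

With the reproduction above, this would be called as parse_all(nlp, texts(), n_threads=16, batch_size=1000). The trade-off is that the whole input is held in memory and, on failure, every text is parsed again one at a time, so the streaming and batching benefits of pipe() are lost.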

Versions:

Python 3.4.3 (the problem is not related to IPython)
spaCy 0.101.0

Maybe this is related to this region of syntax/parser.pyx:

if not eg.is_valid[guess]:
    # with gil:
    #     move_name = self.moves.move_name(action.move, action.label)
    #     print 'invalid action:', move_name
    return 1

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 24 (15 by maintainers)

Most upvoted comments

I think this should fix the segfault too; I think they were related.

Closing for now. Again, if it reoccurs, don’t hesitate to reopen 😃

Yes. I have the test set up and I’m pretty sure I understand the problem now. Fix should be out soon.