spaCy: pipe(): ValueError Error parsing doc
I found strange behaviour using the pipe() method (only verified on the German model):
If you parse a document using pipe() you can get a ValueError, whereas calling nlp(text) on the same text works fine. I boiled it down to single words: German words work, but English words like ‘windows’ don’t.
Steps to reproduce:
import spacy
nlp = spacy.load('de')
def texts():
    yield "Windows"
for doc in nlp.pipe(texts(), n_threads=16, batch_size=1000):
    print(len(doc)) # doc access -> ValueError
Trace
ValueError Traceback (most recent call last)
<ipython-input-2-9a095ec5505b> in <module>()
8 def texts():
9 yield "Windows"
---> 10 for doc in nlp.pipe(texts(), n_threads=16, batch_size=1000):
11 print(len(doc))
.../venv/lib/python3.4/site-packages/spacy/language.py in pipe(self, texts, tag, parse, entity, n_threads, batch_size)
254 stream = self.entity.pipe(stream,
255 n_threads=1, batch_size=batch_size)
--> 256 for doc in stream:
257 yield doc
258
ValueError: Error parsing doc: Windows
If you use nlp("Windows") it works fine. Also, if you execute nlp("Windows") before the same pipe() call, pipe() does not raise an exception (perhaps because a dictionary is built?)
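Until a fix is released, one possible stopgap is to fall back to the plain nlp(text) call whenever pipe() raises. The sketch below is only an illustration of that idea: parse_with_fallback and FakeNLP are hypothetical names, and FakeNLP is a stand-in that imitates the reported behaviour so the example runs without spaCy or the German model installed.

```python
def parse_with_fallback(nlp, texts):
    """Yield one parsed doc per text, preferring nlp.pipe()."""
    for text in texts:
        try:
            # One-element pipe() call, so a failure is isolated to this text.
            doc = next(iter(nlp.pipe([text])))
        except ValueError:
            # The report above says nlp(text) succeeds where pipe() fails.
            doc = nlp(text)
        yield doc


class FakeNLP:
    """Hypothetical stand-in for a spaCy pipeline (not spaCy itself)."""

    def __call__(self, text):
        # Pretend "parsing" is whitespace tokenisation.
        return text.split()

    def pipe(self, texts, **kwargs):
        for text in texts:
            if text == "Windows":
                # Imitates the error message from the trace above.
                raise ValueError("Error parsing doc: " + text)
            yield text.split()


docs = list(parse_with_fallback(FakeNLP(), ["Hallo Welt", "Windows"]))
print([len(d) for d in docs])  # -> [2, 1]
```

Calling pipe() on one text at a time gives up the batching and multithreading that pipe() exists for, so this would only make sense as a temporary workaround.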
Versions:
Python 3.4.3 (the problem is not related to IPython)
spacy 0.101.0
Maybe this is related to this region in syntax/parser.pyx:
if not eg.is_valid[guess]:
    # with gil:
    #     move_name = self.moves.move_name(action.move, action.label)
    #     print 'invalid action:', move_name
    return 1
About this issue
- State: closed
- Created 8 years ago
- Comments: 24 (15 by maintainers)
Commits related to this issue
- Raise errors when no actions are available, re Issue #429 — committed to explosion/spaCy by honnibal 8 years ago
- Fix Issue #429. Add an initialize_state method to the named entity recogniser that adds missing entity types. This is a messy place to add this, because it's strange to have the method mutate state. A... — committed to explosion/spaCy by honnibal 8 years ago
I think this should fix the segfault too — I think they were related.
Closing for now. Again, if it reoccurs, don’t hesitate to reopen 😃
Yes. I have the test set up and I’m pretty sure I understand the problem now. Fix should be out soon.