spaCy: ValueError: 2539520 exceeds max_bin_len(1048576) when using spacy.load()
Hi, I’m new to spaCy.
So, I’ve tried a little script to understand how it works:
import spacy
nlp = spacy.load('pt')
I have already installed spaCy (with pip and conda) on Python 3.6 and downloaded the Portuguese model, but I’m getting the following error:
Traceback (most recent call last):
  File "C:/Users/rocha/PycharmProjects/projeto/entidade.py", line 4, in <module>
    nlp = spacy.load('pt')
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\__init__.py", line 18, in load
    return util.load_model(name, **overrides)
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\util.py", line 112, in load_model
    return load_model_from_link(name, **overrides)
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\util.py", line 129, in load_model_from_link
    return cls.load(**overrides)
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\data\pt\__init__.py", line 12, in load
    return load_model_from_init_py(__file__, **overrides)
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\util.py", line 173, in load_model_from_init_py
    return load_model_from_path(data_path, meta, **overrides)
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\util.py", line 156, in load_model_from_path
    return nlp.from_disk(model_path)
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\language.py", line 647, in from_disk
    util.from_disk(path, deserializers, exclude)
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\util.py", line 511, in from_disk
    reader(path / key)
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\language.py", line 643, in <lambda>
    deserializers[name] = lambda p, proc=proc: proc.from_disk(p, vocab=False)
  File "pipeline.pyx", line 643, in spacy.pipeline.Tagger.from_disk
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\util.py", line 511, in from_disk
    reader(path / key)
  File "pipeline.pyx", line 626, in spacy.pipeline.Tagger.from_disk.load_model
  File "pipeline.pyx", line 627, in spacy.pipeline.Tagger.from_disk.load_model
  File "C:\Users\rocha\Anaconda3\lib\site-packages\thinc\neural\_classes\model.py", line 335, in from_bytes
    data = msgpack.loads(bytes_data, encoding='utf8')
  File "C:\Users\rocha\Anaconda3\lib\site-packages\msgpack_numpy.py", line 214, in unpackb
    return _unpackb(packed, **kwargs)
  File "msgpack\_unpacker.pyx", line 187, in msgpack._cmsgpack.unpackb
ValueError: 2539520 exceeds max_bin_len(1048576)
Can anyone help?
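For readers trying to interpret the numbers in the error, here is a minimal sketch that pulls them out of the message and converts them to MiB. The helper names (parse_limit_error, describe) are hypothetical and not part of spaCy or msgpack; the only assumption about msgpack is that max_bin_len is a byte limit on a single binary payload, which is how the message reads.

```python
import re

# Matches messages such as "2539520 exceeds max_bin_len(1048576)".
LIMIT_ERROR = re.compile(r"(\d+) exceeds (max_\w+_len)\((\d+)\)")


def parse_limit_error(message):
    """Return (payload_size, limit_name, limit), or None if the text does not match."""
    match = LIMIT_ERROR.search(message)
    if match is None:
        return None
    size, name, limit = match.groups()
    return int(size), name, int(limit)


def describe(message):
    """Summarize a length-limit error in MiB (hypothetical helper)."""
    parsed = parse_limit_error(message)
    if parsed is None:
        return "not a msgpack length-limit error"
    size, name, limit = parsed
    mib = 1024 * 1024
    return "{}: payload {:.2f} MiB, limit {:.2f} MiB".format(
        name, size / mib, limit / mib
    )


print(describe("ValueError: 2539520 exceeds max_bin_len(1048576)"))
```

Read this way, the limit in the message is exactly 1 MiB (1024 * 1024 bytes), and the model data being loaded is roughly 2.4 times that size.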
About this issue
- State: closed
- Created 6 years ago
- Reactions: 2
- Comments: 18 (6 by maintainers)
Commits related to this issue
- pin msgpack because of spacy issues https://github.com/explosion/spaCy/issues/2995#issuecomment-443298319 — committed to RasaHQ/rasa by tmbo 6 years ago
- Address Rasa NLU issue See https://github.com/explosion/spaCy/issues/2995 Bump base Core image to 12.3, rebuild training data, and address dependency issue. Add binaries to .gitattributes Signed-o... — committed to hyperledger-archives/sawtooth-next-directory by deleted user 6 years ago
- Move chatbot from bin * Add chatbot models Add chatbot built models to source control Signed-off-by: PGobz <p.gobin@gmail.com> * Address Rasa NLU issue See https://github.com/explosion/sp... — committed to hyperledger-archives/sawtooth-next-directory by deleted user 6 years ago
Thanks for the report and sorry you’ve hit this problem. It also just came up in our tests today and it was pretty confusing.
Looks like it might be related to an update of the msgpack library that was released today and is used in our library thinc, which spaCy depends on. So when you installed spaCy, that new version was pulled in and apparently it includes a change to the limit?
We’ll investigate this and hopefully push an update to thinc soon. In the meantime, try downgrading msgpack:
For those who installed spacy with conda, I found the following command to work the best:
conda install -c conda-forge msgpack-python==0.5.6
The version of msgpack is 0.5.6, but the problem still exists.
disfluency_detection/crf.py:36: in __init__
    self.nlp = spacy.load('en')
.venv/lib/python3.6/site-packages/spacy/__init__.py:15: in load
    return util.load_model(name, **overrides)
.venv/lib/python3.6/site-packages/spacy/util.py:112: in load_model
    return load_model_from_link(name, **overrides)
.venv/lib/python3.6/site-packages/spacy/util.py:129: in load_model_from_link
    return cls.load(**overrides)
.venv/lib/python3.6/site-packages/spacy/data/en/__init__.py:12: in load
    return load_model_from_init_py(__file__, **overrides)
.venv/lib/python3.6/site-packages/spacy/util.py:173: in load_model_from_init_py
    return load_model_from_path(data_path, meta, **overrides)
.venv/lib/python3.6/site-packages/spacy/util.py:156: in load_model_from_path
    return nlp.from_disk(model_path)
.venv/lib/python3.6/site-packages/spacy/language.py:653: in from_disk
    util.from_disk(path, deserializers, exclude)
.venv/lib/python3.6/site-packages/spacy/util.py:511: in from_disk
    reader(path / key)
.venv/lib/python3.6/site-packages/spacy/language.py:649: in <lambda>
    deserializers[name] = lambda p, proc=proc: proc.from_disk(p, vocab=False)
pipeline.pyx:643: in spacy.pipeline.Tagger.from_disk
    ???
.venv/lib/python3.6/site-packages/spacy/util.py:511: in from_disk
    reader(path / key)
pipeline.pyx:626: in spacy.pipeline.Tagger.from_disk.load_model
    ???
pipeline.pyx:627: in spacy.pipeline.Tagger.from_disk.load_model
    ???
.venv/lib/python3.6/site-packages/thinc/neural/_classes/model.py:335: in from_bytes
    data = msgpack.loads(bytes_data, encoding='utf8')
.venv/lib/python3.6/site-packages/msgpack_numpy.py:214: in unpackb
    return _unpackb(packed, **kwargs)
msgpack/_unpacker.pyx:187: in msgpack._cmsgpack.unpackb
    ???
E   ValueError: 1792000 exceeds max_bin_len(1048576)
Could anyone help?
Glad it worked!
We also released Thinc v6.12.1 earlier, which pins to the exact msgpack version. It should now be installed automatically when you install/update spaCy. https://github.com/explosion/thinc/releases/tag/v6.12.1
Just hit this as well, your fix works (we did msgpack>=0.3.0,<0.6).
Try
pip install spacy==2.0.18
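Since one commenter above still saw the error after installing msgpack 0.5.6, it can help to confirm which msgpack version the failing interpreter actually imports. The sketch below checks a version string against the range quoted in this thread (msgpack>=0.3.0,<0.6). The helper names are hypothetical, the parsing is deliberately simple (numeric parts only), and reading the installed version from the msgpack.version tuple is an assumption about the installed package.

```python
def parse_version(text, width=3):
    """Turn '0.5.6' into (0, 5, 6); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if digits == "":
            break
        parts.append(int(digits))
    # Pad so that '0.6' and '0.6.0' compare as equal.
    while len(parts) < width:
        parts.append(0)
    return tuple(parts)


def msgpack_is_compatible(version_text, lower="0.3.0", upper="0.6"):
    """True if lower <= version < upper, the range mentioned in this thread."""
    version = parse_version(version_text)
    return parse_version(lower) <= version < parse_version(upper)


if __name__ == "__main__":
    try:
        import msgpack  # must be the environment that raises the error
    except ImportError:
        print("msgpack is not installed in this environment")
    else:
        # Assumption: msgpack exposes its version as a tuple named `version`.
        found = ".".join(str(part) for part in msgpack.version)
        print(found, "compatible:", msgpack_is_compatible(found))
```

Run it with the same interpreter that raises the error (for example the one inside .venv). If it reports 0.6 or later, the downgrade probably went into a different environment.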