spaCy: ValueError: 2539520 exceeds max_bin_len(1048576) when using spacy.load()

Hi, I’m new to spaCy.

So, I’ve tried a little script to understand how it works:

import spacy

nlp = spacy.load('pt')

I’ve already installed spaCy (with both pip and conda), I’m using Python 3.6, and I’ve already downloaded the Portuguese model, but I’m getting the following error:

Traceback (most recent call last):
  File "C:/Users/rocha/PycharmProjects/projeto/entidade.py", line 4, in <module>
    nlp = spacy.load('pt')
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\__init__.py", line 18, in load
    return util.load_model(name, **overrides)
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\util.py", line 112, in load_model
    return load_model_from_link(name, **overrides)
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\util.py", line 129, in load_model_from_link
    return cls.load(**overrides)
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\data\pt\__init__.py", line 12, in load
    return load_model_from_init_py(__file__, **overrides)
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\util.py", line 173, in load_model_from_init_py
    return load_model_from_path(data_path, meta, **overrides)
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\util.py", line 156, in load_model_from_path
    return nlp.from_disk(model_path)
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\language.py", line 647, in from_disk
    util.from_disk(path, deserializers, exclude)
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\util.py", line 511, in from_disk
    reader(path / key)
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\language.py", line 643, in <lambda>
    deserializers[name] = lambda p, proc=proc: proc.from_disk(p, vocab=False)
  File "pipeline.pyx", line 643, in spacy.pipeline.Tagger.from_disk
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\util.py", line 511, in from_disk
    reader(path / key)
  File "pipeline.pyx", line 626, in spacy.pipeline.Tagger.from_disk.load_model
  File "pipeline.pyx", line 627, in spacy.pipeline.Tagger.from_disk.load_model
  File "C:\Users\rocha\Anaconda3\lib\site-packages\thinc\neural\_classes\model.py", line 335, in from_bytes
    data = msgpack.loads(bytes_data, encoding='utf8')
  File "C:\Users\rocha\Anaconda3\lib\site-packages\msgpack_numpy.py", line 214, in unpackb
    return _unpackb(packed, **kwargs)
  File "msgpack\_unpacker.pyx", line 187, in msgpack._cmsgpack.unpackb
ValueError: 2539520 exceeds max_bin_len(1048576)

Can anyone help?
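
Since this failure depends on which msgpack release was pulled in alongside spaCy and Thinc, it helps to report the exact installed versions of all three. A minimal sketch using the standard library's importlib.metadata, assuming Python 3.8 or newer (so newer than the Python 3.6 environment above) and the PyPI distribution names spacy, thinc and msgpack; the helper name collect_versions is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages):
    """Map each distribution name to its installed version string, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in the current environment.
            found[name] = None
    return found


if __name__ == "__main__":
    import sys

    print("Python", sys.version.split()[0])
    for name, ver in collect_versions(["spacy", "thinc", "msgpack"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running it in the same interpreter that PyCharm or the virtualenv uses matters, because pip and conda may have installed different copies into different environments.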

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 2
  • Comments: 18 (6 by maintainers)


Most upvoted comments

Thanks for the report and sorry you’ve hit this problem. It also just came up in our tests today and it was pretty confusing.

It looks like this might be related to a new release of the msgpack library that came out today. msgpack is used by our library thinc, which spaCy depends on, so when you installed spaCy that new version was pulled in, and it apparently includes a change to the size limit.
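
The numbers in the error make the limit concrete: 1048576 bytes is 2**20 (1 MiB), and the payloads reported in the two tracebacks in this thread (2539520 and 1792000 bytes, both read while loading the Tagger) are larger than that. A minimal sketch that pulls those numbers out of such a message; the helper name parse_limit_error is hypothetical, and the regular expression assumes the message format shown in the tracebacks.

```python
import re

# Matches msgpack limit errors such as "2539520 exceeds max_bin_len(1048576)"
# (format assumed from the tracebacks in this thread).
_LIMIT_ERROR = re.compile(r"(\d+) exceeds (max_\w+_len)\((\d+)\)")


def parse_limit_error(message):
    """Return (payload_size, limit_name, limit), or None if the message doesn't match."""
    match = _LIMIT_ERROR.search(message)
    if match is None:
        return None
    size, name, limit = match.groups()
    return int(size), name, int(limit)


if __name__ == "__main__":
    for msg in ("2539520 exceeds max_bin_len(1048576)",
                "1792000 exceeds max_bin_len(1048576)"):
        size, name, limit = parse_limit_error(msg)
        print(f"{name}: {size} bytes vs. limit {limit} bytes "
              f"(limit is 2**20: {limit == 2 ** 20}, payload is {size / limit:.2f}x the limit)")
```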

We’ll investigate this and hopefully push an update to thinc soon. In the meantime, try downgrading msgpack:

pip install msgpack==0.5.6

For those who installed spaCy with conda, I found the following command works best:

conda install -c conda-forge msgpack-python==0.5.6

The version of msgpack is 0.5.6, but the problem still exists.

disfluency_detection/crf.py:36: in __init__
    self.nlp = spacy.load('en')
.venv/lib/python3.6/site-packages/spacy/__init__.py:15: in load
    return util.load_model(name, **overrides)
.venv/lib/python3.6/site-packages/spacy/util.py:112: in load_model
    return load_model_from_link(name, **overrides)
.venv/lib/python3.6/site-packages/spacy/util.py:129: in load_model_from_link
    return cls.load(**overrides)
.venv/lib/python3.6/site-packages/spacy/data/en/__init__.py:12: in load
    return load_model_from_init_py(__file__, **overrides)
.venv/lib/python3.6/site-packages/spacy/util.py:173: in load_model_from_init_py
    return load_model_from_path(data_path, meta, **overrides)
.venv/lib/python3.6/site-packages/spacy/util.py:156: in load_model_from_path
    return nlp.from_disk(model_path)
.venv/lib/python3.6/site-packages/spacy/language.py:653: in from_disk
    util.from_disk(path, deserializers, exclude)
.venv/lib/python3.6/site-packages/spacy/util.py:511: in from_disk
    reader(path / key)
.venv/lib/python3.6/site-packages/spacy/language.py:649: in <lambda>
    deserializers[name] = lambda p, proc=proc: proc.from_disk(p, vocab=False)
pipeline.pyx:643: in spacy.pipeline.Tagger.from_disk
    ???
.venv/lib/python3.6/site-packages/spacy/util.py:511: in from_disk
    reader(path / key)
pipeline.pyx:626: in spacy.pipeline.Tagger.from_disk.load_model
    ???
pipeline.pyx:627: in spacy.pipeline.Tagger.from_disk.load_model
    ???
.venv/lib/python3.6/site-packages/thinc/neural/_classes/model.py:335: in from_bytes
    data = msgpack.loads(bytes_data, encoding='utf8')
.venv/lib/python3.6/site-packages/msgpack_numpy.py:214: in unpackb
    return _unpackb(packed, **kwargs)
msgpack/_unpacker.pyx:187: in msgpack._cmsgpack.unpackb
    ???
E   ValueError: 1792000 exceeds max_bin_len(1048576)

Could anyone help?

Glad it worked!

We also released Thinc v6.12.1 earlier, which pins to the exact msgpack version. It should now be installed automatically when you install/update spaCy. https://github.com/explosion/thinc/releases/tag/v6.12.1

Just hit this as well; your fix works (we pinned msgpack>=0.3.0,<0.6).
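
To confirm that a pin like msgpack>=0.3.0,<0.6 actually took effect (for example after the pip or conda downgrades suggested above), you can compare the installed version against that range. A minimal sketch, assuming plain numeric version strings such as 0.5.6 (pre-release tags like 0.6.0rc1 are not handled), Python 3.8+ for importlib.metadata, and that the installed distribution is named msgpack; parse_version and msgpack_in_safe_range are hypothetical helpers.

```python
def parse_version(text):
    """Parse a plain numeric 'X.Y[.Z]' version into a 3-tuple of ints, padding with zeros."""
    parts = [int(p) for p in text.split(".")[:3]]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def msgpack_in_safe_range(version, low=(0, 3, 0), high=(0, 6, 0)):
    """True if low <= version < high, mirroring the pin msgpack>=0.3.0,<0.6."""
    return low <= parse_version(version) < high


if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version as installed_version

    try:
        installed = installed_version("msgpack")
    except PackageNotFoundError:
        print("msgpack is not installed")
    else:
        status = "inside" if msgpack_in_safe_range(installed) else "outside"
        print(f"msgpack {installed} is {status} >=0.3.0,<0.6")
```

The versions are compared as integer tuples rather than strings, because string comparison would wrongly sort "0.10.0" before "0.6.0".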

Try pip install spacy==2.0.18