pyahocorasick: Invalid pickle file generated: "ValueError: binary data truncated (1)"

I’ve managed to create an automaton and pickle it to a 286 MB pickle file. The problem is that when I try to unpickle it, I get this error:

$ python -m pickle wikidata-automation.pickle 
Traceback (most recent call last):
  File "/usr/local/Cellar/python3/3.5.2/Frameworks/Python.framework/Versions/3.5/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/Cellar/python3/3.5.2/Frameworks/Python.framework/Versions/3.5/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/Cellar/python3/3.5.2/Frameworks/Python.framework/Versions/3.5/lib/python3.5/pickle.py", line 1605, in <module>
    obj = load(f)
ValueError: binary data truncated (1)

The source of that error is here: https://github.com/WojciechMula/pyahocorasick/blob/master/Automaton_pickle.c#L309

Would you mind helping me troubleshoot this? Any ideas? I don’t think I can send you files this big.

Update: This is how I build the pickle file:

import pickle

import ahocorasick

automaton = ahocorasick.Automaton()
for label, id_ in generator:  # generator yields (label, id) tuples
    automaton.add_word(label, id_)

automaton.make_automaton()

with open(filename_out, "wb") as f:
    pickle.dump(automaton, f)

Here, generator simply yields (label, id) tuples such as ("Belgium", "Q31").
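One way to narrow this down without sending the large file might be a round-trip check on automatons of increasing size. The sketch below is only an illustration: `pickle_roundtrip` and `check_automaton` are hypothetical helper names, the `word0`, `word1`, ... keys are made-up filler, and the sizes are arbitrary. It assumes pyahocorasick is installed and skips the automaton part otherwise.

```python
import os
import pickle
import tempfile

try:
    import ahocorasick  # pyahocorasick; optional so the helper can be tried without it
except ImportError:
    ahocorasick = None


def pickle_roundtrip(obj):
    """Dump obj to a temporary file, load it back, return (copy, file size in bytes)."""
    fd, path = tempfile.mkstemp(suffix=".pickle")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f)
        size = os.path.getsize(path)
        with open(path, "rb") as f:
            restored = pickle.load(f)  # a broken pickle raises here, e.g. ValueError
    finally:
        os.remove(path)
    return restored, size


def check_automaton(n_words):
    """Build an automaton with n_words synthetic keys and verify the round trip."""
    automaton = ahocorasick.Automaton()
    automaton.add_word("Belgium", "Q31")
    for i in range(n_words):
        automaton.add_word("word%d" % i, "id%d" % i)  # made-up filler entries
    automaton.make_automaton()
    restored, size = pickle_roundtrip(automaton)
    assert len(restored) == len(automaton)
    assert restored.get("Belgium") == "Q31"
    return size


if __name__ == "__main__" and ahocorasick is not None:
    for n in (10, 1000, 10000):  # arbitrary sizes; increase to approach the real data
        print(n, "words ->", check_automaton(n), "bytes, round trip OK")
```

If some size fails with the same ValueError, the synthetic script that triggers it would be much easier to share than the 286 MB file.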

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 3
  • Comments: 60 (38 by maintainers)

Most upvoted comments

@EmilStenstrom, @woakesd, @Dobatymo, @pombredanne, @leonqli: version 1.1.13.1 is available on PyPI. If you have time, please give it a try.

Sorry that it took so long. And thank you very much for your help.

I use https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-all-titles-in-ns0.gz as input. Tomorrow I can check exactly which version/date of the dump it is and give you the code to reproduce the problem.

@EmilStenstrom Thank you very much for your time and effort. I really want to fix that bug, but so far I couldn’t. 😦

As far as I understand your problem, you could try n-gram indexes. They let you narrow the search space significantly and are not too complicated. I did some experiments with full-text search and the results were impressive.
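For readers unfamiliar with the idea, here is a minimal sketch of a character n-gram index. It is not the code from the experiments mentioned above; the `NgramIndex` class, the choice of trigrams, and the sample strings are all assumptions made for illustration. Each string is indexed by its trigrams, and a query only has to be verified against strings that share every trigram with it.

```python
from collections import defaultdict


def ngrams(text, n=3):
    """Return the set of character n-grams in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


class NgramIndex:
    """Hypothetical inverted index: n-gram -> ids of strings containing it."""

    def __init__(self, n=3):
        self.n = n
        self.docs = []
        self.postings = defaultdict(set)

    def add(self, doc):
        doc_id = len(self.docs)
        self.docs.append(doc)
        for gram in ngrams(doc, self.n):
            self.postings[gram].add(doc_id)

    def search(self, query):
        """Return the indexed strings that contain query as a substring."""
        grams = ngrams(query, self.n)
        if not grams:
            # Query shorter than n: no n-grams to look up, so scan everything.
            candidates = range(len(self.docs))
        else:
            # Only strings containing every n-gram of the query can match.
            sets = [self.postings.get(g, set()) for g in grams]
            candidates = set.intersection(*sets)
        # Sharing all n-grams is necessary but not sufficient, so verify.
        return [self.docs[i] for i in sorted(candidates) if query in self.docs[i]]


index = NgramIndex()
for title in ("Belgium", "Belgian franc", "Kingdom of Belgium"):  # sample data
    index.add(title)
print(index.search("Belgium"))
```

The final substring check is what keeps the results exact; the index itself only prunes the candidates that need to be checked.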

Hi! I’m still planning to try the compile flags you suggested above; I just haven’t had time yet. Maybe next week!