pyahocorasick: Invalid pickle file generated: "ValueError: binary data truncated (1)"
I’ve managed to create an automaton and pickle it to a 286 MB pickle file. The problem is that when I try to unpickle it, I get this error:
$ python -m pickle wikidata-automation.pickle
Traceback (most recent call last):
  File "/usr/local/Cellar/python3/3.5.2/Frameworks/Python.framework/Versions/3.5/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/Cellar/python3/3.5.2/Frameworks/Python.framework/Versions/3.5/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/Cellar/python3/3.5.2/Frameworks/Python.framework/Versions/3.5/lib/python3.5/pickle.py", line 1605, in <module>
    obj = load(f)
ValueError: binary data truncated (1)
The source of that error is here: https://github.com/WojciechMula/pyahocorasick/blob/master/Automaton_pickle.c#L309
Would you mind helping me troubleshoot this? Any ideas? I don’t think I can send you files this big.
Update: This is how I build the pickle file:
    import pickle
    import ahocorasick

    # generator and filename_out are defined elsewhere in my script
    automaton = ahocorasick.Automaton()
    for i, (label, id_) in enumerate(generator):
        automaton.add_word(label, id_)
    automaton.make_automaton()
    with open(filename_out, "wb") as f:
        pickle.dump(automaton, f)
Where generator just yields tuples such as ("Belgium", "Q31").
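For anyone trying to narrow this down, here is a minimal sketch of a round-trip check that uses only the standard library. The helper name pickle_roundtrip is hypothetical and not part of pyahocorasick; it simply dumps an object to a temporary file, reports the file size, and loads it back, so the same check can be repeated on progressively larger inputs.

```python
import os
import pickle
import tempfile


def pickle_roundtrip(obj):
    """Pickle obj to a temporary file, load it back, and return (size_in_bytes, loaded).

    Hypothetical helper for troubleshooting; any exception raised by
    pickle.load propagates to the caller.
    """
    fd, path = tempfile.mkstemp(suffix=".pickle")
    os.close(fd)  # reopen by name below so the file is handled like a normal pickle file
    try:
        with open(path, "wb") as f:
            pickle.dump(obj, f)
        size = os.path.getsize(path)
        with open(path, "rb") as f:
            loaded = pickle.load(f)
    finally:
        os.remove(path)
    return size, loaded


# Demonstration with a plain dict standing in for the real data.
size, loaded = pickle_roundtrip({"Belgium": "Q31"})
print(size, loaded)
```

Assuming the failure depends on input size, the same function could be called on automatons built from the first N items of the generator (for example via itertools.islice) to find roughly where unpickling starts to fail.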
About this issue
- State: closed
- Created 7 years ago
- Reactions: 3
- Comments: 60 (38 by maintainers)
@EmilStenstrom, @woakesd, @Dobatymo, @pombredanne, @leonqli: version 1.1.13.1 is now available via pip. If you wish, please check it.
Sorry that it took so long. And thank you very much for your help.
I use this one: https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-all-titles-in-ns0.gz. Tomorrow I can check exactly which version/date of the dump it is and give you the code to reproduce it.
@EmilStenstrom Thank you very much for your time and effort. I really want to fix that bug, but so far I couldn’t. 😦
As far as I understand your problem, you could try n-gram indexes. They let you narrow the search space significantly and are not too complicated. I did some experiments with full-text search and the results were impressive.
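To make the n-gram index idea concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions, not the maintainer's implementation: the function names (ngrams, build_index, candidates, search) are hypothetical, it uses character trigrams, and it assumes queries are at least n characters long. The index maps each n-gram to the IDs of the labels containing it; a query is answered by intersecting those ID sets and then verifying the few remaining candidates with an ordinary substring test.

```python
from collections import defaultdict


def ngrams(text, n=3):
    """Return the set of lowercase character n-grams of text (empty if text is shorter than n)."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_index(labels, n=3):
    """Map each n-gram to the set of positions (IDs) of the labels that contain it."""
    index = defaultdict(set)
    for label_id, label in enumerate(labels):
        for gram in ngrams(label, n):
            index[gram].add(label_id)
    return index


def candidates(index, query, n=3):
    """Return IDs of labels that contain every n-gram of query (a superset of the true matches)."""
    grams = ngrams(query, n)
    if not grams:  # query shorter than n: this sketch cannot narrow the search
        return set()
    return set.intersection(*(index.get(g, set()) for g in grams))


def search(labels, index, query, n=3):
    """Narrow with the index, then confirm each candidate with a real substring test."""
    q = query.lower()
    return [labels[i] for i in sorted(candidates(index, query, n)) if q in labels[i].lower()]


labels = ["Belgium", "Belgrade", "Bulgaria"]
index = build_index(labels)
print(search(labels, index, "belg"))
```

The verification step matters because sharing all n-grams does not guarantee that the query occurs as a contiguous substring; the index only shrinks the set of labels that need to be checked.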
Hi! I’m still planning to try the compile flags you suggested above; I just haven’t had time yet. Maybe next week!