neuspell: Error quick-start code snippet - UnpicklingError: invalid load key, '<'.
Hello,
I tried to run the quick-start code snippet:
import neuspell
from neuspell import available_checkers, BertChecker
""" see available checkers """
print(f"available checkers: {neuspell.available_checkers()}")
# → available checkers: ['BertsclstmChecker', 'CnnlstmChecker', 'NestedlstmChecker', 'SclstmChecker', 'SclstmbertChecker', 'BertChecker', 'SclstmelmoChecker', 'ElmosclstmChecker']
""" select spell checkers & load """
checker = BertChecker()
checker.from_pretrained()
""" spell correction """
checker.correct("I luk foward to receving your reply")
# → "I look forward to receiving your reply"
checker.correct_strings(["I luk foward to receving your reply", ])
# → ["I look forward to receiving your reply"]
checker.correct_from_file(src="noisy_texts.txt")
# → "Found 450 mistakes in 322 lines, total_lines=350"
""" evaluation of models """
checker.evaluate(clean_file="bea60k.txt", corrupt_file="bea60k.noise.txt")
# → data size: 63044
# → total inference time for this data is: 998.13 secs
# → total token count: 1032061
# → confusion table: corr2corr:940937, corr2incorr:21060,
# incorr2corr:55889, incorr2incorr:14175
# → accuracy is 96.58%
# → word correction rate is 79.76%
But I get the following error:
File "C:\Users\user\Anaconda3\lib\site-packages\torch\serialization.py", line 764, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
UnpicklingError: invalid load key, '<'.
Can you help me, please?
Thanks, Camille
About this issue
- State: open
- Created 3 years ago
- Reactions: 12
- Comments: 18
Same problem 😦
What fails is the code that accesses the data files stored in the Google Drive folder. A quick fix is to download the data by hand from Google Drive and put it in the data directory the library expects. The library prints that path when you import it; it looks something like this:
import neuspell
data folder is set to "/tmp/test_fix/neuspell/neuspell/../data" script
The Google Drive folder: https://drive.google.com/drive/folders/1jgNpYe4TVSF4mMBVtFh4QfB2GovNPdh7?usp=sharing
Go there and download the data for the model you want; for example, for BERT it is the subwordbert-probwordnoise folder. You have to put it under the checkpoints directory.
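The load key '<' in the error suggests that the file being unpickled begins with the character '<', which is typical of an HTML page saved in place of the binary checkpoint. This is an inference from the error message, not something confirmed in this thread. A minimal sketch for checking a checkpoint folder after copying it by hand follows; the helper names `looks_like_html` and `find_suspect_files` are hypothetical and not part of neuspell.

```python
from pathlib import Path


def looks_like_html(path, probe_bytes=64):
    """Return True if the file starts with '<' (after optional whitespace).

    Assumption: a valid pickle or PyTorch checkpoint never starts with '<',
    so a leading '<' most likely means an HTML page was saved instead.
    """
    with open(path, "rb") as fh:
        head = fh.read(probe_bytes)
    return head.lstrip().startswith(b"<")


def find_suspect_files(checkpoint_dir):
    """List files under checkpoint_dir that look like HTML, not binary data."""
    return sorted(
        str(p)
        for p in Path(checkpoint_dir).rglob("*")
        if p.is_file() and looks_like_html(p)
    )
```

Calling `find_suspect_files(...)` on the checkpoints folder under the data path printed at import time should return an empty list if the files were copied correctly. Any path it lists is worth downloading again.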
checker = BertChecker()
checker.from_pretrained()
loading vocab from path:/tmp/test_fix/neuspell/neuspell/../data/checkpoints/subwordbert-probwordnoise/vocab.pkl
initializing model
Number of parameters in the model: 185211810
Loading model params from checkpoint dir: /tmp/test_fix/neuspell/neuspell/../data/checkpoints/subwordbert-probwordnoise
I happened to have installed neuspell before this corruption happened. One of my colleagues shared it for me on an S3 node. You can get the correct files for the data/checkpoints/subwordbert-probwordnoise/ directory here:
https://s3.amazonaws.com/mitros.org/p/ets/subwordbert.tar.gz
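Once the archive has been downloaded (for example with a browser), it can be unpacked into the checkpoints directory. The sketch below uses only the standard library. The helper name `extract_checkpoint` is hypothetical, and the internal layout of the archive is not described in this thread, so the returned member names should be checked and files moved if needed so that they end up in data/checkpoints/subwordbert-probwordnoise/.

```python
import tarfile
from pathlib import Path


def extract_checkpoint(archive_path, checkpoints_dir):
    """Unpack a .tar.gz archive into checkpoints_dir.

    Returns the member names stored in the archive so the caller can
    verify where the files landed. Only extract archives you trust.
    """
    dest = Path(checkpoints_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(dest)
    return names
```

For instance, `extract_checkpoint("subwordbert.tar.gz", "<data folder>/checkpoints")`, where `<data folder>` stands for the path that neuspell prints on import.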