nltk: I tried everything and I still get: [nltk_data] Error loading taggers: Package 'taggers' not found in [nltk_data] index
Hi everyone!
I'm using LangChain to create a custom LLM. For this process I have to use nltk, and I've been following all the steps. I installed nltk, but I noticed it didn't create an nltk_data folder. I tried installing, upgrading, uninstalling and installing again, and it still didn't work. So I added these lines to my code:
    import nltk
    nltk.data.path = ['C:\\Users\\zaesa\\AppData\\Roaming\\nltk_data']
    nltk.download('tokenizers', download_dir='C:\\Users\\zaesa\\AppData\\Roaming\\nltk_data')
    nltk.download('punkt', download_dir='C:\\Users\\zaesa\\AppData\\Roaming\\nltk_data')
    nltk.download('all')
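A more targeted variant of the snippet above could look like the sketch below, assuming the goal is only the tokenizer and tagger data. It asks `nltk.data.find()` whether a resource path such as `tokenizers/punkt` is already available and only then calls `nltk.download()` with the package id. My understanding is that `tokenizers` and `taggers` are folder names inside `nltk_data` rather than ids in the download index, which fits the "not found in index" message. The resource list, the helper names `package_id`/`ensure_resources`, and the choice of `averaged_perceptron_tagger` are illustrative assumptions, not taken from LangChain or unstructured.

```python
"""Sketch: download NLTK packages by id, checking their resource paths first.

Assumption: the last component of a resource path such as 'tokenizers/punkt'
is the package id that nltk.download() expects.
"""

# Hypothetical list of resources a project might need (illustrative only).
RESOURCES = ("tokenizers/punkt", "taggers/averaged_perceptron_tagger")


def package_id(resource: str) -> str:
    """Map a data path like 'tokenizers/punkt' to the package id 'punkt'."""
    return resource.rstrip("/").split("/")[-1]


def ensure_resources(download_dir: str, resources=RESOURCES) -> None:
    """Download each resource into download_dir unless NLTK can already find it."""
    import nltk  # imported here so package_id() works even without NLTK installed

    # Search the custom folder first, but keep NLTK's default locations too.
    if download_dir not in nltk.data.path:
        nltk.data.path.insert(0, download_dir)
    for resource in resources:
        try:
            nltk.data.find(resource)
        except LookupError:
            # Pass the package id ('punkt'), not the folder name ('tokenizers').
            nltk.download(package_id(resource), download_dir=download_dir)
```

Usage would then be a single call such as `ensure_resources(r"C:\Users\zaesa\AppData\Roaming\nltk_data")`. Prepending the folder to `nltk.data.path`, instead of replacing the whole list, keeps NLTK's default search locations available.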
When I check which packages are installed:
    import nltk
    installed_packages = nltk.downloader.Downloader(
        download_dir=r'C:\Users\zaesa\AppData\Roaming\nltk_data').packages()
    print(installed_packages)
I get:
dict_values([<Package perluniprops>, <Package mwa_ppdb>, <Package punkt>, <Package rslp>, <Package porter_test>, <Package snowball_data>, <Package maxent_ne_chunker>, <Package moses_sample>, <Package bllip_wsj_no_aux>, <Package word2vec_sample>, <Package wmt15_eval>, <Package spanish_grammars>, <Package sample_grammars>, <Package large_grammars>, <Package book_grammars>, <Package basque_grammars>, <Package maxent_treebank_pos_tagger>, <Package averaged_perceptron_tagger>, <Package averaged_perceptron_tagger_ru>, <Package universal_tagset>, <Package vader_lexicon>, <Package lin_thesaurus>, <Package movie_reviews>, <Package problem_reports>, <Package pros_cons>, <Package masc_tagged>, <Package sentence_polarity>, <Package webtext>, <Package nps_chat>, <Package city_database>, <Package europarl_raw>, <Package biocreative_ppi>, <Package verbnet3>, <Package pe08>, <Package pil>, <Package crubadan>, <Package gutenberg>, <Package propbank>, <Package machado>, <Package state_union>, <Package twitter_samples>, <Package semcor>, <Package wordnet31>, <Package extended_omw>, <Package names>, <Package ptb>, <Package nombank.1.0>, <Package floresta>, <Package comtrans>, <Package knbc>, <Package mac_morpho>, <Package swadesh>, <Package rte>, <Package toolbox>, <Package jeita>, <Package product_reviews_1>, <Package omw>, <Package wordnet2022>, <Package sentiwordnet>, <Package product_reviews_2>, <Package abc>, <Package wordnet2021>, <Package udhr2>, <Package senseval>, <Package words>, <Package framenet_v15>, <Package unicode_samples>, <Package kimmo>, <Package framenet_v17>, <Package chat80>, <Package qc>, <Package inaugural>, <Package wordnet>, <Package stopwords>, <Package verbnet>, <Package shakespeare>, <Package ycoe>, <Package ieer>, <Package cess_cat>, <Package switchboard>, <Package comparative_sentences>, <Package subjectivity>, <Package udhr>, <Package pl196x>, <Package paradigms>, <Package gazetteers>, <Package timit>, <Package treebank>, <Package sinica_treebank>, 
<Package opinion_lexicon>, <Package ppattach>, <Package dependency_treebank>, <Package reuters>, <Package genesis>, <Package cess_esp>, <Package conll2007>, <Package nonbreaking_prefixes>, <Package dolch>, <Package smultron>, <Package alpino>, <Package wordnet_ic>, <Package brown>, <Package bcp47>, <Package panlex_swadesh>, <Package conll2000>, <Package universal_treebanks_v20>, <Package brown_tei>, <Package cmudict>, <Package omw-1.4>, <Package mte_teip5>, <Package indian>, <Package conll2002>, <Package tagsets>])
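As far as I can tell, `Downloader.packages()` returns every package in the downloader's index (what *can* be downloaded), not what is on disk, which would explain why the printed list contains packages that are missing from the folder. A stdlib-only sketch for listing what is actually present in an `nltk_data` folder, assuming the usual `<category>/<package>` layout where a package may be an unpacked folder, a `.zip`, or both (`list_installed` is a hypothetical helper name):

```python
from pathlib import Path


def list_installed(data_dir):
    """Return {category: [package, ...]} for packages present under data_dir."""
    root = Path(data_dir)
    found = {}
    if not root.is_dir():
        return found
    # Each top-level folder is a category such as 'corpora' or 'tokenizers'.
    for category in sorted(p for p in root.iterdir() if p.is_dir()):
        names = set()
        for entry in category.iterdir():
            if entry.is_dir():
                names.add(entry.name)  # unpacked package folder
            elif entry.suffix == ".zip":
                names.add(entry.stem)  # zipped package archive
        found[category.name] = sorted(names)
    return found
```

Calling `print(list_installed(r"C:\Users\zaesa\AppData\Roaming\nltk_data"))` would then show only what NLTK can load from that folder, which is a fairer comparison with what Explorer shows.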
But when I look manually in the nltk_data folder at that path, I don't see all of these packages, and there is no "tokenizers" and no "taggers" folder, even though I ran the code above.
Why do I still get this error? It makes no sense:
    [nltk_data] Error loading tokenizers: Package 'tokenizers' not found
    [nltk_data] in index
    [nltk_data] Error loading taggers: Package 'taggers' not found in
    [nltk_data] index
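Both log lines have the shape NLTK prints when `nltk.download()` receives an id it cannot find in its index. Assuming the culprit is a category folder name being passed as a package id, a small pre-flight check can flag it before any download is attempted. The folder set below is my recollection of the top-level layout of `nltk_data`, and `find_category_names` is a hypothetical helper:

```python
# Top-level folder names used inside nltk_data (as I understand the layout).
# These are categories that hold packages, not downloadable package ids.
CATEGORY_FOLDERS = {
    "chunkers", "corpora", "grammars", "help", "misc",
    "models", "sentiment", "stemmers", "taggers", "tokenizers",
}


def find_category_names(requested_ids):
    """Return the requested ids that are really category folder names."""
    return [pkg for pkg in requested_ids if pkg in CATEGORY_FOLDERS]
```

Applied to the ids in the original snippet (`'tokenizers'`, `'punkt'`, `'all'`), it flags only `'tokenizers'`. `'taggers'` never appears in that snippet, which suggests it is requested by other code, such as a library the project depends on.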
I'd appreciate any help. I can't figure out what is wrong with nltk. I've followed every step in every way I can think of, and it just doesn't work! Thank you.
About this issue
- State: closed
- Created a year ago
- Comments: 24 (10 by maintainers)
Thank you soooo much @akowalsk !! You saved my day! Wish you a wonderful weekend!
Also thanks a lot to @tomaarsen . Amazing support!! Enjoy the weekend!!
I don't think so; you should just have to run:
    pip install -U unstructured
I can't say for sure, though, because I'm using unstructured directly right now. My error was exactly the same as yours, and it went away once I upgraded to 0.8.7.
Unstructured just pushed a release 5 minutes ago: https://github.com/Unstructured-IO/unstructured/releases/tag/0.8.7
Upgrading to that fixed it for me.
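To confirm the upgrade actually landed in the environment the LangChain code runs in, here is a sketch using the standard library's `importlib.metadata`. The numeric-prefix comparison is a simplification I chose to avoid dependencies (pre-release tags such as `rc1` are ignored), so `packaging.version.Version` would be the stricter choice if it is available. The helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MINIMUM = (0, 8, 7)  # the release that fixed the error for the commenters above


def numeric_version(text):
    """Turn '0.8.7' into (0, 8, 7) and '0.10.2.dev0' into (0, 10, 2)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def unstructured_is_new_enough(minimum=MINIMUM):
    """True if the installed unstructured distribution is at least `minimum`."""
    try:
        installed = version("unstructured")
    except PackageNotFoundError:
        return False
    return numeric_version(installed) >= minimum
```

Comparing tuples of integers keeps `0.10.x` above `0.8.7`, which a plain string comparison would get wrong.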