nltk: I tried everything and still I get: [nltk_data] Error loading taggers: Package 'taggers' not found in [nltk_data] index

Hi everyone!

I’m using LangChain to create a custom LLM. As part of the process I have to use nltk, and I’ve been following all the steps. I installed nltk, but I noticed it hasn’t created an nltk_data folder. Installing, upgrading, uninstalling, and installing again didn’t help, so I added these lines to my code:

import nltk
nltk.data.path = ['C:\\Users\\zaesa\\AppData\\Roaming\\nltk_data']
nltk.download('tokenizers', download_dir='C:\\Users\\zaesa\\AppData\\Roaming\\nltk_data')
nltk.download('punkt', download_dir='C:\\Users\\zaesa\\AppData\\Roaming\\nltk_data')
nltk.download('all')

When I check which packages are installed:

`installed_packages = nltk.downloader.Downloader(download_dir='C:\\Users\\zaesa\\AppData\\Roaming\\nltk_data').packages()`

`print(installed_packages)`

I get:

dict_values([<Package perluniprops>, <Package mwa_ppdb>, <Package punkt>, <Package rslp>, <Package porter_test>, <Package snowball_data>, <Package maxent_ne_chunker>, <Package moses_sample>, <Package bllip_wsj_no_aux>, <Package word2vec_sample>, <Package wmt15_eval>, <Package spanish_grammars>, <Package sample_grammars>, <Package large_grammars>, <Package book_grammars>, <Package basque_grammars>, <Package maxent_treebank_pos_tagger>, <Package averaged_perceptron_tagger>, <Package averaged_perceptron_tagger_ru>, <Package universal_tagset>, <Package vader_lexicon>, <Package lin_thesaurus>, <Package movie_reviews>, <Package problem_reports>, <Package pros_cons>, <Package masc_tagged>, <Package sentence_polarity>, <Package webtext>, <Package nps_chat>, <Package city_database>, <Package europarl_raw>, <Package biocreative_ppi>, <Package verbnet3>, <Package pe08>, <Package pil>, <Package crubadan>, <Package gutenberg>, <Package propbank>, <Package machado>, <Package state_union>, <Package twitter_samples>, <Package semcor>, <Package wordnet31>, <Package extended_omw>, <Package names>, <Package ptb>, <Package nombank.1.0>, <Package floresta>, <Package comtrans>, <Package knbc>, <Package mac_morpho>, <Package swadesh>, <Package rte>, <Package toolbox>, <Package jeita>, <Package product_reviews_1>, <Package omw>, <Package wordnet2022>, <Package sentiwordnet>, <Package product_reviews_2>, <Package abc>, <Package wordnet2021>, <Package udhr2>, <Package senseval>, <Package words>, <Package framenet_v15>, <Package unicode_samples>, <Package kimmo>, <Package framenet_v17>, <Package chat80>, <Package qc>, <Package inaugural>, <Package wordnet>, <Package stopwords>, <Package verbnet>, <Package shakespeare>, <Package ycoe>, <Package ieer>, <Package cess_cat>, <Package switchboard>, <Package comparative_sentences>, <Package subjectivity>, <Package udhr>, <Package pl196x>, <Package paradigms>, <Package gazetteers>, <Package timit>, <Package treebank>, <Package sinica_treebank>, 
<Package opinion_lexicon>, <Package ppattach>, <Package dependency_treebank>, <Package reuters>, <Package genesis>, <Package cess_esp>, <Package conll2007>, <Package nonbreaking_prefixes>, <Package dolch>, <Package smultron>, <Package alpino>, <Package wordnet_ic>, <Package brown>, <Package bcp47>, <Package panlex_swadesh>, <Package conll2000>, <Package universal_treebanks_v20>, <Package brown_tei>, <Package cmudict>, <Package omw-1.4>, <Package mte_teip5>, <Package indian>, <Package conll2002>, <Package tagsets>])

But when I look manually in the nltk_data folder at that path, I don’t see all these packages, and there is no “tokenizers” and no “taggers”, even though I wrote:

`import nltk`

`nltk.data.path = ['C:\\Users\\zaesa\\AppData\\Roaming\\nltk_data']`

`nltk.download('tokenizers', download_dir='C:\\Users\\zaesa\\AppData\\Roaming\\nltk_data')`

`nltk.download('punkt', download_dir='C:\\Users\\zaesa\\AppData\\Roaming\\nltk_data')`

`nltk.download('all')`

Why do I still get this error? It makes no sense:

[nltk_data] Error loading tokenizers: Package 'tokenizers' not found
[nltk_data]     in index
[nltk_data] Error loading taggers: Package 'taggers' not found in
[nltk_data]     index

I’d appreciate any help. I can’t figure out what is wrong with nltk. I’ve followed every instruction in every way I could, and it just doesn’t work! Thank you.
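For anyone hitting the same error, here is a rough sketch of how to see what is actually on disk. It assumes the usual nltk_data layout, where each download is stored inside a category folder (for example `tokenizers/punkt` or `taggers/averaged_perceptron_tagger`). Under that layout `tokenizers` and `taggers` are folder names rather than package IDs, which would be consistent with the "not found in index" message. As far as I know, `Downloader.packages()` lists what is available in the download index, not what is installed, so it is not a reliable check. The helper name `list_nltk_resources` is made up for this example, and it uses only the standard library.

```python
from pathlib import Path


def list_nltk_resources(data_dir):
    """Map each category folder (e.g. 'tokenizers') to the resource names inside it.

    Hypothetical helper for illustration; it only inspects the folder layout.
    """
    root = Path(data_dir)
    found = {}
    if not root.is_dir():
        # Nothing has been downloaded to this location (or the path is wrong).
        return found
    for category in sorted(p for p in root.iterdir() if p.is_dir()):
        # A download may appear as an unpacked folder, a .zip archive, or both,
        # so strip the .zip suffix and de-duplicate the names.
        names = {
            entry.stem if entry.suffix == ".zip" else entry.name
            for entry in category.iterdir()
        }
        found[category.name] = sorted(names)
    return found


if __name__ == "__main__":
    # Path taken from the post above; adjust it for your own machine.
    data_dir = r"C:\Users\zaesa\AppData\Roaming\nltk_data"
    for category, names in list_nltk_resources(data_dir).items():
        print(category, names)
```

If `punkt` shows up under `tokenizers` and a tagger shows up under `taggers`, the data itself is probably fine and the error is more likely coming from whatever code calls `nltk.download` with a category name.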

About this issue

  • State: closed
  • Created a year ago
  • Comments: 24 (10 by maintainers)

Most upvoted comments

Thank you soooo much @akowalsk !! You saved my day! Wish you a wonderful weekend!

Also thanks a lot to @tomaarsen . Amazing support!! Enjoy the weekend!!

I don’t think so; you should just have to run `pip install -U unstructured`. I can’t say for sure, though, because I’m using unstructured directly right now. My error was exactly the same as what you saw, and it went away once I upgraded to 0.8.7.

Unstructured just pushed a release 5 minutes ago: https://github.com/Unstructured-IO/unstructured/releases/tag/0.8.7

Upgrading to that fixed it for me.
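To check whether an environment already has that release, something like the sketch below may help. It is only an illustration: the 0.8.7 threshold comes from the comments above, the function names are invented for this example, and the version parsing is deliberately simple (it reads the leading numeric parts and ignores suffixes such as `dev1` or `rc1`).

```python
from importlib import metadata

# Release mentioned in the comments above as the one that fixed the error.
MIN_VERSION = (0, 8, 7)


def version_tuple(version):
    """Turn '0.8.7' into (0, 8, 7), keeping only the leading digits of each part."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric part, e.g. 'dev1'
        parts.append(int(digits))
    return tuple(parts)


def needs_upgrade(installed_version, minimum=MIN_VERSION):
    """Return True if the installed version is older than the minimum."""
    return version_tuple(installed_version) < minimum


def check_unstructured():
    """Describe the state of the 'unstructured' package in this environment."""
    try:
        installed = metadata.version("unstructured")
    except metadata.PackageNotFoundError:
        return "unstructured is not installed in this environment"
    if needs_upgrade(installed):
        return (
            f"unstructured {installed} is older than 0.8.7; "
            "try: pip install -U unstructured"
        )
    return f"unstructured {installed} is 0.8.7 or newer"


if __name__ == "__main__":
    print(check_unstructured())
```

Run it with the same interpreter that LangChain uses, since a different virtual environment may have a different version installed.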