transformers: Inference API: Can't load tokenizer using from_pretrained, please update its configuration: No such file or directory (os error 2)

Environment info

  • transformers version:
  • Platform:
  • Python version:
  • PyTorch version (GPU?):
  • Tensorflow version (GPU?):
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help

@LysandreJik @patil-suraj

Information

I am trying to use the Inference API on the Hugging Face Hub with a version of GPT-2 I fine-tuned on a custom task.

To reproduce

When I try to use the API, the following error appears (screenshot in the original issue): Can't load tokenizer using from_pretrained, please update its configuration: No such file or directory (os error 2)

Steps to reproduce the behavior: here are the files I have in my private repo (screenshot in the original issue).

Expected behavior

I uploaded the tokenizer files to Colab and was able to instantiate a tokenizer with the from_pretrained method, so I don't know why the Inference API throws an error.
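
For reference, a minimal sketch of the local check described above (the directory path is a placeholder, not the actual repo):

```python
from transformers import AutoTokenizer

# The tokenizer files uploaded to Colab load fine from a local directory,
# even though the Inference API fails on the same files.
tokenizer = AutoTokenizer.from_pretrained("./my-finetuned-gpt2")
print(tokenizer("Hello world"))
```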

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 35 (13 by maintainers)


Most upvoted comments

Hi,

I'm having the same issue. Steps followed:

  • Trained the model (t5-base) using custom PyTorch (no Trainer).
  • Loading the model and tokenizer locally works fine using T5Tokenizer (not AutoTokenizer).
  • Pushed the model to the Hugging Face Hub using model.push_to_hub() and tokenizer.push_to_hub() (see the sketch after this list).
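
A rough sketch of those push steps, assuming a local checkpoint directory and a hypothetical repo id:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load the model and tokenizer from the local checkpoint (placeholder path).
model = T5ForConditionalGeneration.from_pretrained("./my-t5-checkpoint")
tokenizer = T5Tokenizer.from_pretrained("./my-t5-checkpoint")

# Both pushes are needed; pushing only the model leaves the repo
# without tokenizer files.
model.push_to_hub("my-username/my-t5-model")
tokenizer.push_to_hub("my-username/my-t5-model")
```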

Behavior:

  • Loading the tokenizer from the Hub using AutoTokenizer doesn't work.
  • Loading it from the Hub using T5Tokenizer works.
  • Looking at the files in the Hub repo, I only see tokenizer_config.json!
  • The Inference API gives the error: Can't load tokenizer using from_pretrained, please update its configuration: No such file or directory (os error 2)
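
To make the discrepancy concrete, a minimal sketch of the two load paths (the repo id is a placeholder):

```python
from transformers import AutoTokenizer, T5Tokenizer

repo = "my-username/my-t5-model"  # hypothetical repo id

# Works as reported: the slow tokenizer class loads from the Hub.
slow_tok = T5Tokenizer.from_pretrained(repo)

# Fails as reported: AutoTokenizer raises
# "Can't load tokenizer using from_pretrained ..." when the repo is
# missing the tokenizer files it expects.
auto_tok = AutoTokenizer.from_pretrained(repo)
```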

tokenizer_config.json is necessary because it carries additional configuration for the tokenizer. The original gpt2 repo might be laid out differently, but there's some code for legacy models to make sure everything works smoothly for those.

The path within that file is indeed something to look into, but it should work nonetheless.

@Narsil I downloaded the tokenizer.json file from the original gpt2-medium checkpoint on the Hub, added it to my model's repo, and it works now. However, this file is not produced automatically by the save_pretrained() method of the Hugging Face GPT2LMHeadModel class or the AutoTokenizer class. When loading a tokenizer manually with the AutoTokenizer class in Google Colab, this tokenizer.json file isn't necessary (it loads correctly given just the files from AutoTokenizer.save_pretrained()). Was my solution of adding the tokenizer.json correct, or will it cause any hidden errors?
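
For anyone hitting this later: one way to produce tokenizer.json yourself, rather than copying it from another checkpoint, is to load and re-save the fast tokenizer, since fast tokenizers serialize themselves to tokenizer.json. A sketch with placeholder paths:

```python
from transformers import GPT2TokenizerFast

# Load the fast (Rust-backed) tokenizer from the existing local files ...
tokenizer = GPT2TokenizerFast.from_pretrained("./my-finetuned-gpt2")

# ... and re-save: fast tokenizers write tokenizer.json alongside the
# vocab/merges files. Then push the updated files to the Hub.
tokenizer.save_pretrained("./my-finetuned-gpt2")
tokenizer.push_to_hub("my-username/my-finetuned-gpt2")
```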

@Narsil Haha, I understand. Yes, I can confirm it is working well.

@shiffman ,

I am not sure what steps you followed for push_to_hub to upload the tokenizer.

tokenizer.push_to_hub("sgugger/my-awesome-model")

Might be necessary.

@sgugger can you confirm?