transformers: Inference API: Can't load tokenizer using from_pretrained, please update its configuration: No such file or directory (os error 2)
Environment info
- `transformers` version:
- Platform:
- Python version:
- PyTorch version (GPU?):
- Tensorflow version (GPU?):
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
Who can help
Information
I am trying to use the Inference API on the Hugging Face Hub with a version of GPT-2 I fine-tuned on a custom task.
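(For context, a call to the hosted Inference API typically looks like the sketch below; the token and model id are placeholders, assuming a text-generation model in a private repo.)

```python
import requests

# Placeholders: substitute your own access token and private model id.
API_URL = "https://api-inference.huggingface.co/models/your-username/your-finetuned-gpt2"
headers = {"Authorization": "Bearer hf_xxx"}

def query(payload):
    # POST the prompt to the hosted Inference API and return the parsed JSON.
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

print(query({"inputs": "Once upon a time"}))
```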
To reproduce
When I try to use the API, the following error is returned: `Can't load tokenizer using from_pretrained, please update its configuration: No such file or directory (os error 2)`
Steps to reproduce the behavior:
Here are the files I have in my private repo:
Expected behavior
I uploaded the tokenizer files to Colab, and I was able to instantiate a tokenizer with the from_pretrained method, so I don’t know why the Inference API throws an error.
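For reference, the Colab check amounted to something like the sketch below (the repo id is a placeholder; `use_auth_token` was the authentication argument at the time of this issue):

```python
from transformers import AutoTokenizer

# "your-username/your-model" is a placeholder for the private repo id.
tokenizer = AutoTokenizer.from_pretrained("your-username/your-model", use_auth_token=True)
print(tokenizer("sanity check"))  # loads fine here, unlike on the Inference API
```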
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 35 (13 by maintainers)
Commits related to this issue
- Fixing double `use_auth_token.pop` (preventing private models from being visible). Should fix: https://github.com/huggingface/transformers/issues/14334#issuecomment-1634527833 Repro: Have a private ... — committed to huggingface/transformers by Narsil a year ago
- Fixing double `use_auth_token.pop` (preventing private models from being visible). (#24812) Fixing double `use_auth_token.pop` (preventing private models from being visible). Should fix: https://... — committed to huggingface/transformers by Narsil a year ago
Hi,
Having the same issue. Steps followed:
Behavior:
tokenizer_config.json is necessary for some additional information in the tokenizer. The original gpt2 repo might be different, but there’s some code for legacy models to make sure everything works smoothly for those. The path within that file is indeed something to look into, but it should work nonetheless.
@Narsil I downloaded the tokenizer.json file from the original gpt2-medium checkpoint on the Hub and added it to my model’s repo, and it works now. However, this file is not produced automatically by the save_pretrained() method of the Hugging Face GPT2LMHeadModel class or the AutoTokenizer class. When loading a tokenizer manually using the AutoTokenizer class in Google Colab, this tokenizer.json file isn’t necessary (it loads correctly given just the files from the AutoTokenizer.save_pretrained() method). Was my solution of adding the tokenizer.json correct, or will it cause any hidden errors?
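As a side note, save_pretrained on a *fast* tokenizer does write a tokenizer.json, so one way to produce the file yourself (rather than copying it from gpt2-medium) is to re-save with the fast class. A minimal sketch, assuming the slow-format files live in a local directory:

```python
from transformers import GPT2TokenizerFast

# Load from the existing vocab.json/merges.txt and re-save; the fast
# tokenizer's save_pretrained writes tokenizer.json alongside them.
# "path/to/tokenizer" is a placeholder.
tok = GPT2TokenizerFast.from_pretrained("path/to/tokenizer")
tok.save_pretrained("path/to/tokenizer")
```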
@Narsil Haha I understand. Yes I can confirm it is working well
@shiffman,
I am not sure what the steps are for push_to_hub to upload the tokenizer; it might be necessary.
@sgugger can you confirm?
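(For what it’s worth, pushing the tokenizer with push_to_hub would look roughly like the sketch below; both paths are placeholders.)

```python
from transformers import AutoTokenizer

# Upload the tokenizer files (including tokenizer.json when a fast
# tokenizer is used) to the Hub repo.
tokenizer = AutoTokenizer.from_pretrained("path/to/tokenizer")
tokenizer.push_to_hub("your-username/your-model")
```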