transformers: Error: "TypeError: 'NoneType' object is not callable" with model-specific tokenizers

Environment info

  • transformers version: 4.17.0.dev0 but also current latest master
  • Platform: Colab
  • Python version:
  • PyTorch version (GPU?):
  • Tensorflow version (GPU?):
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Models:

  • XGLM (although I don’t think this is model specific)

To reproduce

Steps to reproduce the behavior:

  1. Try to create a tokenizer from a model-specific tokenizer class (e.g. XGLMTokenizer); it fails:
from transformers import XGLMTokenizer
tokenizer = XGLMTokenizer.from_pretrained("facebook/xglm-564M")

Expected behavior

Should work, but it fails with this exception:

Error: "TypeError: 'NoneType' object is not callable" with Tokenizers

However, creating it with AutoTokenizer just works. That's fine, but there are a lot of examples for specific models that do not use AutoTokenizer (I found this out by pasting an example from the XGLM docs).

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Reactions: 3
  • Comments: 19 (10 by maintainers)

Most upvoted comments

I ran my code in Colab and on the first try it gave the same error, but after I restarted the runtime it worked.

I am having the same issue with T5Tokenizer - I have installed sentencepiece and I still get the NoneType error. I am working on a Google Colab file and here is my environment info from the transformers-cli env command:

  • transformers version: 4.24.0
  • Platform: Linux-5.10.133+-x86_64-with-glibc2.27
  • Python version: 3.8.15
  • Huggingface_hub version: 0.11.1
  • PyTorch version (GPU?): 1.12.1+cu113 (False)
  • Tensorflow version (GPU?): 2.9.2 (False)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: no
  • Using distributed or parallel set-up in script?: no

Also, pip install sentencepiece gave the following output: Successfully installed sentencepiece-0.1.97

I am getting this error for the xlm-roberta-base model. Any help would be appreciated.

Hi @afcruzs! XGLMTokenizer depends on sentencepiece; if it's not installed, a None object is imported in its place.

pip install sentencepiece should resolve this.
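The failure can be checked up front. A minimal sketch (hypothetical helper name, not part of the transformers API) of the kind of availability check that is effectively at play here:

```python
import importlib.util

def dependency_ready(pkg: str) -> bool:
    """True if `pkg` can be imported without actually importing it.

    When sentencepiece is missing, transformers substitutes a dummy
    placeholder for sentencepiece-backed tokenizer classes, which is
    what ultimately surfaces as "'NoneType' object is not callable".
    """
    return importlib.util.find_spec(pkg) is not None

# Prints False until `pip install sentencepiece` (and a runtime restart on Colab)
print(dependency_ready("sentencepiece"))
```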

AutoTokenizer seems to work, but that's because it returns a separate class altogether (PreTrainedTokenizerFast).

Yes, by default AutoTokenizer returns the fast tokenizer.
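A sketch of the difference (model name taken from this thread; downloads the tokenizer files on first run):

```python
from transformers import AutoTokenizer

# By default AutoTokenizer resolves to the Rust-backed "fast" tokenizer,
# which does not depend on sentencepiece.
tok = AutoTokenizer.from_pretrained("facebook/xglm-564M")
print(type(tok).__name__, tok.is_fast)  # a fast tokenizer class, True

# use_fast=False forces the slow (Python) tokenizer, which is the
# sentencepiece-backed class that fails with the NoneType error when
# sentencepiece is not installed.
slow = AutoTokenizer.from_pretrained("facebook/xglm-564M", use_fast=False)
print(type(slow).__name__, slow.is_fast)
```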

Closing this issue as it seems to be solved 😄

@tianshuailu same thing here: restarting the kernel worked. Pretty buggy, I would say.

@tanuj2212, my hunch is that you are missing the sentencepiece dependency. Could you check whether it is installed in your virtual environment, and if not, install it with pip install sentencepiece?

What version of transformers are you using? I would have expected you to get the following message the first time you tried to use XLMRobertaTokenizer:

XLMRobertaTokenizer requires the SentencePiece library but it was not found in your environment. Checkout the instructions on the
installation page of its repo: https://github.com/google/sentencepiece#installation and follow the ones
that match your environment.

@SaulLu I've tried to repro it again and I couldn't; it seems to work just like you said 😐. I'll take a closer look and share a Colab if I find something; otherwise I'll close this. Thanks!