transformers: Error: "TypeError: 'NoneType' object is not callable" with model-specific tokenizers

Environment info

  • transformers version: 4.17.0.dev0 but also current latest master
  • Platform: Colab
  • Python version:
  • PyTorch version (GPU?):
  • Tensorflow version (GPU?):
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Models:

  • XGLM (although I don’t think this is model specific)

To reproduce

Steps to reproduce the behavior:

  1. Try to create a tokenizer from a model-specific tokenizer class (e.g. XGLMTokenizer); it fails:
from transformers import XGLMTokenizer
tokenizer = XGLMTokenizer.from_pretrained("facebook/xglm-564M")

Expected behavior

Should work, but it fails with this exception:

Error: "TypeError: 'NoneType' object is not callable" with Tokenizers

However, creating it with AutoTokenizer just works. That's fine, but there are a lot of examples for specific models that do not use AutoTokenizer (I found this out by pasting an example from the XGLM docs).

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Reactions: 3
  • Comments: 19 (10 by maintainers)

Most upvoted comments

I ran my code in Colab and on the first try it gave the same error, but after I restarted the runtime it worked.

I am having the same issue with T5Tokenizer - I have installed sentencepiece and I still get the NoneType error. I am working on a Google Colab file and here is my environment info from the transformers-cli env command:

  • transformers version: 4.24.0
  • Platform: Linux-5.10.133+-x86_64-with-glibc2.27
  • Python version: 3.8.15
  • Huggingface_hub version: 0.11.1
  • PyTorch version (GPU?): 1.12.1+cu113 (False)
  • Tensorflow version (GPU?): 2.9.2 (False)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: no
  • Using distributed or parallel set-up in script?: no

Also, pip install sentencepiece gave the following output: Successfully installed sentencepiece-0.1.97

I am getting this error for the xlm-roberta-base model. Any help would be appreciated.

Hi @afcruzs! XGLMTokenizer depends on sentencepiece; if it's not installed, a None object is imported in its place.

pip install sentencepiece should resolve this.
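The failure can be checked up front. A minimal sketch (hypothetical helper name, not part of the transformers API) of the kind of availability check that is effectively at play here:

```python
import importlib.util

def dependency_ready(pkg: str) -> bool:
    """True if `pkg` can be imported without actually importing it.

    When sentencepiece is missing, transformers substitutes a dummy
    placeholder for sentencepiece-backed tokenizer classes, which is
    what ultimately surfaces as "'NoneType' object is not callable".
    """
    return importlib.util.find_spec(pkg) is not None

# Prints False until `pip install sentencepiece` (and a runtime restart on Colab)
print(dependency_ready("sentencepiece"))
```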

AutoTokenizer seems to work, but that's because it returns a separate class altogether (PreTrainedTokenizerFast).

Yes, by default AutoTokenizer returns the fast tokenizer.
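A sketch of the difference (model name taken from this thread; downloads the tokenizer files on first run):

```python
from transformers import AutoTokenizer

# By default AutoTokenizer resolves to the Rust-backed "fast" tokenizer,
# which does not depend on sentencepiece.
tok = AutoTokenizer.from_pretrained("facebook/xglm-564M")
print(type(tok).__name__, tok.is_fast)  # a fast tokenizer class, True

# use_fast=False forces the slow (Python) tokenizer, which is the
# sentencepiece-backed class that fails with the NoneType error when
# sentencepiece is not installed.
slow = AutoTokenizer.from_pretrained("facebook/xglm-564M", use_fast=False)
print(type(slow).__name__, slow.is_fast)
```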

Closing this issue as it seems to be solved 😄

@tianshuailu same thing here: restarting the kernel worked. Pretty buggy, I would say.

@tanuj2212, my hunch is that you are missing the sentencepiece dependency. Could you check whether it is installed in your virtual environment, and if not, install it with pip install sentencepiece?

What version of transformers are you using? I would have expected you to get the following message the first time you tried to use XLMRobertaTokenizer:

XLMRobertaTokenizer requires the SentencePiece library but it was not found in your environment. Checkout the instructions on the
installation page of its repo: https://github.com/google/sentencepiece#installation and follow the ones
that match your environment.

@SaulLu I've tried to repro it again and I couldn't; it seems to work just like you said 😐. I'll take a closer look and share a Colab if I find something; otherwise I'll close this. Thanks!