transformers: Token embedding resizing does not work for TFGPT2Model

System Info

  • transformers version: 4.25.1
  • Platform: Linux-5.15.0-57-generic-x86_64-with-glibc2.35
  • Python version: 3.9.16
  • Huggingface_hub version: 0.11.1
  • PyTorch version (GPU?): not installed (NA)
  • Tensorflow version (GPU?): 2.11.0 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Who can help?

@gante and @Rocketknight1

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

After calling add_special_tokens on the tokenizer and resize_token_embeddings on TFGPT2Model, evaluating the model raises an error indicating that the embeddings were not resized as expected.

Please see the example code and the execution output below:

from transformers import GPT2Tokenizer, TFGPT2Model

SPECIAL_TOKENS_MAPPING = {
    'bos_token': '<bos>',
    'eos_token': '<eos>',
    'pad_token': '<pad>',
    'additional_special_tokens': ['<speaker1>', '<speaker2>']
}

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = TFGPT2Model.from_pretrained("gpt2")

print("Evaluating TFGPT2Model BEFORE extending the tokenizer and model with additional tokens ...")

inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")
print(f"inputs = \n{inputs}\n")

outputs = model(inputs)
print("DONE!")

print("Adding tokens...")
orig_num_tokens = len(tokenizer.get_vocab())
num_special_tokens = tokenizer.add_special_tokens(SPECIAL_TOKENS_MAPPING)
print(f"orig_num_tokens = {orig_num_tokens}, num_special_tokens={num_special_tokens}")

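# Resize the token embeddings so the newly added special-token IDs are valid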
model.resize_token_embeddings(new_num_tokens=orig_num_tokens + num_special_tokens)

print("Evaluating TFGPT2Model AFTER extending the tokenizer and model with additional tokens ...")

inputs = tokenizer("<speaker1>Hello, my dog is cute<speaker2>I agree!", return_tensors="tf")
print(f"inputs = \n{inputs}\n")

outputs = model(inputs)
print("DONE!")
Execution output:

Evaluating TFGPT2Model BEFORE extending the tokenizer and model with additional tokens ...
inputs = 
{'input_ids': <tf.Tensor: shape=(1, 6), dtype=int32, numpy=array([[15496,    11,   616,  3290,   318, 13779]], dtype=int32)>, 'attention_mask': <tf.Tensor: shape=(1, 6), dtype=int32, numpy=array([[1, 1, 1, 1, 1, 1]], dtype=int32)>}

DONE!

Adding tokens...
orig_num_tokens = 50257, num_special_tokens=5

Evaluating TFGPT2Model AFTER extending the tokenizer and model with additional tokens ...
inputs = 
{'input_ids': <tf.Tensor: shape=(1, 11), dtype=int32, numpy=
array([[50260, 15496,    11,   616,  3290,   318, 13779, 50261,    40,
         4236,     0]], dtype=int32)>, 'attention_mask': <tf.Tensor: shape=(1, 11), dtype=int32, numpy=array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], dtype=int32)>}

Traceback (most recent call last):
  File "/home/freddy/workspace/Nuhame/mlpug/examples/chatbot/tensorflow/test_tf_resize_token_size.py", line 33, in <module>
    outputs = model(inputs)
  File "/home/freddy/.virtualenvs/mlpug-tf/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/freddy/.virtualenvs/mlpug-tf/lib/python3.9/site-packages/transformers/modeling_tf_utils.py", line 432, in run_call_with_unpacked_inputs
    return func(self, **unpacked_inputs)
  File "/home/freddy/.virtualenvs/mlpug-tf/lib/python3.9/site-packages/transformers/models/gpt2/modeling_tf_gpt2.py", line 773, in call
    outputs = self.transformer(
  File "/home/freddy/.virtualenvs/mlpug-tf/lib/python3.9/site-packages/transformers/modeling_tf_utils.py", line 432, in run_call_with_unpacked_inputs
    return func(self, **unpacked_inputs)
  File "/home/freddy/.virtualenvs/mlpug-tf/lib/python3.9/site-packages/transformers/models/gpt2/modeling_tf_gpt2.py", line 447, in call
    tf.debugging.assert_less(
tensorflow.python.framework.errors_impl.InvalidArgumentError: Exception encountered when calling layer 'transformer' (type TFGPT2MainLayer).

input_ids must be smaller than the embedding layer's input dimension (got 50261 >= 50257)
Condition x < y did not hold.
First 3 elements of x:
[50260 15496    11]
First 1 elements of y:
[50257]

Call arguments received by layer 'transformer' (type TFGPT2MainLayer):
  • input_ids=tf.Tensor(shape=(1, 11), dtype=int32)
  • past_key_values=None
  • attention_mask=tf.Tensor(shape=(1, 11), dtype=int32)
  • token_type_ids=None
  • position_ids=None
  • head_mask=None
  • inputs_embeds=None
  • encoder_hidden_states=None
  • encoder_attention_mask=None
  • use_cache=True
  • output_attentions=False
  • output_hidden_states=False
  • return_dict=True
  • training=False

Expected behavior

After resizing, the model should have 50257 + 5 = 50262 embeddings, so an input ID of 50261 is valid and the code above should run without errors.
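As a quick sanity check (a minimal sketch; get_input_embeddings() is the public accessor, and the .weight attribute layout is an assumption about this transformers version), the config and the embedding matrix can be inspected right after the resize call:

# Run immediately after model.resize_token_embeddings(...):
print(model.config.vocab_size)                    # expected: 50262
print(model.get_input_embeddings().weight.shape)  # expected: (50262, 768)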

Most upvoted comments

Fixed on all models, thanks to @susnato 🧡

Hey @tqye2000 – using the best possible reference, the code itself, you can see that you don’t need to shift the inputs. In other words, labels = inputs, all shifting happens inside the model. I hope this helps 🤗
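For example, a minimal sketch of this with TFGPT2LMHeadModel (illustrative code, not taken from the thread):

from transformers import GPT2Tokenizer, TFGPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = TFGPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")

# The labels are the unshifted input_ids; the model shifts them internally
# before computing the causal language-modeling loss.
outputs = model(inputs["input_ids"], labels=inputs["input_ids"])
print(outputs.loss)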

Hi @gante, may I ask another question? For fine-tuning the GPT-2 model, should I pass labels that are exactly the same as the inputs, or should I shift the inputs by one token to create the labels? I get mixed information on the internet: some say the labels should be a copy of the inputs, while some examples show the labels shifted by one token. I apologise if this is not the right place for such questions! Many thanks!

Thank you very much, @gante! After upgrading to the current source version, resize_token_embeddings() seems to be working now. However, I get “Allocation of 740033280 exceeds 10% of free system memory” messages. I guess this is my PC’s issue.

Hey @tqye2000 👋 You can upgrade your transformers installation to match the current source version with pip install --upgrade git+https://github.com/huggingface/transformers.git

@visionscaper thank you for raising the issue! It is a generalized problem with this check, which should only rely on the config’s vocab size (which is the only reliable source of the actual vocabulary size at any given moment).

@susnato opened a fix for GPT2, but other models will need a fix as well.
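For illustration, the kind of guard described above might look like the sketch below. This is not the actual patch; check_input_ids is a hypothetical helper, and it assumes that resize_token_embeddings keeps config.vocab_size up to date:

import tensorflow as tf

def check_input_ids(input_ids: tf.Tensor, vocab_size: int) -> None:
    # Validate against the config's vocab size, which resize_token_embeddings
    # updates, rather than against a possibly stale embedding input dimension.
    tf.debugging.assert_less(
        input_ids,
        tf.cast(vocab_size, dtype=input_ids.dtype),
        message="input_ids must be smaller than the model's vocabulary size",
    )

# e.g.: check_input_ids(inputs["input_ids"], model.config.vocab_size)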