transformers: Token embedding resizing does not work for TFGPT2Model
System Info
- transformers version: 4.25.1
- Platform: Linux-5.15.0-57-generic-x86_64-with-glibc2.35
- Python version: 3.9.16
- Huggingface_hub version: 0.11.1
- PyTorch version (GPU?): not installed (NA)
- Tensorflow version (GPU?): 2.11.0 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Who can help?
@gante and @Rocketknight1
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
After calling add_special_tokens on the tokenizer and resize_token_embeddings on TFGPT2Model, evaluating the model fails with an error indicating that the embeddings were not resized as expected.
Please see the example code and the execution output below:
from transformers import GPT2Tokenizer, TFGPT2Model
SPECIAL_TOKENS_MAPPING = {
'bos_token': '<bos>',
'eos_token': '<eos>',
'pad_token': '<pad>',
'additional_special_tokens': ['<speaker1>', '<speaker2>']
}
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = TFGPT2Model.from_pretrained("gpt2")
print("Evaluating TFGPT2Model BEFORE extending the tokenizer and model with additional tokens ...")
inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")
print(f"inputs = \n{inputs}\n")
outputs = model(inputs)
print(f"DONE!")
print("Adding tokens...")
orig_num_tokens = len(tokenizer.get_vocab())
num_special_tokens = tokenizer.add_special_tokens(SPECIAL_TOKENS_MAPPING)
print(f"orig_num_tokens = {orig_num_tokens}, num_special_tokens={num_special_tokens}")
model.resize_token_embeddings(new_num_tokens=orig_num_tokens + num_special_tokens)
print("Evaluating TFGPT2Model AFTER extending the tokenizer and model with additional tokens ...")
inputs = tokenizer("<speaker1>Hello, my dog is cute<speaker2>I agree!", return_tensors="tf")
print(f"inputs = \n{inputs}\n")
outputs = model(inputs)
print(f"DONE!")
Evaluating TFGPT2Model BEFORE extending the tokenizer and model with additional tokens ...
inputs =
{'input_ids': <tf.Tensor: shape=(1, 6), dtype=int32, numpy=array([[15496, 11, 616, 3290, 318, 13779]], dtype=int32)>, 'attention_mask': <tf.Tensor: shape=(1, 6), dtype=int32, numpy=array([[1, 1, 1, 1, 1, 1]], dtype=int32)>}
DONE!
Adding tokens...
orig_num_tokens = 50257, num_special_tokens=5
Evaluating TFGPT2Model AFTER extending the tokenizer and model with additional tokens ...
inputs =
{'input_ids': <tf.Tensor: shape=(1, 11), dtype=int32, numpy=
array([[50260, 15496, 11, 616, 3290, 318, 13779, 50261, 40,
4236, 0]], dtype=int32)>, 'attention_mask': <tf.Tensor: shape=(1, 11), dtype=int32, numpy=array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], dtype=int32)>}
Traceback (most recent call last):
File "/home/freddy/workspace/Nuhame/mlpug/examples/chatbot/tensorflow/test_tf_resize_token_size.py", line 33, in <module>
outputs = model(inputs)
File "/home/freddy/.virtualenvs/mlpug-tf/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/home/freddy/.virtualenvs/mlpug-tf/lib/python3.9/site-packages/transformers/modeling_tf_utils.py", line 432, in run_call_with_unpacked_inputs
return func(self, **unpacked_inputs)
File "/home/freddy/.virtualenvs/mlpug-tf/lib/python3.9/site-packages/transformers/models/gpt2/modeling_tf_gpt2.py", line 773, in call
outputs = self.transformer(
File "/home/freddy/.virtualenvs/mlpug-tf/lib/python3.9/site-packages/transformers/modeling_tf_utils.py", line 432, in run_call_with_unpacked_inputs
return func(self, **unpacked_inputs)
File "/home/freddy/.virtualenvs/mlpug-tf/lib/python3.9/site-packages/transformers/models/gpt2/modeling_tf_gpt2.py", line 447, in call
tf.debugging.assert_less(
tensorflow.python.framework.errors_impl.InvalidArgumentError: Exception encountered when calling layer 'transformer' (type TFGPT2MainLayer).
input_ids must be smaller than the embedding layer's input dimension (got 50261 >= 50257)
Condition x < y did not hold.
First 3 elements of x:
[50260 15496 11]
First 1 elements of y:
[50257]
Call arguments received by layer 'transformer' (type TFGPT2MainLayer):
• input_ids=tf.Tensor(shape=(1, 11), dtype=int32)
• past_key_values=None
• attention_mask=tf.Tensor(shape=(1, 11), dtype=int32)
• token_type_ids=None
• position_ids=None
• head_mask=None
• inputs_embeds=None
• encoder_hidden_states=None
• encoder_attention_mask=None
• use_cache=True
• output_attentions=False
• output_hidden_states=False
• return_dict=True
• training=False
Expected behavior
The model should have 50257 + 5 = 50262 embeddings after resizing, so an input ID with value 50261 should not cause an error; the code above should run without errors.
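For reference, one way to sanity-check the resize, run right after the resize_token_embeddings call in the script above (a minimal sketch added for illustration; the exact attribute that holds the embedding weights may vary between transformers versions):

# Illustrative sketch (not part of the original report): reuses `model`,
# `orig_num_tokens` and `num_special_tokens` from the reproduction script.
expected_vocab_size = orig_num_tokens + num_special_tokens  # 50257 + 5 = 50262
print("expected vocab size:", expected_vocab_size)

# The config should now report the enlarged vocabulary.
print("config.vocab_size:", model.config.vocab_size)

# Inspect the input embedding layer; the attribute holding the weight
# matrix can differ between versions, hence the defensive lookup.
embeddings = model.get_input_embeddings()
weight = getattr(embeddings, "weight", None)
if weight is not None:
    print("embedding matrix shape:", weight.shape)  # expected: (50262, 768)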
About this issue
- State: closed
- Created a year ago
- Comments: 16 (7 by maintainers)
Commits related to this issue
- Fixed issue #21053 — committed to susnato/transformers by deleted user a year ago
- Fixed issue #21053 (#21065) Co-authored-by: susnato <susnato@tensorflow123456@gmail.com> — committed to huggingface/transformers by susnato a year ago
- Fixed issue #21053 (#21065) Co-authored-by: susnato <susnato@tensorflow123456@gmail.com> — committed to venkat-natchi/transformers by susnato a year ago
- Fixed issue #21053 (#21065) Co-authored-by: susnato <susnato@tensorflow123456@gmail.com> — committed to miyu386/transformers by susnato a year ago
Fixed on all models, thanks to @susnato 🧡
Hey @tqye2000 – using the best possible reference, the code itself, you can see that you don’t need to shift the inputs. In other words, labels = inputs; all shifting happens inside the model. I hope this helps 🤗
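A minimal illustration of the point above (not from the original thread): fine-tuning with TFGPT2LMHeadModel, passing the unshifted input IDs as labels, since the model shifts them internally when computing the loss.

from transformers import GPT2Tokenizer, TFGPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = TFGPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")

# Pass the input IDs unchanged as labels; the model shifts them by one
# position internally before computing the language-modeling loss.
outputs = model(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    labels=inputs["input_ids"],
)
print(outputs.loss)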
Hi @gante, may I ask another question? For fine-tuning the GPT-2 model, should I pass labels that are exactly the same as the inputs, or should I shift the inputs by one token to create the labels? I get mixed information on the internet: some say the labels should be a copy of the inputs, while some examples show the labels shifted by one token relative to the inputs. I apologise if this is not the right place for such questions! Many thanks!
Thank you very much, @gante! After upgrading to the current source version, resize_token_embeddings() seems to be working now. However, I get “Allocation of 740033280 exceeds 10% of free system memory” messages. I guess this is my PC’s issue.
Hey @tqye2000 👋 You can upgrade your transformers installation to match the current source version with pip install --upgrade git+https://github.com/huggingface/transformers.git

@visionscaper thank you for raising the issue! It is a generalized problem with this check, which should only rely on the config’s vocab size (which is the only reliable source of the actual vocabulary size at any given moment).
@susnato opened a fix for GPT2, but other models will need a fix as well.
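For context, the assertion that fails in the traceback above compares the input IDs against the embedding layer’s built input dimension. A rough sketch of the direction described here, validating against the config’s vocab_size instead (an illustration of the idea, not the actual patch):

import tensorflow as tf

# Illustrative sketch: validate input IDs against the config's vocab_size,
# which resize_token_embeddings keeps up to date, instead of the embedding
# layer's originally built input dimension.
def check_input_ids(input_ids: tf.Tensor, vocab_size: int) -> None:
    tf.debugging.assert_less(
        input_ids,
        tf.cast(vocab_size, dtype=input_ids.dtype),
        message=f"input_ids must be smaller than the vocabulary size ({vocab_size})",
    )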