transformers: Gemma-7b is not working properly. There is a logical bug somewhere.
Reopening issue about gemma-7b prediction values.
This issue is still not solved: the perplexity values of gemma-2b and gemma-7b are very different, with gemma-7b much worse (near random). WikiText-2 token perplexity for gemma-2b is ~21; for gemma-7b it is a very large value, ~1e13.
I am not sure of the reason, but it has to be a problem with the implementation: it might be the weights, or some embedding/tokenizer mismatch.
_Originally posted by @alisafaya in https://github.com/huggingface/transformers/issues/29181#issuecomment-1961539845_
This is not related to the context size. Perplexity values close to 1.0 mean that the loss is close to 0, since perplexity is exp(loss). I checked the script you shared, and it has a small bug.
It converts the whole input into a sequence of:

```
<bos><bos><bos><bos><bos><bos><bos><bos>…<bos>
```
This is the reason for the very low perplexity: predicting a constant `<bos>` sequence drives the loss toward zero. The input should be the actual tokenized text, with a single `<bos>` at the start.

The main issue seems to be related to the `bos` token. I identified two main issues: the input can collapse into repeated `<bos>` tokens as shown above, and gemma-7b produces a near-random perplexity when the `<bos>` token is missing entirely (see the numbers below).
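As a quick sanity check of the tokenizer's default behavior (a minimal sketch; the sample string is a placeholder):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/gemma-2b")

with_special = tok("Hello world").input_ids
without_special = tok("Hello world", add_special_tokens=False).input_ids

# Gemma's tokenizer prepends <bos> by default. Scripts that also prepend it
# manually end up with duplicated <bos> tokens, while scripts that pass
# add_special_tokens=False and never add it back get no <bos> at all.
assert with_special[0] == tok.bos_token_id
assert without_special[0] != tok.bos_token_id
```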
I updated the script as follows:
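The updated script itself did not survive in this thread; the following is a minimal sketch of the fixed evaluation, assuming WikiText-2, a fixed 2048-token window, and manual handling of a single `<bos>` per window:

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-7b"  # or "google/gemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
# Tokenize once WITHOUT special tokens, so <bos> is added exactly once per window.
ids = tokenizer(text, add_special_tokens=False, return_tensors="pt").input_ids[0]

window = 2048
bos = torch.tensor([tokenizer.bos_token_id])
total_nll, total_tokens = 0.0, 0

with torch.no_grad():
    for start in range(0, ids.size(0), window):
        chunk = ids[start : start + window]
        input_ids = torch.cat([bos, chunk]).unsqueeze(0).to(model.device)
        # labels == input_ids; the model shifts internally, so the <bos>
        # position is the only token that is not scored.
        out = model(input_ids, labels=input_ids)
        n = input_ids.size(1) - 1  # number of scored tokens
        total_nll += out.loss.item() * n
        total_tokens += n

print(f"token perplexity = {torch.exp(torch.tensor(total_nll / total_tokens)):.4f}")
```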
Now the token perplexity, with and without a single `<bos>` prepended:

| model | with `<bos>` | without `<bos>` |
| --- | --- | --- |
| gemma-7b | 6.1250 | 8.0111e+08 |
| gemma-2b | 7.7500 | 8.1250 |
This should be added to the documentation or fixed somehow in the configuration files. After that we can close this issue.
No, I do not.
Btw, token perplexity is not directly comparable across models with different tokenizers. I advise using bits-per-char or negative log-likelihood per character: sum the total loss over the whole test set and average by the number of characters or bytes.
For reference check the appendix of the Megatron blog here: https://nv-adlr.github.io/MegatronLM
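For illustration, the conversion is just a change of units; the totals below are made-up numbers, not measurements from this thread:

```python
import math

# Illustrative totals (hypothetical), not results from this issue:
total_nll_nats = 1.92e6   # summed cross-entropy over the test set, in nats
num_chars = 1_288_556     # characters (or bytes) in the raw test text

nll_per_char = total_nll_nats / num_chars   # nats per character
bits_per_char = nll_per_char / math.log(2)  # convert nats to bits

print(f"nll/char = {nll_per_char:.4f} nats, bpc = {bits_per_char:.4f}")
```

Because both metrics normalize by characters rather than tokens, they stay comparable across models with different tokenizers.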
On Wed, Feb 28, 2024, 21:09, Vincent Nguyen wrote:
It's very specific to Gemma, and more so to `gemma-7b`. We can have the tokenizer warn users if `bos_token` is not set; otherwise just a tip / warning in `gemma.md` should be good. `add_special_tokens=False` is the user disabling something.

Going from 1e13 to 1 seems pretty good already, no?