transformers: `stopping_criteria` not working with llama
System Info
I am generating text with the llama-13b model, but it keeps generating even after the stopping criteria are met. The same stopping criteria work fine with other models such as GPT-J 6B.
I loaded llama-13b with:
model = AutoModelForCausalLM.from_pretrained(model_name, device_map='auto', load_in_8bit=True)
and my stopping criteria list looks like this:
stopping_criteria_list = transformers.StoppingCriteriaList([
    _SentinelTokenStoppingCriteria(
        sentinel_token_ids=tokenizer(
            "\n",
            add_special_tokens=False,
            return_tensors="pt",
        ).input_ids.to("cuda"),
        starting_idx=tokenized_items.input_ids.shape[-1])
])
Thank you.
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
- load llama
model = AutoModelForCausalLM.from_pretrained(model_name, device_map='auto', load_in_8bit=True)
- make stopping criteria
stopping_criteria_list = transformers.StoppingCriteriaList([
    _SentinelTokenStoppingCriteria(
        sentinel_token_ids=tokenizer(
            "\n",
            add_special_tokens=False,
            return_tensors="pt",
        ).input_ids.to("cuda"),
        starting_idx=tokenized_items.input_ids.shape[-1])
])
...
import torch
import transformers

class _SentinelTokenStoppingCriteria(transformers.StoppingCriteria):
    def __init__(self, sentinel_token_ids: torch.LongTensor,
                 starting_idx: int):
        transformers.StoppingCriteria.__init__(self)
        self.sentinel_token_ids = sentinel_token_ids
        self.starting_idx = starting_idx

    def __call__(self, input_ids: torch.LongTensor,
                 _scores: torch.FloatTensor) -> bool:
        for sample in input_ids:
            # Only look at the tokens generated after the prompt.
            trimmed_sample = sample[self.starting_idx:]
            # Can't unfold, output is still too tiny. Skip.
            if trimmed_sample.shape[-1] < self.sentinel_token_ids.shape[-1]:
                continue
            # Slide a window of sentinel length over the generated tokens.
            for window in trimmed_sample.unfold(
                    0, self.sentinel_token_ids.shape[-1], 1):
                if torch.all(torch.eq(self.sentinel_token_ids, window)):
                    return True
        return False
- generate
model_output = model.generate(stopping_criteria=stopping_criteria_list,
                              **tokenized_items, **generation_settings,
                              pad_token_id=tokenizer.eos_token_id)
Expected behavior
Generation should stop once "\n" has been generated.
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 23 (7 by maintainers)
Hey @poohzaza166 👋
I had a look at your snippet, and the problem does not stem from the stopping criteria or the llama model itself, but rather from how the tokenizer works. It also doesn't seem to be a bug. My recommendation would be to design the stopping criteria from the token ids, and not from raw text 😃
See this example:
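A minimal sketch of that idea, reusing tokenizer, tokenized_items and _SentinelTokenStoppingCriteria from the snippet above (the [29871, 13] encoding of "\n" relied on here is the LLaMA tokenizer behavior reported in the comments below):

import torch
import transformers

# For LLaMA, tokenizer("\n") yields [29871, 13]: an extra leading-space token plus
# the newline. The model only generates 13 for a newline, so a sentinel built from
# the full encoding never matches. Keep just the final id instead.
newline_ids = tokenizer("\n", add_special_tokens=False, return_tensors="pt").input_ids
sentinel = newline_ids[:, -1:].to("cuda")  # shape (1, 1): only the newline id

stopping_criteria_list = transformers.StoppingCriteriaList([
    _SentinelTokenStoppingCriteria(
        sentinel_token_ids=sentinel,
        starting_idx=tokenized_items.input_ids.shape[-1])
])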
@oobabooga Those issues will be fixed by #22402
I can reproduce the issue. Here is some additional code for testing:
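For instance, a probe along these lines (a sketch, assuming the tokenizer is loaded from the same LLaMA checkpoint, model_name, as above) shows the behavior described next:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_name)

# Encoding "\n" yields an extra space token (29871) before the newline (13).
print(tokenizer("\n", add_special_tokens=False).input_ids)

# Encoding a single space gives id 259 rather than 29871 ...
print(tokenizer(" ", add_special_tokens=False).input_ids)

# ... and decoding [259] comes back as two spaces.
print(repr(tokenizer.decode([259])))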
There is always an extra space token (29871) inserted everywhere. Also, if you encode a single space, it becomes id 259 instead of 29871, and if you decode [259], the result is two spaces.
Very confusing behavior overall.