outlines: Docstring of `RegexLogitsProcessor` is incorrect
Describe the issue as clearly as possible:
Related to #536, the JSONLogitsProcessor requires the same adapt_tokenizer modifications found here; otherwise it throws this error: AttributeError: 'LLM' object has no attribute 'tokenizer'.
Steps/code to reproduce the bug:
from pydantic import BaseModel
from vllm import LLM, SamplingParams
import vllm.model_executor.layers.sampler as sampler
from outlines.serve.vllm import JSONLogitsProcessor, _patched_apply_logits_processors

# Replace vLLM's sampler hook with outlines' patched version
sampler._apply_logits_processors = _patched_apply_logits_processors

llm = LLM('mistralai/Mistral-7B-Instruct-v0.1')

class AviationPrompt(BaseModel):
    is_aviation_related: bool

logits_processor = JSONLogitsProcessor(AviationPrompt, llm)
results = llm.generate(
    user_prompts,  # defined elsewhere
    sampling_params=SamplingParams(max_tokens=100, logits_processors=[logits_processor]),
)
Expected result:
No errors
Error message:
Traceback (most recent call last):
File "/pkg/modal/_container_entrypoint.py", line 372, in handle_input_exception
yield
File "/pkg/modal/_container_entrypoint.py", line 530, in run_inputs
res = imp_fun.fun(*args, **kwargs)
File "/root/api.py", line 169, in generate
logits_processor = JSONLogitsProcessor(AviationPrompt, self.llm)
File "/usr/local/lib/python3.10/site-packages/outlines/serve/vllm.py", line 122, in __init__
super().__init__(regex_string, llm)
File "/usr/local/lib/python3.10/site-packages/outlines/serve/vllm.py", line 54, in __init__
tokenizer = self.adapt_tokenizer(llm.tokenizer.tokenizer)
AttributeError: 'LLM' object has no attribute 'tokenizer'
Traceback (most recent call last):
File "/pkg/modal/_container_entrypoint.py", line 372, in handle_input_exception
yield
File "/pkg/modal/_container_entrypoint.py", line 530, in run_inputs
res = imp_fun.fun(*args, **kwargs)
File "/root/api.py", line 487, in run_batch_inference
results = list(model.generate.map(question_batches))
File "/pkg/synchronicity/synchronizer.py", line 335, in _run_generator_sync
raise uc_exc.exc from None
File "<ta-Xp14glur4pj88roeBtv1We>:/root/api.py", line 169, in generate
File "<ta-Xp14glur4pj88roeBtv1We>:/usr/local/lib/python3.10/site-packages/outlines/serve/vllm.py", line 122, in __init__
File "<ta-Xp14glur4pj88roeBtv1We>:/usr/local/lib/python3.10/site-packages/outlines/serve/vllm.py", line 54, in __init__
AttributeError: 'LLM' object has no attribute 'tokenizer'
Outlines/Python version information:
Using outlines==0.0.27 and vllm==0.3.0.
Context for the issue:
I can't get it to run without explicitly adding the adapt_tokenizer code.
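For context, the adapt_tokenizer step essentially wraps the tokenizer so that SentencePiece-style tokens decode to plain strings before being matched against the regex. A minimal, self-contained sketch of the idea (the `DummyTokenizer` stand-in and `adapt_tokenizer_sketch` name are illustrative, not outlines' actual implementation):

```python
from types import SimpleNamespace

SPIECE_UNDERLINE = "\u2581"  # SentencePiece's word-boundary marker, rendered as '▁'

def adapt_tokenizer_sketch(tokenizer):
    """Attach a convert_token_to_string method that maps
    SentencePiece tokens like '▁hello' to ' hello'."""
    def convert_token_to_string(token: str) -> str:
        if token.startswith(SPIECE_UNDERLINE):
            return " " + token[len(SPIECE_UNDERLINE):]
        return token

    tokenizer.convert_token_to_string = convert_token_to_string
    return tokenizer

# Stand-in for a real Hugging Face tokenizer object
dummy = SimpleNamespace(vocabulary={"▁hello": 0, "world": 1})
dummy = adapt_tokenizer_sketch(dummy)
print(dummy.convert_token_to_string("▁hello"))  # -> " hello"
```

The bug reported here is simply that this adaptation is applied to `llm.tokenizer.tokenizer`, an attribute path that exists on vLLM's engine object but not on the top-level `vllm.LLM` wrapper.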
About this issue
- State: closed
- Created 5 months ago
- Comments: 18 (10 by maintainers)
I will modify the code so RegexLogitsProcessor accepts a vllm.LLM instance as well; it is less cumbersome for users who want to use the logits processor this way.

At the end of the day we cannot correct for some of the models' quirks, like the tendency to repeat themselves, unless we strictly forbid it as @lapp0's PR allows us to do. We originally decided to let models choose the number of line breaks and whitespaces themselves, because who knows what they've seen during training? While this is OK for larger models, smaller models struggle with it, so we will add a section in the documentation to warn users and recommend setting whitespace_pattern to the empty string.

Thank you for taking the time to open an issue and following up!
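To illustrate why constraining whitespace matters, here is a hedged sketch (not outlines' actual regex builder; `json_bool_field_regex` is a hypothetical helper) of how a configurable whitespace pattern changes the set of JSON strings a regex accepts:

```python
import re

def json_bool_field_regex(key: str, whitespace_pattern: str = r"[\n\t ]*") -> str:
    """Build a toy regex for {"key": true|false}, with configurable
    whitespace between structural tokens (illustrative only)."""
    ws = whitespace_pattern
    return rf'\{{{ws}"{key}"{ws}:{ws}(true|false){ws}\}}'

loose = json_bool_field_regex("is_aviation_related")        # allows newlines, tabs, spaces
strict = json_bool_field_regex("is_aviation_related", "")   # no whitespace at all

print(bool(re.fullmatch(loose, '{ "is_aviation_related" :\n true }')))  # True
print(bool(re.fullmatch(strict, '{"is_aviation_related":true}')))       # True
print(bool(re.fullmatch(strict, '{ "is_aviation_related": true }')))    # False
```

With the loose pattern, a small model can legally emit an unbounded run of line breaks between tokens; the empty pattern removes that degree of freedom entirely.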
Glad to hear!
If there's any speed increase, it's probably because the generation is shorter. A transformers language model's generation time complexity is quadratic in the sequence length. Outlines' JSON / Regex should already outpace any model on any GPU 😃
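As a back-of-the-envelope illustration of that quadratic cost (a deliberately simplified model that counts only attention's per-token work, ignoring constant factors and KV caching details):

```python
def attention_steps(seq_len: int) -> int:
    """Total pairwise attention operations over an autoregressive
    generation: the token at position i attends to all i prior positions."""
    return sum(i for i in range(1, seq_len + 1))

short, long = attention_steps(100), attention_steps(200)
print(short, long, round(long / short, 2))  # doubling the length roughly quadruples the work
```

So shaving even a modest number of padding tokens (extra whitespace, repeated line breaks) off each generation compounds into a noticeable wall-clock saving.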
Please continue contributing issues if you run into any problems as you experiment with outlines!
@sethkimmel3 could you please try https://github.com/outlines-dev/outlines/pull/625?
pip install git+https://github.com/lapp0/outlines@allow-single-whitespace-json
then pass multiple_ws=False to JSONLogitsProcessor. Let me know if the results are better.

That seemed to do the trick. No longer need to call the adapt_tokenizer code.