outlines: Docstring of `RegexLogitsProcessor` is incorrect

Describe the issue as clearly as possible:

Related to #536, `JSONLogitsProcessor` requires the same `adapt_tokenizer` modifications found here; otherwise it throws `AttributeError: 'LLM' object has no attribute 'tokenizer'`.

Steps/code to reproduce the bug:

from vllm import LLM, SamplingParams
import vllm.model_executor.layers.sampler as sampler
from outlines.serve.vllm import JSONLogitsProcessor, _patched_apply_logits_processors
from pydantic import BaseModel

# Patch vllm's sampler so it forwards token ids to the logits processors
sampler._apply_logits_processors = _patched_apply_logits_processors

llm = LLM("mistralai/Mistral-7B-Instruct-v0.1")

class AviationPrompt(BaseModel):
    is_aviation_related: bool

# Example prompt (any list of strings; not part of the original report)
user_prompts = ["Is the following text about aviation? The 737 landed at dawn."]

logits_processor = JSONLogitsProcessor(AviationPrompt, llm)
results = llm.generate(
    user_prompts,
    sampling_params=SamplingParams(max_tokens=100, logits_processors=[logits_processor]),
)

Expected result:

No errors

Error message:

Traceback (most recent call last):
  File "/pkg/modal/_container_entrypoint.py", line 372, in handle_input_exception
    yield
  File "/pkg/modal/_container_entrypoint.py", line 530, in run_inputs
    res = imp_fun.fun(*args, **kwargs)
  File "/root/api.py", line 169, in generate
    logits_processor = JSONLogitsProcessor(AviationPrompt, self.llm)
  File "/usr/local/lib/python3.10/site-packages/outlines/serve/vllm.py", line 122, in __init__
    super().__init__(regex_string, llm)
  File "/usr/local/lib/python3.10/site-packages/outlines/serve/vllm.py", line 54, in __init__
    tokenizer = self.adapt_tokenizer(llm.tokenizer.tokenizer)
AttributeError: 'LLM' object has no attribute 'tokenizer'
Traceback (most recent call last):
  File "/pkg/modal/_container_entrypoint.py", line 372, in handle_input_exception
    yield
  File "/pkg/modal/_container_entrypoint.py", line 530, in run_inputs
    res = imp_fun.fun(*args, **kwargs)
  File "/root/api.py", line 487, in run_batch_inference
    results = list(model.generate.map(question_batches))
  File "/pkg/synchronicity/synchronizer.py", line 335, in _run_generator_sync
    raise uc_exc.exc from None
  File "<ta-Xp14glur4pj88roeBtv1We>:/root/api.py", line 169, in generate
  File "<ta-Xp14glur4pj88roeBtv1We>:/usr/local/lib/python3.10/site-packages/outlines/serve/vllm.py", line 122, in __init__
  File "<ta-Xp14glur4pj88roeBtv1We>:/usr/local/lib/python3.10/site-packages/outlines/serve/vllm.py", line 54, in __init__
AttributeError: 'LLM' object has no attribute 'tokenizer'

Outlines/Python version information:

Using outlines==0.0.27 and vllm==0.3.0.

Context for the issue:

I can’t get it to run without explicitly adding the adapt_tokenizer code.
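For reference, the workaround amounts to adapting a plain Hugging Face tokenizer to the interface Outlines' FSM machinery expects. A minimal sketch follows; the attribute names mirror what `adapt_tokenizer` in outlines 0.0.27 attaches, but treat the details as illustrative rather than the library's exact code:

```python
# Sketch of an adapt_tokenizer-style shim: attach the attributes Outlines
# reads (vocabulary, special_tokens, convert_token_to_string) onto a
# Hugging Face-style tokenizer. Attribute names are illustrative.
def adapt_tokenizer(tokenizer):
    tokenizer.vocabulary = tokenizer.get_vocab()
    tokenizer.special_tokens = set(tokenizer.all_special_tokens)

    def convert_token_to_string(token: str) -> str:
        string = tokenizer.convert_tokens_to_string([token])
        # SentencePiece-based tokenizers mark a leading space with "▁"
        if token.startswith("▁"):
            return " " + string
        return string

    tokenizer.convert_token_to_string = convert_token_to_string
    return tokenizer
```

With this shim applied, the logits processor can be constructed from the adapted tokenizer instead of reaching through the `LLM` wrapper.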

About this issue

  • State: closed
  • Created 5 months ago
  • Comments: 18 (10 by maintainers)

Most upvoted comments

I will modify the code so that RegexLogitsProcessor also accepts a vllm.LLM instance; that is less cumbersome for users who want to use the logits processor this way.
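One way to do that, sketched below with made-up helper and attribute names (the actual vllm object layout varies by version), is to probe for the wrapped engine's tokenizer before falling back to treating the argument as a tokenizer itself:

```python
# Hypothetical sketch: accept either a vllm.LLM wrapper, an engine-like
# object with a .tokenizer attribute, or a bare tokenizer. Names here are
# illustrative, not the actual outlines or vllm API.
def resolve_tokenizer(llm_or_tokenizer):
    engine = getattr(llm_or_tokenizer, "llm_engine", None)
    if engine is not None:
        # vllm.LLM keeps its tokenizer on the wrapped engine
        return engine.tokenizer
    inner = getattr(llm_or_tokenizer, "tokenizer", None)
    if inner is not None:
        # Engine-like objects expose the tokenizer directly
        return inner
    # Otherwise assume the caller already passed a tokenizer
    return llm_or_tokenizer
```

The processor's `__init__` could then call this once and adapt whatever comes back.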

At the end of the day we cannot correct for some of the models’ quirks, like a tendency to repeat themselves, unless we strictly forbid it, as @lapp0’s PR allows. We originally decided to let models choose the number of line breaks and whitespace characters themselves, because who knows what they’ve seen during training? While this is fine for larger models, smaller models struggle with it, so we will add a section to the documentation warning users and recommending that they set whitespace_pattern to the empty string.
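To see why constraining whitespace helps, here is a toy illustration (not Outlines' actual schema-to-regex compiler) of how a whitespace pattern gets spliced into the generated regex; with the pattern set to the empty string, the model simply cannot emit stray spaces or line breaks between JSON tokens:

```python
import re

# Toy stand-in for a schema-to-regex compiler: build a pattern matching the
# object {"a": <int>} with a configurable inter-token whitespace pattern.
def json_object_regex(ws: str) -> str:
    return rf'\{{{ws}"a"{ws}:{ws}\d+{ws}\}}'

lenient = json_object_regex(r"[ \n\t]*")  # default: any amount of whitespace
strict = json_object_regex("")            # whitespace_pattern = "" analogue
```

Under the lenient pattern a small model is free to pad the output with newlines and spaces; under the strict one, only the compact form is ever a legal continuation.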

Thank you for taking the time to open an issue and following up!

Glad to hear!

If there’s any speed increase, it’s probably because the generation is shorter. A transformers language model’s generation time is quadratic in the sequence length. Outlines’ JSON / regex machinery should already outpace any model on any GPU 😃
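As a back-of-the-envelope illustration of that quadratic cost: if each new token attends to every token before it, total work grows with the square of the output length, so halving the generated length cuts the attention work by roughly four:

```python
# Toy cost model: token i attends to the i tokens before it, so total
# attention work for an n-token generation is 1 + 2 + ... + n ~ n^2 / 2.
def total_attention_ops(n: int) -> int:
    return sum(range(1, n + 1))
```

This is why trimming unnecessary whitespace from constrained JSON output can shorten generations enough to be noticeable.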

Please continue contributing issues if you run into any problems as you experiment with outlines!

@sethkimmel3 could you please try https://github.com/outlines-dev/outlines/pull/625?

pip install git+https://github.com/lapp0/outlines@allow-single-whitespace-json, then pass multiple_ws=False to JSONLogitsProcessor. Let me know if the results are better.

That seemed to do the trick. No longer need to call the adapt_tokenizer code.