langroid: prompt doesn't seem to make it to the llm via an agent or task with Mixtral (MoE) local model
Hey there!
So I’m playing with the example scripts from the docs, specifically the two-agent collaboration example, and I’ve run into a problem with Mixtral-instruct-v1-based models using the Oobabooga text-generation-webui server.
The problem is that when the agents are set up, for whatever reason, the prompt doesn’t seem to make it to the LLM.
Here’s how I set up the LLM. `llm_url` is set to an http link to the `/v1` endpoint at port 5000, and `api_key` is set to “sk-111111111111111111111111111111111111111111111111”, which is how Ooba likes it. Then, per the example, I did this:
```python
import random

# import paths per langroid's module layout (may vary by version)
from langroid.language_models.openai_gpt import (
    OpenAIGPT,
    OpenAIGPTConfig,
    OpenAICompletionModel,
)
from langroid.cachedb.redis_cachedb import RedisCacheConfig

# create the (Pydantic-derived) config class: allows setting params via MYLLM_XXX env vars
MyLLMConfig = OpenAIGPTConfig.create(prefix="myllm")

# instantiate the class, with the model name and context length
my_llm_config = MyLLMConfig(
    api_base=llm_url,
    chat_context_length=2048,
    api_key=api_key,
    litellm=False,
    max_output_tokens=2048,
    min_output_tokens=64,
    chat_model="local_mixtral",
    completion_model=OpenAICompletionModel.TEXT_DA_VINCI_003,  # tried lots of settings here, including GPT3, GPT4 Turbo, etc.
    timeout=60,  # increased this as I was experiencing timeouts; dunno why, but this fixed it
    seed=random.randint(0, 9999999),  # tried 42 but wanted to see if this would change anything
    cache_config=RedisCacheConfig(fake=True),  # get rid of annoying warning
)
```
At this point, the following works fine:
```python
mdl = OpenAIGPT(my_llm_config)
response = mdl.chat("Is New York in America?", max_tokens=30)
```

RESPONSE: Yes, New York is a state in the United States of America.
Great. So let’s try it with multiple messages:
```python
messages = [
    LLMMessage(content="You are a helpful assistant", role=Role.SYSTEM),
    LLMMessage(content="Is New York in America?", role=Role.USER),
]
response = mdl.chat(messages, max_tokens=50)
```

RESPONSE: Yes, New York is a state in the United States of America.

However, setting it up with an agent, like this:

```python
agent_config = ChatAgentConfig(llm=my_llm_config, name="my-llm-agent")
agent = ChatAgent(agent_config)
response = agent.llm_response("Is New York in America?")
```
Results in a very long ramble on random topics (how to use Python, some long paragraph in French, etc.) that is completely unrelated to the prompt and appears to be what happens when no prompt makes it to the LLM. It’s processing a blank prompt, I suspect, and just spewing randomness.
Similarly, trying it with a Task:

```python
agent = ChatAgent(agent_config)
task = Task(
    agent,
    system_message="You are a helpful assistant",
    single_round=True,
)
task.run("Is New York in America?")
```
This also results in total garbage out.
Again, a non-MoE Mistral appears to work (although it didn’t quite follow the prompts very well, which is why I was hoping Mixtral would work better), but Mixtral doesn’t seem to receive the prompt through an agent. With Mixtral it’s prompt in, garbage out.
Anyone else experiencing this?
Without examining the code in too much detail, I wonder why the prompt would make it to the LLM directly but not via an agent. Does this maybe have something to do with the instruction-template setting or something?
I tried playing with various settings in the MyLLMConfig, some of which you can see above, but nothing seemed to work. Also tried changing instruction templates on Oobabooga itself, but no dice. I also tried moving the prompts from system_message to user_message, from the task to the agent… but it wouldn’t “take”.
Any thoughts? Why would using an agent “block” the prompt? 🤔
Using langroid v0.1.157 w/litellm FWIW.
Thanks - this looks like a fun and interesting project!
@fat-tire @tozimaru Thank you for all the feedback. I will take all of this into account and rationalize some of the local-model setups and write an updated doc page on that. And for task-orchestration, I am planning to write up a detailed doc that presents a mental model that people should have when designing multi-agent workflows, and what the various task settings mean – even I run into issues setting these up!
Meanwhile I will point to a couple places that may be helpful, specifically for multi-agent task workflow design:
- `test_task.py`. Note that this and any other test can be run using `pytest` with an optional `--m` arg for the local (or hosted) model. This arg globally overrides the `chat_model` setting, so you can easily run any test against models other than the default GPT4. (The `-s` shows output, and `-x` quits on the first test failure.)
- `test_lance_doc_chat_agent.py`

Just to chime in, I too would like to express how fun it feels using this project to tinker with agents. I’d also appreciate a better explanation of `done_if_response` and `done_if_no_response`. Currently my biggest challenge using the repo is my agents not knowing when a certain task is DONE when using `RecipientTool` in combination with several other tools.

Ah yes, @nilspalumbo is working on async task spawning, glad to see interest in that.
Thank you for the interest! I’m thinking of putting down a definitive “Laws of Langroid” doc, stay tuned – it will address what is a step, what is a valid response, when is a task done, what is the result of a task, when is a responder eligible to respond, etc. All of these are in the code but there’s a real need to bring it out conceptually, and also show diagrammatically how each step evolves.
Langroid doesn’t have instruction-template settings; it simply assumes the endpoint is OpenAI-compatible and that the chat formatting is handled by the endpoint.
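For illustration (a rough sketch, not langroid code; the address, key, and model name below are placeholders): “OpenAI-compatible” here just means the request is plain role/content chat messages, and the server is expected to apply the model’s instruction template itself.

```python
from openai import OpenAI  # standard openai client, used only to illustrate the request shape

# placeholder address for a local OpenAI-compatible server (e.g. text-generation-webui)
client = OpenAI(base_url="http://127.0.0.1:5000/v1", api_key="sk-1111")

# the request carries plain role/content messages; the *server* is responsible for
# converting these into the model's instruction template (Mixtral, Mistral, etc.)
resp = client.chat.completions.create(
    model="local-model",  # placeholder; many local servers ignore this field
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Is New York in America?"},
    ],
)
print(resp.choices[0].message.content)
```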
Okay, update!
As a test, I’m using this model: https://huggingface.co/TheBloke/Starling-LM-alpha-8x7B-MoE-GGUF – it’s based on Mistral’s MoE model.
Here’s my simplified LLM config, which uses the `local/#.#.#.#:5000/v2` formulation as you recommended, and is assigned this time to `chat_model` rather than `api_base`. (I previously used `api_base` instead because it successfully connected locally and didn’t try OpenAI’s servers. I didn’t realize that I could do `local/#.#.#.#...`, but it does seem that the “local/” prefix is pretty important.)
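Roughly, the simplified config amounts to something like the sketch below (the IP is a placeholder, and the field names follow the earlier snippet):

```python
# rough sketch of the simplified config (placeholder IP); the "local/..." spec
# now goes in chat_model instead of api_base
my_llm_config = MyLLMConfig(
    chat_model="local/192.168.1.10:5000/v2",
    chat_context_length=2048,
)
```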
So now the agent responds correctly: yes, New York is a state.
Unfortunately, when I try the two-agent chat (adding the numbers together), I’m getting some weird timeout issues, but it does appear to work eventually, and I see some communication between agents now. It still isn’t following the prompts perfectly, but at least it sees them! 😄
Thanks for the help!
A couple quick thoughts/suggestions:
- Maybe offer an example in the docs (and in the example code) where a non-OpenAI server is accessed by IP or a domain name instead of only “localhost”, just to demonstrate the difference between “local/” and “localhost” and to show that non-OpenAI addresses are also prepended with “local/”. That is, I think I took “local/” as synonymous with “localhost”, meaning I thought “local” told langroid the model was “on the same machine”, when really it signifies “not OpenAI’s official server” -- e.g., I presume you could do “local/ServerSomewhereFarAway.com:5000/v1” so long as it’s compatible with OpenAI’s API, right? (BTW, would this handle SSL okay?)
- To that end, maybe “local” isn’t the right word, since in the future people may connect to anywhere for LLM services. It could be a local open-source model, sure, but it could be somewhere else, right? Perhaps something like “private/” or “hosted/” or “external/” etc.? Specifically referencing `litellm` might be too narrow, since it should be compatible with lots of services, right? (Or maybe you treat OpenAI as a special case where tools/plugins or whatever it’s called become available automatically if it recognizes an OpenAI model, in which case do you need the “local/” at all? Ideally it’s totally agnostic to the LLM endpoint and works the same with providers, be they Mistral, Gemini, Cohere, etc. -- although OpenAI’s API is the primary citizen here, and so far everyone needs to conform to them.)
- It was not immediately clear in the docs that “DO-NOT-KNOW” has a special NO_ANSWER meaning/function and is treated differently from other responses until I got to the three-agent example, which comes well after the delegate concept has been introduced. I think it may be clearer to explain how a delegate agent decides when it’s done and when to move on, what a pass is, etc.
- Related: the `single_round` and `llm_delegate` configs seem to have been deprecated and replaced with `done_if_response` and `done_if_no_response` instead. I didn’t see docs on those options, and I’m not quite clear conceptually on how they are drop-in replacements, especially as they’ve gone from a `bool` to something of the form `done_if_response=[Entity.LLM]` (see the sketch after these notes for how I’m reading that). What’s `Entity.LLM`, and what does this do? And what are the various options for the config? I’m sure it was deprecated for a great reason, but could the documentation be updated to help frame the new way of thinking about agent behavior? Does `DO-NOT-KNOW` constitute a response?
- I had to add `use_functions_api=False` to the ChatAgent’s config to avoid the warning: “You have enabled `use_functions_api` but the LLM does not support it. So we will enable `use_tools` instead, so we can use Langroid’s ToolMessage mechanism.” Maybe either suggest doing this to make the warning go away, or just silently switch to `use_tools` in the case of a “local” LLM, since OpenAI’s stuff will never be available there.
- Similarly, when using a “local” model, I was getting a warning about using fakeredis until I put `cache_config=RedisCacheConfig(fake=True)` in the config. Similar to the above, could this warning include the “solution” if we’re fine with the default behavior?

Again, I just want to stress how absolutely cool and fun this project is -- I can easily see a future of pre-written agents and tasks that you can download and snap together to do all kinds of cool tasks. A modular node-based graphical system a la Blender or invokeai or comfyui to follow? heh.
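For reference, here’s a rough sketch of how I’m reading the new-style settings mentioned above -- this is my guess at the usage, not documented behavior, and it assumes `Entity` can be imported from `langroid.mytypes`:

```python
# my rough understanding (unconfirmed): a sketch of how done_if_response /
# done_if_no_response might replace the old single_round=True setting
from langroid.agent.task import Task
from langroid.mytypes import Entity  # assumed location of the Entity enum

task = Task(
    agent,
    system_message="You are a helpful assistant",
    done_if_response=[Entity.LLM],     # consider the task done once the LLM responds
    done_if_no_response=[Entity.LLM],  # ...or if the LLM produces no response
)
task.run("Is New York in America?")
```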