crewAI: Invalid Format: Missing 'Action:' after 'Thought'

My agent keeps running into this error whenever I use models locally (I tried llama2, openhermes, starling, and Mistral). The only model that didn't run into this problem was Mistral.

Very often this error is followed by another error: “Error executing tool. Missing exact 3 pipe (|) separated values. For example, coworker|task|information.”

Whenever either of these errors appeared, I wouldn't get any valid output. I was experimenting with simple examples like internet scraping with DuckDuckGo and a custom Reddit scraping tool. Also worth mentioning: I don't have these problems when I use OpenAI.
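For context, the failing setups look roughly like the sketch below. This is a minimal illustration rather than the exact code from this report; it assumes a crewAI version that accepts LangChain tools directly, plus the LangChain Ollama wrapper, and the model names and prompts are placeholders:

```python
from crewai import Agent, Task, Crew
from langchain_community.llms import Ollama
from langchain_community.tools import DuckDuckGoSearchRun

# Placeholder local model; swap in llama2, openhermes, starling, etc.
llm = Ollama(model="openhermes")

researcher = Agent(
    role="Internet Researcher",
    goal="Find and summarize recent information on a topic",
    backstory="You search the web and report what you find.",
    tools=[DuckDuckGoSearchRun()],
    llm=llm,
    verbose=True,
)

task = Task(
    description="Search the web for recent news about AI agents and summarize the top 3 results.",
    expected_output="A short bullet-point summary of 3 findings.",
    agent=researcher,
)

crew = Crew(agents=[researcher], tasks=[task])
print(crew.kickoff())
```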

About this issue

  • State: closed
  • Created 6 months ago
  • Reactions: 3
  • Comments: 24 (1 by maintainers)

Most upvoted comments

I found another cause of this bug. When the task prompt is too strong, the model rewrites some important (internal) keywords, like

  • Action:
  • Thought:
  • Action Input:

The agent will then fail to parse the text.

For example, I had a task asking the agent to use Markdown headers to denote section headings; the model transformed Action: into **Action:**

To resolve this, I add the following to all of my task prompts:

These keywords must never be translated and transformed:
- Action:
- Thought:
- Action Input:
because they are part of the thinking process instead of the output.
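One way to apply this is to append that note to every task description programmatically. A rough sketch, assuming the standard crewAI Task fields (the helper name is made up for illustration):

```python
from crewai import Task

KEYWORD_GUARD = (
    "These keywords must never be translated and transformed:\n"
    "- Action:\n"
    "- Thought:\n"
    "- Action Input:\n"
    "because they are part of the thinking process instead of the output."
)

# Hypothetical helper: every task gets the keyword guard appended to its description.
def guarded_task(description: str, expected_output: str, agent) -> Task:
    return Task(
        description=f"{description}\n\n{KEYWORD_GUARD}",
        expected_output=expected_output,
        agent=agent,
    )
```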

Hey folks, finally catching up to this!

Indeed, smaller models do struggle with certain tools, especially more complex ones. I think there is room for us to optimize the crew prompts a bit; I'll look into that. But at the end of the day, smaller models do struggle with cognition.

I'm collecting data to fine-tune these models into agentic models that are trained to behave more like agents; this should provide way more reliability even with small models.

I think a good next action here might be to mention the best models in our new docs and to run some tests on slightly changing the prompts for smaller models. I'll take a look at that; meanwhile I'm closing this one, but I'm open to re-opening it if there are requests 😃

I had success running a simple crew with one function. Benchmarks of the different models, and whether they worked with function calling, are below. Hopefully this helps someone! All testing was done using LM Studio as the API server.

Model Benchmarks
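If you want to reproduce this kind of test, LM Studio's local server speaks the OpenAI API. A minimal sketch assuming the langchain-openai ChatOpenAI wrapper and LM Studio's default port; the model name is a placeholder, since LM Studio serves whichever model is loaded:

```python
from langchain_openai import ChatOpenAI

# Point the OpenAI-compatible client at LM Studio's local server.
llm = ChatOpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio",   # the local server accepts any placeholder key
    model="local-model",   # placeholder; LM Studio uses the loaded model
)
```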

Hello, I have experienced the same issue with OpenHermes before, but since I configured the temperature to 0.1, it works great.

I was having the looping problem before as well, but with Gemini Pro at a temperature of 0.6, all issues are gone.
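For local models served through Ollama, the temperature can be set on the LLM object. A minimal sketch assuming the LangChain Ollama wrapper; the model name and value are only examples:

```python
from langchain_community.llms import Ollama

# A low temperature keeps the output closer to the expected
# Thought: / Action: / Action Input: structure.
llm = Ollama(model="openhermes", temperature=0.1)
```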

@kingychiu could you please tell us where exactly you added those instructions? In my case the model fails to follow them, so it never returns output in the format the agent tool expects.

Hi everyone, thank you all for replying and sharing your experiences. I wanted to share my observations; maybe somebody will find them helpful and save some time.

Over the last 10 days, I've experimented with 15 different models. My laptop has 16 GB of RAM, and my goal for my agents was to scrape data from a particular subreddit and turn that data into a simple, short newsletter written in layman's terms.

Of those 15 models, only 2 were able to accomplish the task: GPT-4 and Llama 2 13B (base model).

The models I played with that failed were:

  • Gemini Pro
  • Mistral 7B
  • Mistral 7B instruct
  • phi-2
  • Open Chat 3.5 7B
  • Nous Hermes 7B
  • Open Hermes 2.5 7B
  • Starling 7B
  • Llama 2 13B chat
  • Llama 2 13B text
  • Llama 2 7B
  • Llama 2 7B text
  • Llama 2 7B chat

I tried tweaking my prompts and played with the Modelfile, setting all kinds of parameters, but the only conclusion I came to is: more parameters = more reasoning.

The agents failed because they either:

  1. didn't understand that they needed to use the scraping tool and would instead use their training data to write the newsletter, OR
  2. scraped the data but, instead of writing the newsletter, started reacting to the scraped data, e.g. if the scraped data mentioned a new Python library, the agents would totally forget about the newsletter and try to write a Python script

I have one more theory, but I can't test it due to insufficient RAM on my laptop. I wonder if models with 7B parameters but a context window of 16K tokens would be able to perform the task. In other words, would a bigger context window = more reasoning?

With the @kingychiu hack, I still got "Error executing tool. Missing exact 3 pipe (|) separated values." I had to additionally add "Action Input should be formatted as coworker|task|context." to the prompt. My agent uses:

```python
allow_delegation=True,
llm=Ollama(model="codellama:34b")
```
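For context, here is a sketch of how that extra hint can be combined with the keyword guard in a delegating agent's task. It assumes the LangChain Ollama wrapper and the standard crewAI Agent/Task fields, with the role/goal/backstory text made up for illustration:

```python
from crewai import Agent, Task
from langchain_community.llms import Ollama

writer = Agent(
    role="Newsletter Writer",
    goal="Turn research from coworkers into a short newsletter",
    backstory="You coordinate with coworkers and write the final text.",
    allow_delegation=True,
    llm=Ollama(model="codellama:34b"),
)

task = Task(
    description=(
        "Write a short newsletter from the research findings.\n"
        "These keywords must never be translated and transformed: "
        "Thought:, Action:, Action Input:.\n"
        # The extra hint that resolved the pipe-separated-values error above.
        "Action Input should be formatted as coworker|task|context."
    ),
    expected_output="A short newsletter draft.",
    agent=writer,
)
```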

@kingychiu that worked. I'm running TheBloke/dolphin-2.2.1-mistral-7B-GGUF on LM Studio.

I've modified my model to handle num_ctx=16384 (running on an RTX 3090); no issues since.
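For anyone wanting to try the same context-window bump, a sketch assuming the LangChain Ollama wrapper, which exposes Ollama's num_ctx option (the model name is a placeholder):

```python
from langchain_community.llms import Ollama

# Ask the Ollama runtime for a 16K-token context window.
llm = Ollama(model="mistral", num_ctx=16384)
```

The same value can also be baked into an Ollama Modelfile with a PARAMETER num_ctx 16384 line.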

Which model are you running?

Sorry, I didn't check my email because I was on vacation.

OpenHermes.

I tried running a few 13B models, Llama 2 and Vicuna. I assumed that a bigger model = better results, but that wasn't the case. I think "losing track" is the right way to describe the issue. It looks like the local model totally forgets about all the prompts and starts looping.