MemGPT: Frequent errors with webui LLM answers when JSON decoding fails.

I found that most longer answers from the webui backend fail to JSON-decode. From the data I get to see, it looks as if the LLM returned an incomplete message. I wonder if one could make the LLM "continue" the message when decoding fails, or try to repair a JSON answer that is "just missing" the proper ending. I also wonder whether one could give the backend a grammar or a maximum-new-tokens limit, and how such parameters are set. Is that done inside webui in that case?
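The "repair a JSON answer that is just missing the proper ending" idea could be sketched like this. This is a hypothetical helper, not MemGPT's actual code: it tracks unclosed braces, brackets, and strings and appends the missing closers so a truncated reply parses.

```python
import json

def repair_truncated_json(text: str) -> str:
    """Append missing '"' / '}' / ']' so a reply cut off mid-object parses."""
    stack = []          # closers we still owe, innermost last
    in_string = False
    escape = False
    for ch in text:
        if in_string:
            if escape:
                escape = False
            elif ch == "\\":
                escape = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]" and stack:
            stack.pop()
    repaired = text
    if in_string:            # reply was cut off inside a string literal
        repaired += '"'
    repaired += "".join(reversed(stack))
    return repaired

truncated = '{"function": "send_message", "params": {"message": "Hi'
print(json.loads(repair_truncated_json(truncated)))
```

This only helps when the reply is a valid prefix of one JSON object; it cannot recover content the model never generated.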

When using `--model dolphin-2.1-mistral-7b` with the same model loaded into webui, I also get this kind of malformed JSON when trying to store something into memory:

> Enter your message: my name is Horst and I am 42 years old. save that in your memory.

Exception: Failed to decode JSON from LLM output:
{
  "function": "core_memory_append",
  "params": {
    "name": "human",
    "content": "Horst, 42 years old, from Germany."
  }
}
{
  "function": "send_message",
  "params": {
    "message": "Got it! Your age and nationality are now saved in my memory."
  }
}
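Note that the reply above is actually two well-formed JSON objects back to back, which a plain `json.loads` rejects. A sketch (again hypothetical, not MemGPT's code) that splits such a reply into separate function calls using `json.JSONDecoder.raw_decode`:

```python
import json

def parse_concatenated_json(text: str) -> list:
    """Parse a reply containing one or more JSON objects back to back."""
    decoder = json.JSONDecoder()
    text = text.strip()
    idx, objs = 0, []
    while idx < len(text):
        obj, end = decoder.raw_decode(text, idx)  # parse one object, get its end
        objs.append(obj)
        while end < len(text) and text[end].isspace():  # skip gap between objects
            end += 1
        idx = end
    return objs

reply = '{"function": "core_memory_append"} {"function": "send_message"}'
print(len(parse_concatenated_json(reply)))
```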

About this issue

  • Original URL
  • State: closed
  • Created 8 months ago
  • Reactions: 2
  • Comments: 16 (12 by maintainers)

Most upvoted comments

Another JSON decoding error from bebo on Discord: https://pastebin.com/nJeAxHvZ

To answer my own question partially:

I found where the parameters for the query are set. I edited memgpt/local_llm/webui/settings.py and added `"max_new_tokens": 1000,` on a line just before the closing `}`, and this reduced the problem with cut-off JSON replies quite a bit.
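For clarity, the kind of edit meant here looks roughly like the following. The surrounding keys and values are illustrative assumptions about what settings.py contains; only the `"max_new_tokens"` line is the actual addition being described.

```python
# memgpt/local_llm/webui/settings.py (sketch; other keys are illustrative)
SIMPLE = {
    "stopping_strings": ["\nUSER:", "\nASSISTANT:"],  # illustrative value
    "truncation_length": 4096,                        # illustrative value
    "max_new_tokens": 1000,  # the added line: give the model room to finish its JSON
}
```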

I also tried adding `"ban_eos_token": True` but removed it again, since in my experience with webui that can make the model answer with garbage.

Adding a `grammar_string` could also be pretty interesting, but I am not very sophisticated with that. I have only played with grammars a bit in Faraday, and I found it incredibly hard to make them work in any meaningful way. Maybe somebody knows more about the needed grammar and how to define it.
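As a starting point for anyone picking this up, here is a hedged sketch of what such a grammar might look like. It is written in llama.cpp-style GBNF (which is what webui's `grammar_string` parameter takes) and tries to force a single function-call object; it is untested against webui and the production rules are my own guess, not a known-working grammar.

```python
# Hypothetical GBNF grammar constraining replies to one function-call object.
FUNCTION_CALL_GRAMMAR = r'''
root   ::= "{" ws "\"function\":" ws string "," ws "\"params\":" ws object ws "}"
object ::= "{" ws ( string ws ":" ws value ( "," ws string ws ":" ws value )* )? ws "}"
value  ::= string | object
string ::= "\"" [^"]* "\""
ws     ::= [ \t\n]*
'''

# The grammar would ride along with the other generation parameters:
request_params = {
    "max_new_tokens": 1000,
    "grammar_string": FUNCTION_CALL_GRAMMAR,
}
```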

Besides possibly fixing some of the JSON parsing problems, one may also want to change some of the parameters to shape the model's answers.

P.S.: I also think the program should just tell you that it failed, rather than crash, when JSON parsing fails or no answer is generated.
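The graceful-failure behaviour suggested here could be as simple as the following hypothetical wrapper (not MemGPT's actual code): return an error message for the user instead of letting the exception kill the chat loop.

```python
import json

def safe_parse_reply(raw: str):
    """Return (parsed, error): exactly one of the two is None."""
    if not raw.strip():
        return None, "The model returned an empty reply; please try again."
    try:
        return json.loads(raw), None
    except json.JSONDecodeError as e:
        return None, f"Could not parse the model's reply as JSON ({e}); please try again."

parsed, error = safe_parse_reply('{"function": "send_message"')  # truncated reply
print(error)
```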