langchain: Issue: openai functions agent does not respect tools and arguments

Issue you’d like to raise.

When combining gpt-3.5-turbo-0613, the OpenAI functions agent, and PythonAstREPLTool, GPT-3.5 stops respecting the tool name and the `__arg1` arguments hack introduced in OpenAIFunctionsAgent.

The error log is:

Could not parse tool input: {'name': 'python', 'arguments': "len(cases_df['case_id'].unique())"} because the `arguments` is not valid JSON.

This means the model isn’t following the function spec. In my case, the model always called the tool `python` instead of `python_repl_ast`, and it passed the raw Python code as `arguments` instead of the requested JSON object with an `__arg1` key.
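To make the mismatch concrete, here is a small standalone sketch of the two payload shapes. The "expected" dict is an assumption about what the agent asks for (the registered name `python_repl_ast` and a JSON string wrapping the code under `__arg1`); the "received" dict is copied from the error log above. Only the standard library is used, and `try_parse_arguments` is a hypothetical helper.

```python
import json

# Shape the agent expects (assumed from the __arg1 convention):
# the registered tool name and a JSON-encoded object as `arguments`.
expected_call = {
    "name": "python_repl_ast",
    "arguments": json.dumps({"__arg1": "len(cases_df['case_id'].unique())"}),
}

# Shape the model actually returned, copied from the error log.
received_call = {
    "name": "python",
    "arguments": "len(cases_df['case_id'].unique())",
}


def try_parse_arguments(call: dict):
    """Return the decoded arguments, or None if they are not valid JSON."""
    try:
        return json.loads(call["arguments"])
    except json.JSONDecodeError:
        return None


print(try_parse_arguments(expected_call))  # {'__arg1': "len(cases_df['case_id'].unique())"}
print(try_parse_arguments(received_call))  # None: raw Python code is not JSON
```

Both failure modes are visible here: the name differs from the registered tool, and the arguments cannot be decoded at all.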

Suggestion:

I worked around it temporarily by:
1. extending OpenAIFunctionsAgent so that `plan` calls a patched copy of `_parse_ai_message` that tolerates the confused arguments;
2. extending PythonAstREPLTool and tweaking its name and description.

from langchain.tools.python.tool import PythonAstREPLTool


class CustomPythonAstREPLTool(PythonAstREPLTool):
    # Rename the tool to the name the model keeps using anyway.
    name = "python"
    description = (
        "A Python shell. Use this to execute python commands. "
        # Use double quotes here: single-quoted keys are not valid JSON.
        "The input must be a JSON object as follows: "
        '{"__arg1": "a valid python command."} '
        "When using this tool, sometimes output is abbreviated - "
        "make sure it does not look abbreviated before using it in your answer. "
        "Don't add comments to your python code."
    )

def _parse_ai_message(message: BaseMessage) -> Union[AgentAction, AgentFinish]:
    """Parse an AI message, tolerating malformed function-call arguments."""
    if not isinstance(message, AIMessage):
        raise TypeError(f"Expected an AI message got {type(message)}")

    function_call = message.additional_kwargs.get("function_call", {})

    if function_call:
        function_name = function_call["name"]
        try:
            _tool_input = json.loads(function_call["arguments"])
        except JSONDecodeError:
            # Fall back to the raw string instead of failing: the model often
            # sends bare Python code rather than a JSON object.
            print(
                f"Could not parse tool input: {function_call} because "
                f"the `arguments` is not valid JSON."
            )
            _tool_input = function_call["arguments"]

        # HACK HACK HACK:
        # The code that encodes tool input into OpenAI uses a special variable
        # name called `__arg1` to handle old style tools that do not expose a
        # schema and expect a single string argument as an input.
        # We unpack the argument here if it exists.
        # OpenAI does not support passing in a JSON array as an argument.
        # Only unpack real dicts: a raw code string that happens to contain
        # "__arg1" must not be indexed like a mapping.
        if isinstance(_tool_input, dict) and "__arg1" in _tool_input:
            tool_input = _tool_input["__arg1"]
        else:
            tool_input = _tool_input

        # f-string so the model's text is actually interpolated into the log.
        content_msg = f"responded: {message.content}\n" if message.content else "\n"

        return _FunctionsAgentAction(
            tool=function_name,
            tool_input=tool_input,
            log=f"\nInvoking: `{function_name}` with `{tool_input}`\n{content_msg}\n",
            message_log=[message],
        )

    return AgentFinish(return_values={"output": message.content}, log=message.content)

class CustomOpenAIFunctionsAgent(OpenAIFunctionsAgent):
    def plan(
        self,
        intermediate_steps: List[Tuple[AgentAction, str]],
        callbacks: Callbacks = None,
        **kwargs: Any,
    ) -> Union[AgentAction, AgentFinish]:
        """Given input, decide what to do.
        Args:
            intermediate_steps: Steps the LLM has taken to date, along with observations
            callbacks: Callbacks to run.
            **kwargs: User inputs.
        Returns:
            Action specifying what tool to use.
        """
        user_input = kwargs["input"]
        agent_scratchpad = _format_intermediate_steps(intermediate_steps)
        prompt = self.prompt.format_prompt(
            input=user_input, agent_scratchpad=agent_scratchpad
        )
        messages = prompt.to_messages()
        predicted_message = self.llm.predict_messages(
            messages, functions=self.functions, callbacks=callbacks
        )
        # Use the tolerant module-level parser defined above instead of
        # the library's own _parse_ai_message.
        agent_decision = _parse_ai_message(predicted_message)
        return agent_decision

I’m not sure whether this will be fixed at the API level, but it is worth looking into. Choosing better names for the placeholder argument (`__arg1`) and for the tools might help, since the failures seem related to those names.
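The same fallback logic can be exercised without LangChain or an API key. Below is a hypothetical, dependency-free sketch of the parsing step; `parse_function_call` and `TOOL_NAME_ALIASES` are illustrative names, not LangChain APIs. The alias table is an assumed alternative to renaming the tool to `python`: it keeps `python_repl_ast` registered and maps the name the model invents back to it.

```python
import json
from typing import Any, Dict, Optional, Tuple

# Hypothetical alias table: tool names the model has been seen to invent,
# mapped to the names actually registered with the agent.
TOOL_NAME_ALIASES: Dict[str, str] = {"python": "python_repl_ast"}


def parse_function_call(
    function_call: Dict[str, str],
    aliases: Optional[Dict[str, str]] = None,
) -> Tuple[str, Any]:
    """Return (tool_name, tool_input), tolerating a wrong name and non-JSON arguments."""
    aliases = TOOL_NAME_ALIASES if aliases is None else aliases
    name = function_call["name"]
    # Map a hallucinated tool name back to the registered one, if known.
    tool_name = aliases.get(name, name)

    raw_arguments = function_call.get("arguments", "")
    try:
        parsed = json.loads(raw_arguments)
    except json.JSONDecodeError:
        # Bare code instead of a JSON object: pass the string through verbatim.
        return tool_name, raw_arguments

    if not isinstance(parsed, dict):
        # Valid JSON but not an object (e.g. `42`): also treat it as code.
        return tool_name, raw_arguments
    # Unpack the single-string `__arg1` convention; otherwise keep the dict.
    return tool_name, parsed.get("__arg1", parsed)


print(parse_function_call({"name": "python", "arguments": "df.shape"}))
# ('python_repl_ast', 'df.shape')
```

Keeping this logic in a plain function makes it easy to unit-test against payloads captured from real runs before wiring it into a custom agent.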

About this issue

  • State: open
  • Created a year ago
  • Reactions: 7
  • Comments: 21 (8 by maintainers)

Most upvoted comments

Hey @li-xiaohui,

I had to initialize the agent manually. Here is how:

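from langchain.agents import AgentExecutor
from langchain.schema import SystemMessage

# `df` (a pandas DataFrame), `llm` (e.g. a ChatOpenAI model) and `prefix`
# (the system prompt text) are assumed to be defined already.
dataset = {"df": df}
tools = [CustomPythonAstREPLTool(locals=dataset)]
prompt = CustomOpenAIFunctionsAgent.create_prompt(system_message=SystemMessage(content=prefix))
agent = AgentExecutor.from_agent_and_tools(
    agent=CustomOpenAIFunctionsAgent(llm=llm, prompt=prompt, tools=tools, verbose=True),
    tools=tools,
    verbose=True,
)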

@falmanna, what are the relevant imports for CustomPythonAstREPLTool and CustomOpenAIFunctionsAgent?

from langchain.callbacks import StdOutCallbackHandler
from langchain.schema import LLMResult
from langchain.tools.python.tool import PythonAstREPLTool
from pandasql import sqldf
from langchain.agents import Tool
from langchain.agents.openai_functions_agent.base import (
    OpenAIFunctionsAgent, 
    _format_intermediate_steps, 
    _FunctionsAgentAction
)
from langchain.schema import (
    AgentAction,
    AgentFinish,
    AIMessage,
    BaseMessage,
)
from langchain.callbacks.manager import Callbacks
from typing import Any, List, Tuple, Union, Dict
import json
from json import JSONDecodeError

It’s a general performance issue with the model: it hallucinates and ignores instructions such as enums. Using natural-language or pseudocode prompts that ask for JSON gives consistently better results. The solution is not to use functions.
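A rough sketch of that alternative, assuming the model is told in plain language to answer with a JSON object and the reply is then parsed by hand. `TOOL_CALL_INSTRUCTIONS` and `extract_tool_call` are hypothetical names, and the `tool`/`tool_input` keys are an illustrative convention; only the standard `json` module is used. Whether this beats function calling is the commenter's experience, not something the sketch demonstrates.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical instruction appended to the system prompt instead of passing functions.
TOOL_CALL_INSTRUCTIONS = (
    "To use a tool, reply with a single JSON object and nothing else, e.g.\n"
    '{"tool": "python_repl_ast", "tool_input": "df.shape"}\n'
    "To give the final answer, reply with plain text."
)


def extract_tool_call(reply: str) -> Optional[Dict[str, Any]]:
    """Return the first JSON object in `reply` that has `tool` and `tool_input` keys."""
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text.
            obj, _ = decoder.raw_decode(reply, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and all(k in obj for k in ("tool", "tool_input")):
            return obj
        # Try the next opening brace, e.g. when the model adds chatter before the JSON.
        start = reply.find("{", start + 1)
    return None
```

A `None` result can then be treated as the final answer, mirroring how the functions agent returns `AgentFinish` when there is no `function_call`.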

I also ran into a similar problem with gpt-3.5-turbo, but it disappeared when I switched to GPT-4.

It also works better when you add an `args_schema` field to PythonAstREPLTool. Code:

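from typing import Dict, Optional, Type

from pydantic import BaseModel, Field
from langchain.tools.base import BaseTool


class AstArgSchema(BaseModel):
    """A schema for the ast argument."""
    query: str = Field(description="A string formatted plain python script with imports and variables to execute.")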

class PythonAstREPLTool(BaseTool):
    """A tool for running python code in a REPL."""

    name = "python_repl_ast"
    description = (
        "A Python shell. Use this to execute python commands. "
        "Input should be a valid python command. "
        "When using this tool, sometimes output is abbreviated - "
        "make sure it does not look abbreviated before using it in your answer."
    )
    globals: Optional[Dict] = Field(default_factory=dict)
    locals: Optional[Dict] = Field(default_factory=dict)
    sanitize_input: bool = True
    args_schema: Type[BaseModel] = AstArgSchema
    ...
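A sketch of why `args_schema` may help: it changes the function definition the model sees. The helper below is hypothetical and only approximates the OpenAI function format; the exact keys LangChain emits (titles, descriptions) may differ. It contrasts a tool with no schema, which gets the opaque `__arg1` placeholder, with one whose single parameter is named `query` and described.

```python
import json
from typing import Any, Dict, Optional


def build_function_spec(
    name: str,
    description: str,
    param_name: str = "__arg1",
    param_description: Optional[str] = None,
) -> Dict[str, Any]:
    """Build an OpenAI-style function definition with one required string parameter."""
    prop: Dict[str, Any] = {"title": param_name, "type": "string"}
    if param_description:
        prop["description"] = param_description
    return {
        "name": name,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": {param_name: prop},
            "required": [param_name],
        },
    }


# Without args_schema: the model only sees an opaque placeholder parameter.
legacy = build_function_spec("python_repl_ast", "A Python shell.")
# With an args_schema like AstArgSchema: a named, described parameter.
with_schema = build_function_spec(
    "python_repl_ast",
    "A Python shell.",
    param_name="query",
    param_description="A string formatted plain python script with imports and variables to execute.",
)
print(json.dumps(legacy["parameters"], indent=2))
print(json.dumps(with_schema["parameters"], indent=2))
```

The descriptive parameter name gives the model a hint that the value is code, which is consistent with the observation later in this thread that the model pays attention to function and parameter names.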

Replying to the earlier comment that the model hallucinates and ignores instructions, and that "the solution is not to use functions":

This is really interesting, because it is supposed to be the other way around. The model was fine-tuned to return the requested JSON, so its error rate should be lower than with a plain prompt. I don’t know what OpenAI does internally, but I assume the functions list is converted into a prompt in the same format used for fine-tuning, so it is really weird that we might get better results by crafting our own prompt instead of using functions.

This makes some sense, as I think the model pays attention to the function and parameter names, hence its confusion when calling the python tool. I believe improving this part of LangChain might yield better results; I haven’t had any issues since I implemented the changes above.