langchain: Issue: Token Limit Exceeded Error in SQL Database Agent

Issue you’d like to raise.

I am trying to work with Snowflake using create_sql_agent and am very often getting a token limit error.

This is my code:

from langchain.agents import create_sql_agent
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
from langchain.sql_database import SQLDatabase
from langchain.llms.openai import OpenAI
from langchain.agents import AgentExecutor

from sqlalchemy.dialects import registry
registry.load("snowflake")

account_identifier = 'xxxx'
user = 'xxxx'
password = 'xxxx'
database_name = 'xxxx'
schema_name = 'xxxx'
warehouse_name = 'xxxx'
role_name = 'xxxx'

conn_string = f"snowflake://{user}:{password}@{account_identifier}/{database_name}/{schema_name}?warehouse={warehouse_name}&role={role_name}"
db = SQLDatabase.from_uri(conn_string)
print("DB===", db)
toolkit = SQLDatabaseToolkit(llm=OpenAI(temperature=0), db=db)

agent_executor = create_sql_agent(
    llm=OpenAI(temperature=0),
    toolkit=toolkit,
    verbose=True
)

agent_executor.run("Which companies are getting the most reviews in a specific category?")

If I ask a straightforward question on a tiny table that has only 5 records, the agent runs well. If the table is slightly bigger and the question more complex, it throws InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 13719 tokens (13463 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.

Suggestion:

No response

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 17 (1 by maintainers)

Most upvoted comments

Am I missing something, or do these limits make this functionally useless for most real-world cases? Unless you’re looking at a very narrow focus for questions or have a tiny database schema, you’re going to run into this limit. I’ve just been trying to get this working with the Microsoft sample database AdventureWorks, which is not particularly big in terms of schema, certainly way smaller than any of our product DB schemas, and even that is far too large for the 4K limit on the model I’m using. Switching to a model with a 16K limit would work in this particular example, but it’s still going to be too small for any reasonably sized schema. Unless you’ve got a narrow focus for questions, you can’t limit the tables/columns, because you don’t know what’s going to be asked. And if you do know what’s going to be asked, why even bother with AI?

I’m not trying to be antagonistic with this question - I’m genuinely curious what the application of this tool is with these limitations?

I am also facing the same issue

The agent has to query the list of tables in the database, then the schema for the most relevant tables. If you have lots of relevant-sounding tables, or big schemas for some tables, you hit the prompt length limit.
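You can check how big that schema dump is before the agent ever runs. A quick sketch, reusing conn_string from the snippet above; tiktoken is only used here to count tokens:

import tiktoken
from langchain.sql_database import SQLDatabase

db = SQLDatabase.from_uri(conn_string)

# DDL plus sample rows for every usable table -- this is what gets pasted into the prompt
table_info = db.get_table_info()
print(db.get_usable_table_names())

# rough token count to compare against the 4097-token context window
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
print(f"table_info alone is ~{len(enc.encode(table_info))} tokens")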

Immediate options are:

  • Decrease the columns on your tables.
  • Try to get access to GPT-4 with an 8k context instead of 4k.
  • Create a view onto your table with just a few columns, then adapt the langchain code if necessary to make the agent consider the view rather than the table.
  • Give up using langchain and use ChatGPT as a few-shot learner: give it examples of SQL queries that do work on your tables to answer various natural language queries, and then see whether it does a better job of building new SQL (a rough sketch of this follows the list).
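A rough sketch of that last option, using the pre-1.0 openai client directly; the schema, example query, and model name are all made up for illustration:

import openai

# A couple of worked NL -> SQL examples for your own schema act as the "few shots".
few_shot_prompt = """You write Snowflake SQL for a database with a REVIEWS table
(company_id, category, rating, created_at) and a COMPANIES table (id, name).

Q: How many reviews were posted in the last month?
SQL: SELECT COUNT(*) FROM reviews WHERE created_at >= DATEADD(month, -1, CURRENT_DATE);

Q: Which companies are getting the most reviews in the 'electronics' category?
SQL:"""

response = openai.ChatCompletion.create(  # openai<1.0 style API
    model="gpt-3.5-turbo",
    temperature=0,
    messages=[{"role": "user", "content": few_shot_prompt}],
)
print(response["choices"][0]["message"]["content"])

Because only the hand-picked schema fragment and a couple of examples go into the prompt, the token count stays small no matter how many tables the database has.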

I have the same issue. I have tables that have 100+ columns and the prompts being generated are just too large.

I might be crazy, but I am thinking about putting a vector db in front of this and then having the query first pull out the columns it thinks are relevant before building up the prompt.
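A minimal sketch of that idea, assuming OpenAI embeddings and invented table descriptions (and reusing conn_string from the original snippet): embed a short description of each table, embed the question, and only hand the closest tables to the agent via include_tables.

import numpy as np
from langchain.embeddings import OpenAIEmbeddings
from langchain.sql_database import SQLDatabase

# One short, hand-written description per table (placeholder names/descriptions).
table_docs = {
    "reviews": "one row per product review: company_id, category, rating, text",
    "companies": "company master data: id, name, industry",
    "payroll": "internal HR payroll records",
}

emb = OpenAIEmbeddings()
table_vecs = {name: np.array(emb.embed_query(desc)) for name, desc in table_docs.items()}

question = "Which companies are getting the most reviews in a specific category?"
q = np.array(emb.embed_query(question))

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Keep only the two tables whose descriptions best match the question.
top_tables = sorted(table_docs, key=lambda t: cosine(q, table_vecs[t]), reverse=True)[:2]

db = SQLDatabase.from_uri(conn_string, include_tables=top_tables)

The agent then only ever sees the schemas of the shortlisted tables, which should keep the schema part of the prompt well under the context limit.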

There are a couple of parameters you can adjust to limit the table information that’s going to be included in the prompt.

db = SQLDatabase.from_uri(
    conn_string,
    sample_rows_in_table_info=1,    # default is 3
    include_tables=["my_table_1"],  # restrict the agent to just my_table_1
)

You can also use ignore_tables to exclude tables from the table info.
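For example (table names are placeholders; custom_table_info lets you replace the auto-generated schema text for wide tables with a short hand-written version):

db = SQLDatabase.from_uri(
    conn_string,
    ignore_tables=["audit_log", "staging_events"],  # drop irrelevant tables entirely
    custom_table_info={
        # abbreviated, hand-written DDL used instead of the generated one
        "reviews": "CREATE TABLE reviews (company_id INT, category TEXT, rating INT)",
    },
)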