langchain: RetrievalQAWithSourcesChain causing openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens

I want to migrate from VectorDBQAWithSourcesChain to RetrievalQAWithSourcesChain. The sample code uses the Qdrant vector store, and it works fine with VectorDBQAWithSourcesChain.

When I run the code with the RetrievalQAWithSourcesChain changes, it gives me the following error:

openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 4411 tokens (4155 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.

The following is the git diff of the code:

diff --git a/ask_question.py b/ask_question.py
index eac37ce..e76e7c5 100644
--- a/ask_question.py
+++ b/ask_question.py
@@ -2,7 +2,7 @@ import argparse
 import os
 
 from langchain import OpenAI
-from langchain.chains import VectorDBQAWithSourcesChain
+from langchain.chains import RetrievalQAWithSourcesChain
 from langchain.vectorstores import Qdrant
 from langchain.embeddings import OpenAIEmbeddings
 from qdrant_client import QdrantClient
@@ -14,8 +14,7 @@ args = parser.parse_args()
 url = os.environ.get("QDRANT_URL")
 api_key = os.environ.get("QDRANT_API_KEY")
 qdrant = Qdrant(QdrantClient(url=url, api_key=api_key), "docs_flutter_dev", embedding_function=OpenAIEmbeddings().embed_query)
-chain = VectorDBQAWithSourcesChain.from_llm(
-        llm=OpenAI(temperature=0, verbose=True), vectorstore=qdrant, verbose=True)
+chain = RetrievalQAWithSourcesChain.from_chain_type(OpenAI(temperature=0), chain_type="stuff", retriever=qdrant.as_retriever())
 result = chain({"question": args.question})
 
 print(f"Answer: {result['answer']}")

If you need the data ingestion code (which creates the embeddings), please check it out here: https://github.com/limcheekin/flutter-gpt/blob/openai-qdrant/create_embeddings.py

Any idea how to fix it?

Thank you.

Most upvoted comments

@wen020

There seems to be a max_tokens_limit parameter, but it is not reflected in the documentation.

This fixed my issue:

ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectorstore.as_retriever(search_kwargs={"k": 1}), max_tokens_limit=4097)

You can try setting reduce_k_below_max_tokens=True; it is supposed to limit the number of results returned from the store based on the token limit. But for some reason it only seems to work on chat models (GPT-3.5/GPT-4), at least from my testing.
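
For the RetrievalQAWithSourcesChain from the question, here is a minimal sketch of how those two parameters could be passed, continuing from the script in the question (this assumes the langchain version used there, where from_chain_type forwards extra keyword arguments to the chain constructor; the token limit value is only illustrative):

chain = RetrievalQAWithSourcesChain.from_chain_type(
    OpenAI(temperature=0),
    chain_type="stuff",
    retriever=qdrant.as_retriever(),
    reduce_k_below_max_tokens=True,  # drop the lowest-ranked retrieved documents that would overflow the limit
    max_tokens_limit=3375,           # token budget reserved for the stuffed documents
)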

@GeneralLHW, I think the error is saying your context is too long. The context comes from Chroma, so I think it is the documents stored in Chroma that are too long. You can set verbose=True to see the complete prompt sent to OpenAI and the response.
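
If the stored documents really are that long, one common remedy (not part of this comment, just a sketch) is to split them into smaller chunks before ingestion, for example with langchain's RecursiveCharacterTextSplitter; the chunk sizes are arbitrary and `documents` below is assumed to hold the loaded documents:

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split long documents into ~1000-character chunks so that a handful of
# retrieved chunks (plus the question and prompt template) fits in the
# model's 4097-token context window.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
texts = splitter.split_documents(documents)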

I met the same problem when querying some Chinese documents that I had put into Chroma:

embeddings = OpenAIEmbeddings(openai_api_key=os.environ['OPENAI_API_KEY'])
docsearch = Chroma.from_documents(texts, embeddings)
retriever = docsearch.as_retriever()
qa = RetrievalQA.from_llm(llm=OpenAI(), retriever=retriever)

query = "what"
qa.run(query)

You can see my prompt is very short, but it says: InvalidRequestError: This model’s maximum context length is 4097 tokens, however you requested 6990 tokens (6734 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.
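
Building on the snippet above, one way to apply the search_kwargs={"k": 1} idea from the earlier comment to this Chroma setup is to cap the number of chunks the retriever returns, so the stuffed prompt stays under the context window (a minimal sketch; k=2 is an arbitrary choice):

# Ask Chroma for fewer chunks per query so the stuffed prompt fits.
retriever = docsearch.as_retriever(search_kwargs={"k": 2})
qa = RetrievalQA.from_llm(llm=OpenAI(), retriever=retriever)
qa.run(query)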

Same error, any updates here?

The following answer comes from above:

vectorstore.as_retriever(search_kwargs={"k": 1})

You can try setting reduce_k_below_max_tokens=True; it is supposed to limit the number of results returned from the store based on the token limit. But for some reason it only seems to work on chat models (GPT-3.5/GPT-4), at least from my testing.

This is a good suggestion, but I don't know how to set reduce_k_below_max_tokens=True. Can you give me some examples?
