langchain: RetrievalQAWithSourcesChain causing openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens
I want to migrate from VectorDBQAWithSourcesChain to RetrievalQAWithSourcesChain. The sample code uses a Qdrant vector store and works fine with VectorDBQAWithSourcesChain. When I run the code with the RetrievalQAWithSourcesChain changes, it gives me the following error:
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 4411 tokens (4155 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.
The following is the git diff of the code:
diff --git a/ask_question.py b/ask_question.py
index eac37ce..e76e7c5 100644
--- a/ask_question.py
+++ b/ask_question.py
@@ -2,7 +2,7 @@ import argparse
import os
from langchain import OpenAI
-from langchain.chains import VectorDBQAWithSourcesChain
+from langchain.chains import RetrievalQAWithSourcesChain
from langchain.vectorstores import Qdrant
from langchain.embeddings import OpenAIEmbeddings
from qdrant_client import QdrantClient
@@ -14,8 +14,7 @@ args = parser.parse_args()
url = os.environ.get("QDRANT_URL")
api_key = os.environ.get("QDRANT_API_KEY")
qdrant = Qdrant(QdrantClient(url=url, api_key=api_key), "docs_flutter_dev", embedding_function=OpenAIEmbeddings().embed_query)
-chain = VectorDBQAWithSourcesChain.from_llm(
- llm=OpenAI(temperature=0, verbose=True), vectorstore=qdrant, verbose=True)
+chain = RetrievalQAWithSourcesChain.from_chain_type(OpenAI(temperature=0), chain_type="stuff", retriever=qdrant.as_retriever())
result = chain({"question": args.question})
print(f"Answer: {result['answer']}")
If you need the data ingestion code (creating the embeddings), please check it out here: https://github.com/limcheekin/flutter-gpt/blob/openai-qdrant/create_embeddings.py
Any idea how to fix it?
Thank you.
@wen020 There seems to be a max_tokens_limit parameter, but it is not reflected in the documentation. This fixed my issue:
ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectorstore.as_retriever(search_kwargs={"k": 1}), max_tokens_limit=4097)
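Applying the same idea to the chain from the question: RetrievalQAWithSourcesChain appears to accept the same reduce_k_below_max_tokens / max_tokens_limit options (they are fields inherited from the QA-with-sources base chain in the langchain versions I looked at, and extra keyword arguments to from_chain_type are forwarded to the chain's constructor). Treat the exact names, defaults, and the example question as assumptions and verify against your installed version. A minimal sketch:

import os

from langchain import OpenAI
from langchain.chains import RetrievalQAWithSourcesChain
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Qdrant
from qdrant_client import QdrantClient

# Vector store built exactly as in the question.
qdrant = Qdrant(
    QdrantClient(url=os.environ.get("QDRANT_URL"), api_key=os.environ.get("QDRANT_API_KEY")),
    "docs_flutter_dev",
    embedding_function=OpenAIEmbeddings().embed_query,
)

# Assumption: reduce_k_below_max_tokens and max_tokens_limit are accepted here
# and forwarded to the chain's constructor; check your langchain version.
chain = RetrievalQAWithSourcesChain.from_chain_type(
    OpenAI(temperature=0),
    chain_type="stuff",
    retriever=qdrant.as_retriever(search_kwargs={"k": 2}),  # retrieve fewer documents
    reduce_k_below_max_tokens=True,  # drop documents until the stuffed prompt fits
    max_tokens_limit=3375,           # leave headroom for the 256-token completion
)

result = chain({"question": "How do I add assets and images to a Flutter app?"})
print(result["answer"])

If the token-based trimming turns out not to apply to your model, lowering k on the retriever (as in the snippet above) is the blunter but more reliable fix.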
You can try setting reduce_k_below_max_tokens=True; it is supposed to limit the number of results returned from the store based on the token limit. But for some reason it seems to only work on chat models (GPT-3.5/GPT-4), at least from my testing.

@GeneralLHW, I think the error is saying your context is too long. The context comes from chromadb, so it is the documents in chromadb that are too long, I think. You can set verbose=True to see the complete question and response sent to OpenAI.
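For example, turning that on for the chain from the question's diff: verbose is a standard option on chains and on the OpenAI LLM (the original VectorDBQAWithSourcesChain version of the script already used it), and qdrant and args are assumed to be set up exactly as in that script:

# `qdrant` and `args` are the vector store and parsed CLI arguments from the question's script.
chain = RetrievalQAWithSourcesChain.from_chain_type(
    OpenAI(temperature=0, verbose=True),  # verbose LLM, mirroring the original VectorDBQAWithSourcesChain setup
    chain_type="stuff",
    retriever=qdrant.as_retriever(),
    verbose=True,                         # forwarded to the chain so intermediate steps are logged
)
result = chain({"question": args.question})
print(f"Answer: {result['answer']}")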
Same error. Any updates here?
The following answer comes from above:
This is a good job, but I don't know how to set reduce_k_below_max_tokens=True. Can you give me some examples?
I met the same problem when I wanted to query some Chinese documents that I had put into chromadb. You can see my prompt is very short, but it says: InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 6990 tokens (6734 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.
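A quick way to see where those 6734 tokens come from is to count tokens on the documents the retriever returns: the retrieved context, not the question itself, is what fills the prompt. A diagnostic sketch, assuming tiktoken is installed and the vector store is the one from the question (the query string and encoding choice are placeholders):

import tiktoken

# `qdrant` is the vector store from the question; swap in your Chroma store if that is what you use.
docs = qdrant.as_retriever().get_relevant_documents("how do I add assets to a Flutter app")

# cl100k_base is the chat-model encoding; text-davinci-003 uses p50k_base,
# but either is close enough to show which documents blow the budget.
enc = tiktoken.get_encoding("cl100k_base")
for i, doc in enumerate(docs):
    print(f"doc {i}: {len(enc.encode(doc.page_content))} tokens")

total = sum(len(enc.encode(d.page_content)) for d in docs)
print(f"total: {total} tokens, before the prompt template and the question are added")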