langchain: Issue: Abstract Class Implementation problem in Retrievers

Issue you’d like to raise.

Hi everyone,

I’ve encountered an issue while trying to instantiate the ConversationalRetrievalChain in the LangChain library. It seems to be related to the abstract class BaseRetriever and its required method _get_relevant_documents.

Here’s my base code:

import os
from langchain.vectorstores import FAISS
from langchain.chat_models import ChatOpenAI
from langchain.chains.question_answering import load_qa_chain
from langchain.embeddings.openai import OpenAIEmbeddings
from dotenv import load_dotenv
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

# Load environment variables
load_dotenv("info.env")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

# Initialize the FAISS vector database
embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)
index_filenames = ["index1", "index2"]
dbs = {f"category{i+1}": FAISS.load_local(filename, embeddings) for i, filename in enumerate(index_filenames)}

# Load chat model and question answering chain
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0, openai_api_key=OPENAI_API_KEY)
chain = load_qa_chain(llm, chain_type="map_reduce")

# Initialize conversation memory
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

# Set the role of the AI
role = "Assistant"

# Write the role of the AI
memory.save_context({"input": f"AI: As an {role}, I can help you with your questions based on the retrievers and my own data."}, {"output": ""})

# Initialize the ConversationalRetrievalChain with memory and custom retrievers
qa = ConversationalRetrievalChain.from_llm(llm, retriever=dbs, memory=memory)

# Pass in chat history using the memory
while True:
    user_input = input("User: ")

    # Add user input to the conversation history
    memory.save_context({"input": f"User: {user_input}"}, {"output": ""})

    # Check if the user wants to exit
    if user_input == "!exit":
        break

    # Run the chain on the user's query
    response = qa.run(user_input)

    # Print the response with a one-line space and role indication
    print("\n" + f"{role}:\n{response}" + "\n")

And here’s the error message I received:

pydantic.v1.error_wrappers.ValidationError: 1 validation error for ConversationalRetrievalChain
retriever
  Can't instantiate abstract class BaseRetriever with abstract method _get_relevant_documents (type=type_error)

It appears that the retriever I provided doesn’t fully implement the required methods. Could someone provide guidance on how to properly implement a retriever for ConversationalRetrievalChain and help update the code based on that?

Any help would be greatly appreciated. Thank you!

Suggestion:

No response

About this issue

  • Original URL
  • State: closed
  • Created 9 months ago
  • Reactions: 1
  • Comments: 21 (3 by maintainers)

Most upvoted comments

🤖

Hello,

The error message you’re seeing is due to the fact that the BaseRetriever class in LangChain is an abstract base class, and it requires any subclasses to implement the _get_relevant_documents method. This method is designed to retrieve documents that are relevant to a given query.

Based on the code you’ve shared, it seems you’re trying to use a dictionary of FAISS instances as the retriever for the ConversationalRetrievalChain. However, these FAISS instances are not subclasses of BaseRetriever and do not implement the _get_relevant_documents method, which is why you’re seeing this error.
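The mechanics behind this error can be reproduced with plain Python's abc module. The classes below are a simplified stand-in for illustration only, not LangChain's actual BaseRetriever:

```python
from abc import ABC, abstractmethod
from typing import List

class BaseRetrieverSketch(ABC):
    """Simplified stand-in for LangChain's BaseRetriever."""

    @abstractmethod
    def _get_relevant_documents(self, query: str) -> List[str]:
        ...

class IncompleteRetriever(BaseRetrieverSketch):
    pass  # forgot to implement _get_relevant_documents

class WorkingRetriever(BaseRetrieverSketch):
    def _get_relevant_documents(self, query: str) -> List[str]:
        return [f"doc matching {query!r}"]

try:
    IncompleteRetriever()
except TypeError as exc:
    print(exc)  # Can't instantiate abstract class IncompleteRetriever ...

print(WorkingRetriever()._get_relevant_documents("invoice"))
```

Anything passed as a retriever has to be a concrete subclass like WorkingRetriever; a plain dict of vector stores never satisfies the abstract interface, which is exactly what the Pydantic validator reports.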

To resolve this issue, you need to create a custom retriever that is a subclass of BaseRetriever and implements the _get_relevant_documents method. Here’s an example of how you can do this:

from typing import Dict, List

from langchain.callbacks.manager import CallbackManagerForRetrieverRun
from langchain.schema.retriever import BaseRetriever, Document
from langchain.vectorstores import FAISS

class CustomRetriever(BaseRetriever):
    # BaseRetriever is a Pydantic model, so declare dbs as a typed field
    # instead of assigning it inside __init__
    dbs: Dict[str, FAISS]

    class Config:
        # FAISS is not a Pydantic model, so allow it as a field type
        arbitrary_types_allowed = True

    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> List[Document]:
        # Query each FAISS index and collect the results
        documents = []
        for db in self.dbs.values():
            documents.extend(db.similarity_search(query))

        # Sort the documents by "source" metadata
        documents = sorted(documents, key=lambda doc: doc.metadata.get("source", ""))

        return documents

In this example, CustomRetriever is a subclass of BaseRetriever and it provides an implementation for the _get_relevant_documents method. This method uses the FAISS instances in the dbs dictionary to retrieve the relevant documents based on the provided query.

You can then use this custom retriever in your ConversationalRetrievalChain like this:

# Initialize the ConversationalRetrievalChain with memory and custom retrievers
qa = ConversationalRetrievalChain.from_llm(llm, retriever=CustomRetriever(dbs=dbs), memory=memory)

Please note that this is just a basic example and you may need to adjust the _get_relevant_documents method to suit your specific needs.
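To see just the merge-and-sort logic in isolation, here is a pure-Python toy. FakeDoc and the lambda "stores" are made up for illustration; no LangChain is involved:

```python
class FakeDoc:
    """Minimal stand-in for LangChain's Document."""
    def __init__(self, content: str, source: str):
        self.page_content = content
        self.metadata = {"source": source}

def merge_and_sort(dbs, query: str):
    # Collect hits from every store, then order them by "source" metadata
    documents = []
    for db in dbs.values():
        documents.extend(db(query))
    return sorted(documents, key=lambda d: d.metadata.get("source", ""))

# Each "store" is just a function returning canned results here
dbs = {
    "category1": lambda q: [FakeDoc("text A", "b.txt")],
    "category2": lambda q: [FakeDoc("text B", "a.txt")],
}

docs = merge_and_sort(dbs, "anything")
print([d.metadata["source"] for d in docs])  # ['a.txt', 'b.txt']
```

In practice you might also deduplicate results or cap the total at the best k documents before returning them.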

I hope this helps! If you have any further questions, please let me know.

Sources


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

🤖

Hello,

From the error message you provided, it seems like you’re trying to initialize the MultiRetrievalQAChain class directly with the retriever_infos parameter. However, the MultiRetrievalQAChain class doesn’t have a retriever_infos parameter in its constructor. Instead, it has a class method called from_retrievers that takes in a list of retrievers and an instance of BaseLanguageModel.

Here’s how you can use it:

from langchain.chains.router import MultiRetrievalQAChain
from langchain.llms import OpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.document_loaders import TextLoader
from langchain.vectorstores import FAISS

# Load and split documents
sou_docs = TextLoader('../../state_of_the_union.txt').load_and_split()
pg_docs = TextLoader('../../paul_graham_essay.txt').load_and_split()

# Create retrievers
sou_retriever = FAISS.from_documents(sou_docs, OpenAIEmbeddings()).as_retriever()
pg_retriever = FAISS.from_documents(pg_docs, OpenAIEmbeddings()).as_retriever()

# List of retriever information
retriever_infos = [
    {
        "name": "state of the union", 
        "description": "Good for answering questions about the 2023 State of the Union address", 
        "retriever": sou_retriever
    },
    {
        "name": "pg essay", 
        "description": "Good for answering questions about Paul Graham's essay on his career",
        "retriever": pg_retriever
    }
]

# Create MultiRetrievalQAChain
chain = MultiRetrievalQAChain.from_retrievers(OpenAI(), retriever_infos, verbose=True)

This will create an instance of MultiRetrievalQAChain that uses the provided retrievers. The LLMRouterChain will decide which retriever to use based on the input. If the router doesn’t map the input to one of the retrievers, a ConversationChain with ChatOpenAI as the language model will be used as the default chain.
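The routing idea can be pictured with a toy dispatcher in plain Python (this is an illustration of the concept, not LangChain internals): the router produces a destination name, and the chain looks up the matching destination, falling back to a default when nothing matches:

```python
def route_query(destinations: dict, default: str, chosen: str) -> str:
    """Return the destination named by the router, or the default when unknown."""
    return destinations.get(chosen, destinations[default])

destinations = {
    "state of the union": "sou_retriever",
    "pg essay": "pg_retriever",
    "DEFAULT": "default_conversation_chain",
}

print(route_query(destinations, "DEFAULT", "pg essay"))    # pg_retriever
print(route_query(destinations, "DEFAULT", "the weather")) # default_conversation_chain
```

This is why the name and description fields in retriever_infos matter: they are what the LLM-based router uses to pick a destination.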

As for the ValueError you’re getting when running the chain, it’s because the run method of the MultiRetrievalQAChain class only supports one positional argument. You’re passing in two arguments: user_input and memory. If memory is a keyword argument, you should pass it as such:

response = chain.run(user_input, memory=memory)
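The restriction the traceback points at can be sketched with a toy reconstruction of the positional-argument check (a simplification for illustration, not LangChain's actual source):

```python
class ToyChain:
    """Toy reconstruction of the one-positional-argument rule in Chain.run."""

    def run(self, *args, **kwargs):
        if len(args) != 1:
            raise ValueError("`run` supports only one positional argument.")
        return f"answered: {args[0]}"

chain = ToyChain()
print(chain.run("What is in the invoice?"))  # answered: What is in the invoice?

try:
    chain.run("query", {"chat_history": []})  # two positional args, as in the traceback
except ValueError as exc:
    print(exc)  # `run` supports only one positional argument.
```

Passing memory as a second positional argument trips exactly this check, which is why it must be supplied as a keyword argument (or, better, attached to the chain at construction time).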

I hope this helps! Let me know if you have any other questions.


🤖

Hello,

From the error message you provided, it seems like you’re trying to initialize the MultiRetrievalQAChain class directly with the retriever_infos parameter. However, the MultiRetrievalQAChain class doesn’t have a retriever_infos parameter in its constructor. Instead, it has router_chain, destination_chains, and default_chain parameters, which is why you’re getting a ValidationError from Pydantic.

To correctly initialize the MultiRetrievalQAChain class with a custom retriever, you should use the from_retrievers class method. This method requires a BaseLanguageModel instance, a list of dictionaries containing retriever information, and optionally a default retriever, default prompt, and default chain.

Each dictionary in the retriever_infos list should contain the following keys:

  • name: The name of the retriever.
  • description: A description of the retriever.
  • retriever: An instance of the retriever.
  • prompt (optional): A PromptTemplate instance for the retriever.

Here is an example of how to initialize the MultiRetrievalQAChain:

from langchain.chains.router import MultiRetrievalQAChain
from langchain.llms import OpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.document_loaders import TextLoader
from langchain.vectorstores import FAISS

sou_docs = TextLoader('../../state_of_the_union.txt').load_and_split()
sou_retriever = FAISS.from_documents(sou_docs, OpenAIEmbeddings()).as_retriever()

pg_docs = TextLoader('../../paul_graham_essay.txt').load_and_split()
pg_retriever = FAISS.from_documents(pg_docs, OpenAIEmbeddings()).as_retriever()

personal_texts = [
    "I love apple pie",
    "My favorite color is fuchsia",
    "My dream is to become a professional dancer",
    "I broke my arm when I was 12",
    "My parents are from Peru",
]
personal_retriever = FAISS.from_texts(personal_texts, OpenAIEmbeddings()).as_retriever()

retriever_infos = [
    {
        "name": "state of the union", 
        "description": "Good for answering questions about the 2023 State of the Union address", 
        "retriever": sou_retriever
    },
    {
        "name": "pg essay", 
        "description": "Good for answering questions about Paul Graham's essay on his career",
        "retriever": pg_retriever
    },
    {
        "name": "personal", 
        "description": "Good for answering questions about me", 
        "retriever": personal_retriever
    }
]

chain = MultiRetrievalQAChain.from_retrievers(OpenAI(), retriever_infos, verbose=True)

This will create a MultiRetrievalQAChain instance that uses the provided language model and retrievers. The chain will route inputs to the appropriate retriever based on the output of the language model.

I hope this helps! If you have any other questions, feel free to ask.


🤖

Hello,

The error message you’re seeing is due to the Pydantic model validation failing for the CustomRetriever class. Pydantic is a data validation library that LangChain uses to ensure that the data being passed into classes and functions matches the expected types and structures.

In your CustomRetriever class, it looks like retriever_instances was assigned without being declared as a Pydantic field. BaseRetriever is a Pydantic model, so every attribute you want to store must be declared as a typed field on the class; Pydantic then generates the __init__ for you and validates the value you pass in.

To fix this, declare retriever_instances as a typed field on your CustomRetriever class, like so:

class CustomRetriever(BaseRetriever):
    # Declared as a Pydantic field; __init__ is generated automatically
    retriever_instances: dict
This should resolve the ValidationError you’re seeing.

However, I noticed that you’re trying to use multiple FAISS retrievers with the ConversationalRetrievalChain. Based on the context provided, it seems like the ConversationalRetrievalChain class doesn’t directly support multiple retrievers. Instead, you might want to use the MultiRetrievalQAChain class, which is designed to handle multiple retrievers.

The MultiRetrievalQAChain class uses a retriever_infos argument, which is a list of dictionaries. Each dictionary should contain information about a retriever, including the retriever instance itself. Here’s an example of how you might structure this argument:

retriever_infos = [
    {"name": "retriever1", "description": "description1", "retriever": retriever1, "prompt": prompt1},
    {"name": "retriever2", "description": "description2", "retriever": retriever2, "prompt": prompt2},
    # Add more retrievers as needed
]

In this example, retriever1 and retriever2 would be your FAISS retriever instances, and prompt1 and prompt2 would be the corresponding prompts for each retriever.

I hope this helps! Let me know if you have any other questions.


🤖

Hello,

The error you’re encountering is indeed due to the fact that the BaseRetriever class in LangChain is an abstract base class, and it requires any subclasses to implement the _get_relevant_documents method.

To resolve this, you need to create a custom retriever that is a subclass of BaseRetriever and implements the _get_relevant_documents method. Here is an example of how you can create a custom retriever:

from typing import List

from langchain.callbacks.manager import CallbackManagerForRetrieverRun
from langchain.schema.retriever import BaseRetriever, Document

class CustomRetriever(BaseRetriever):
    # The wrapped retriever is declared as a Pydantic field
    retriever: BaseRetriever

    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> List[Document]:
        # Use the wrapped retriever to get the documents
        documents = self.retriever.get_relevant_documents(
            query, callbacks=run_manager.get_child()
        )

        # Sort the documents by "source" metadata
        documents = sorted(documents, key=lambda doc: doc.metadata.get("source", ""))

        return documents

In this code, the CustomRetriever class inherits from BaseRetriever and overrides the _get_relevant_documents method. This method uses your existing retriever to get the documents and then sorts them by the “source” attribute in the metadata.

You can then use this custom retriever in your RetrievalQA chain like this:

qa = RetrievalQA.from_chain_type(
    llm=chat,
    chain_type="stuff",
    retriever=CustomRetriever(retriever=docsearch.as_retriever(search_kwargs={"k": 2})),
    chain_type_kwargs={"prompt": PROMPT, "memory": memory},
)

Please note that the get_relevant_documents and aget_relevant_documents methods in the BaseRetriever class are now deprecated and the _get_relevant_documents and _aget_relevant_documents methods should be implemented instead. These methods also take an additional run_manager argument.
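The relationship between the sync and async hooks can be sketched in plain Python (a toy for illustration, not LangChain's implementation): when no native async path exists, the async variant can simply run the sync one on a worker thread:

```python
import asyncio

class ToyRetriever:
    def _get_relevant_documents(self, query: str):
        return [f"doc for {query!r}"]

    async def _aget_relevant_documents(self, query: str):
        # Fallback: run the sync hook on a worker thread
        return await asyncio.to_thread(self._get_relevant_documents, query)

r = ToyRetriever()
print(r._get_relevant_documents("tax"))                # ["doc for 'tax'"]
print(asyncio.run(r._aget_relevant_documents("tax")))  # ["doc for 'tax'"]
```

Implementing only the sync hook is usually enough to get started; the async variant matters once the chain is driven from async code.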

For more details, you can refer to these similar solved issues in the LangChain repository:

I hope this helps! If you have any other questions, feel free to ask.


Just do response = chain.run(user_input). Internally, the MultiRetrievalQAChain uses a conversation chain and maintains memory of the conversation.

On Thu, Oct 5, 2023 at 1:25 PM murasz wrote:

@dosu-bot Please check my response for errors.

(Quoting an earlier reply from sudranga:)

@murasz Sorry, I don’t understand the need for a CustomRetriever. I think your code should be:

retriever_infos = [
    {"name": "faktura", "description": "It's just an invoice in Polish", "retriever": dbs["faktura"].as_retriever()},
    {"name": "kaufland", "description": "It's about the Kaufland file", "retriever": dbs["kaufland"].as_retriever()},
    # Add more retrievers as needed
]

Note that the MultiRetrievalQAChain uses the LLM to choose between the two retrievers. If you want to retrieve the best ‘k’ documents across all vector databases, then you could compose a single index as zbusia suggests. Other options are possible.

@sudranga Thanks for your suggestion. When I try it, the script runs, but after entering the input I get the following error. @dosu-beta please check it as well. Maybe it means that it basically doesn’t allow using two FAISS index directories at the same time.

Traceback (most recent call last):
  File "c:\Users\48518\Langchain\faiss_search_load_multiple.py", line 76, in <module>
    response = chain.run(user_input, memory)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\48518\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\langchain\chains\base.py", line 486, in run
    raise ValueError("`run` supports only one positional argument.")
ValueError: `run` supports only one positional argument.