langchain: ValidationError: 2 validation errors for DocArrayDoc returned when trying to execute the RAG Search Example
System Info
Langchain Version: 0.0.335
Platform: Win11
Python Version: 3.11.5
Hi experts,
I’m trying to run the RAG Search Example from the LangChain docs: https://python.langchain.com/docs/expression_language/get_started
Here is the code:
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts import ChatPromptTemplate
from langchain.vectorstores import DocArrayInMemorySearch
from langchain.schema import StrOutputParser
from langchain.schema.runnable import RunnableParallel, RunnablePassthrough

vectorstore = DocArrayInMemorySearch.from_texts(
    ["harrison worked at kensho", "bears like to eat honey"],
    embedding=OpenAIEmbeddings(),
)
retriever = vectorstore.as_retriever()

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI()
output_parser = StrOutputParser()

setup_and_retrieval = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
)
chain = setup_and_retrieval | prompt | model | output_parser

chain.invoke("where did harrison work?")
But the example fails with ValidationError: 2 validation errors for DocArrayDoc.
Here are the error details:
C:\Project\pythonProjectAI\.venv\Lib\site-packages\pydantic\_migration.py:283: UserWarning: pydantic.error_wrappers:ValidationError has been moved to pydantic:ValidationError.
  warnings.warn(f'{import_path} has been moved to {new_location}.')
Traceback (most recent call last):
  File "C:\Project\pythonProjectAI\AI.py", line 28, in <module>
    chain.invoke("where did harrison work?")
  File "C:\Project\pythonProjectAI\.venv\Lib\site-packages\langchain\schema\runnable\base.py", line 1427, in invoke
    input = step.invoke(
            ^^^^^^^^^^^^
  File "C:\Project\pythonProjectAI\.venv\Lib\site-packages\langchain\schema\runnable\base.py", line 1938, in invoke
    output = {key: future.result() for key, future in zip(steps, futures)}
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Project\pythonProjectAI\.venv\Lib\site-packages\langchain\schema\runnable\base.py", line 1938, in <dictcomp>
    output = {key: future.result() for key, future in zip(steps, futures)}
                   ^^^^^^^^^^^^^^^
  File "C:\Program Files\Python\Lib\concurrent\futures\_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python\Lib\concurrent\futures\_base.py", line 401, in __get_result
    raise self._exception
  File "C:\Program Files\Python\Lib\concurrent\futures\thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Project\pythonProjectAI\.venv\Lib\site-packages\langchain\schema\retriever.py", line 112, in invoke
    return self.get_relevant_documents(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Project\pythonProjectAI\.venv\Lib\site-packages\langchain\schema\retriever.py", line 211, in get_relevant_documents
    raise e
  File "C:\Project\pythonProjectAI\.venv\Lib\site-packages\langchain\schema\retriever.py", line 204, in get_relevant_documents
    result = self._get_relevant_documents(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Project\pythonProjectAI\.venv\Lib\site-packages\langchain\schema\vectorstore.py", line 657, in _get_relevant_documents
    docs = self.vectorstore.similarity_search(query, **self.search_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Project\pythonProjectAI\.venv\Lib\site-packages\langchain\vectorstores\docarray\base.py", line 127, in similarity_search
    results = self.similarity_search_with_score(query, k=k, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Project\pythonProjectAI\.venv\Lib\site-packages\langchain\vectorstores\docarray\base.py", line 106, in similarity_search_with_score
    query_doc = self.doc_cls(embedding=query_embedding)  # type: ignore
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Project\pythonProjectAI\.venv\Lib\site-packages\pydantic\main.py", line 164, in __init__
    __pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__pydantic_self__)
pydantic_core._pydantic_core.ValidationError: 2 validation errors for DocArrayDoc
text
  Field required [type=missing, input_value={'embedding': [-0.0192381…, 0.010137099064823456]}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.5/v/missing
metadata
  Field required [type=missing, input_value={'embedding': [-0.0192381…, 0.010137099064823456]}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.5/v/missing
Who can help?
No response
Information
- The official example notebooks/scripts
- My own modified scripts
Related Components
- LLMs/Chat Models
- Embedding Models
- Prompts / Prompt Templates / Prompt Selectors
- Output Parsers
- Document Loaders
- Vector Stores / Retrievers
- Memory
- Agents / Agent Executors
- Tools / Toolkits
- Chains
- Callbacks/Tracing
- Async
Reproduction
Same code as in the description above.
Expected behavior
The example runs without errors.
About this issue
- State: open
- Created 7 months ago
- Reactions: 11
- Comments: 19
I’m also experiencing this issue, but I was able to find a workaround. The issue seems to be a conflict in the version of pydantic that gets installed. You want a 1.10.x version, not 2.x. See https://github.com/langchain-ai/langchain/issues/12916
In the meantime, I was able to work around this locally with:
pip install pydantic==1.10.8
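Why the 2.x line matters can be sketched with a hypothetical stand-in model. This assumes, based on the error above (only embedding is passed, and text and metadata are reported as "Field required"), that DocArrayDoc declares text and metadata as Optional without a default. Under pydantic 1.x such a field implicitly defaults to None; under pydantic 2.x, Optional only permits None as a value and the field stays required. DocLike, DocLikeWithDefaults and missing_fields are illustrative names, not LangChain or docarray APIs.

```python
from typing import List, Optional

import pydantic
from pydantic import BaseModel, ValidationError


class DocLike(BaseModel):
    """Hypothetical stand-in for DocArrayDoc (assumed): Optional fields, no defaults."""

    text: Optional[str]
    embedding: Optional[List[float]]
    metadata: Optional[dict]


class DocLikeWithDefaults(BaseModel):
    """Same fields, but each Optional field has an explicit None default."""

    text: Optional[str] = None
    embedding: Optional[List[float]] = None
    metadata: Optional[dict] = None


def missing_fields(model: type) -> List[str]:
    """Build `model` from an embedding only (like the query step in the
    traceback) and return the sorted names of the fields that failed validation."""
    try:
        model(embedding=[0.1, 0.2])
    except ValidationError as err:
        return sorted(str(e["loc"][0]) for e in err.errors())
    return []


if __name__ == "__main__":
    print("pydantic", pydantic.VERSION)
    # pydantic 2.x reports ['metadata', 'text']; pydantic 1.x reports []
    print(missing_fields(DocLike))
    # Explicit None defaults validate under both major versions
    print(missing_fields(DocLikeWithDefaults))
```

Giving the Optional fields an explicit None default, as in DocLikeWithDefaults, validates under both major versions; whether LangChain's own fix takes this form is not established in this thread.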
It might be helpful to install pydantic==1.10.13. It worked for me with the following environment:
pydantic==1.10.13
docarray==0.32.1
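Before building the vector store, a quick look at the installed pydantic version shows whether you are on the 2.x line that this thread associates with the error. A minimal sketch using only the standard library; major_version and pydantic_advice are hypothetical helper names, and the 1.10.13 pin in the message is simply the one reported in the comment above.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(ver: str) -> int:
    """Return the leading (major) component of a version string such as '2.5.2'."""
    return int(ver.split(".")[0])


def pydantic_advice() -> str:
    """Describe the installed pydantic version relative to the workaround in this thread."""
    try:
        installed = version("pydantic")
    except PackageNotFoundError:
        return "pydantic is not installed"
    if major_version(installed) >= 2:
        return (
            f"pydantic {installed} is installed; the workaround reported here "
            "is pip install pydantic==1.10.13"
        )
    return f"pydantic {installed} is installed (1.x line)"


if __name__ == "__main__":
    print(pydantic_advice())
```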
🤖
Hello @xidian237!
I’m Dosu, a friendly bot here to assist you with your issues and questions related to LangChain. While you’re waiting for a human maintainer, I’m here to help guide you through the process of bug fixing and contributing. Let’s get started!
It seems like the error you’re encountering is related to the DocArrayDoc class. The error message indicates that the text and metadata fields are missing when trying to create a DocArrayDoc instance.

In the LangChain codebase, the DocArrayDoc class is used to define the schema of the documents that are stored in the DocArray-based vector stores. It has three attributes: text, embedding, and metadata. The text and metadata fields are declared as optional, but they are treated as required when a DocArrayDoc instance is created in the similarity_search_with_score method of the DocArrayInMemorySearch class.

Here is the relevant code for the DocArrayDoc class:

And here is the part of the similarity_search_with_score method where the DocArrayDoc instance is created:

From the traceback you provided, it seems that the query_embedding does not include the text and metadata fields, which leads to the ValidationError.

To fix this issue, you could try to ensure that the query_embedding includes the text and metadata fields when calling the similarity_search_with_score method. If you’re not sure how to do this, you could check the documentation or the source code of the OpenAIEmbeddings class to see how the embeddings are created.

If this doesn’t solve your issue, please provide more details about how you’re using the OpenAIEmbeddings class and the DocArrayInMemorySearch class, so I can give you more specific advice.

I hope this helps! If you have any other questions, feel free to ask.
Best regards, Dosu
Sources
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
🤖
Based on the error message you provided, it seems that the text and metadata fields are missing when trying to create a DocArrayDoc instance. This could be due to the fact that the document you are trying to convert into a DocArrayDoc does not contain these fields.

In LangChain, the _docarray_to_langchain_doc method is responsible for converting a DocArray document to the LangChain document format. This method expects the document to contain a content_field, which is used as page_content in the LangChain document. All other fields, excluding tensors and embeddings, are added to the metadata of the LangChain document.

If the content_field is not found in the document, a ValueError is raised. This could be the source of your issue. Make sure that the documents you are trying to convert contain the content_field specified in the DocArrayRetriever instance.

Here is the relevant code:
You can find this code in the libs/langchain/langchain/retrievers/docarray.py file in the LangChain repository.
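The conversion described above can be sketched in plain Python. This is a hypothetical illustration over a plain dict, not LangChain's actual _docarray_to_langchain_doc; the function name to_page_and_metadata and the tuple return value are assumptions. It takes content_field as the page content, raises ValueError when that field is missing, and copies only scalar values (str, int, float, bool) into the metadata, which drops embeddings and other non-scalar values.

```python
from typing import Any, Dict, Tuple

# Only these scalar types are copied into metadata, per the description above.
_SCALAR_TYPES = (str, int, float, bool)


def to_page_and_metadata(
    doc: Dict[str, Any], content_field: str
) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical sketch: split a DocArray-style record into page content and metadata."""
    if content_field not in doc:
        raise ValueError(f"Document does not contain the content field '{content_field}'")
    page_content = doc[content_field]
    metadata = {
        name: value
        for name, value in doc.items()
        # Skip the content field itself and anything non-scalar (e.g. embeddings).
        if name != content_field and isinstance(value, _SCALAR_TYPES)
    }
    return page_content, metadata


if __name__ == "__main__":
    record = {"text": "harrison worked at kensho", "embedding": [0.1, 0.2], "source": "demo"}
    print(to_page_and_metadata(record, "text"))
    # ('harrison worked at kensho', {'source': 'demo'})
```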
Also, the DocArrayInMemorySearch.from_texts method in LangChain expects the input data in the following structure:
- texts: A list of strings. Each string in this list is one text.
- embedding: An instance of the Embeddings class. This is used as the embedding function.
- metadatas: An optional list of dictionaries. Each dictionary in this list holds the metadata for the corresponding text in the texts list. If no metadata is provided, it defaults to None.

Here is the relevant part of the code:
You can find this code in the libs/langchain/langchain/vectorstores/docarray/in_memory.py file in the LangChain repository.
Please ensure that the documents you are trying to convert contain the content_field specified in the DocArrayRetriever instance and that all other fields that should be included in the metadata are of the correct type (str, int, float, bool).
pydantic==1.10.8
docarray==0.40.0
It works for me when I use the above versions of pydantic and docarray.
@swordsmanluke I solved this problem by running pip install pydantic==1.10.8, thank you. I downgraded the pydantic version, and it works now.
It’s now working with pydantic 1.10.13 and docarray 0.32.1!
I had a similar issue when trying out the RAG search example.
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[2], line 2
      1 from langchain.vectorstores import DocArrayInMemorySearch
----> 2 vectorstore = DocArrayInMemorySearch.from_texts(
      3     ["harrison worked at kensho", "bears like to eat honey"],
      4     embedding=OpenAIEmbeddings(),
      5 )
      6 retriever = vectorstore.as_retriever()

File ~/mambaforge/envs/langchain/lib/python3.12/site-packages/langchain/vectorstores/docarray/in_memory.py:67, in DocArrayInMemorySearch.from_texts(cls, texts, embedding, metadatas, **kwargs)
     45 @classmethod
     46 def from_texts(
     47     cls,
   (...)
     51     **kwargs: Any,
     52 ) -> DocArrayInMemorySearch:
     53     """Create an DocArrayInMemorySearch store and insert data.
     54 
     55     Args:
   (...)
     65         DocArrayInMemorySearch Vector Store
     66     """
---> 67     store = cls.from_params(embedding, **kwargs)
     68     store.add_texts(texts=texts, metadatas=metadatas)
     69     return store

File ~/mambaforge/envs/langchain/lib/python3.12/site-packages/langchain/vectorstores/docarray/in_memory.py:38, in DocArrayInMemorySearch.from_params(cls, embedding, metric, **kwargs)
     20 @classmethod
     21 def from_params(
     22     cls,
   (...)
     27     **kwargs: Any,
     28 ) -> DocArrayInMemorySearch:
     29     """Initialize DocArrayInMemorySearch store.
     30 
     31     Args:
   (...)
     36         **kwargs: Other keyword arguments to be passed to the get_doc_cls method.
     37     """
---> 38     _check_docarray_import()
     39     from docarray.index import InMemoryExactNNIndex
     41     doc_cls = cls._get_doc_cls(space=metric, **kwargs)

File ~/mambaforge/envs/langchain/lib/python3.12/site-packages/langchain/vectorstores/docarray/base.py:19, in _check_docarray_import()
     17 def _check_docarray_import() -> None:
     18     try:
---> 19         import docarray
     21         da_version = docarray.__version__.split(".")
     22         if int(da_version[0]) == 0 and int(da_version[1]) <= 31:

File ~/mambaforge/envs/langchain/lib/python3.12/site-packages/docarray/__init__.py:5
      1 __version__ = '0.39.1'
      3 import logging
----> 5 from docarray.array import DocList, DocVec
      6 from docarray.base_doc.doc import BaseDoc
      7 from docarray.utils._internal.misc import _get_path_from_docarray_root_level

File ~/mambaforge/envs/langchain/lib/python3.12/site-packages/docarray/array/__init__.py:2
      1 from docarray.array.any_array import AnyDocArray
----> 2 from docarray.array.doc_list.doc_list import DocList
      3 from docarray.array.doc_vec.doc_vec import DocVec
      5 __all__ = ['DocList', 'DocVec', 'AnyDocArray']

File ~/mambaforge/envs/langchain/lib/python3.12/site-packages/docarray/array/doc_list/doc_list.py:51
     43 T = TypeVar('T', bound='DocList')
     44 T_doc = TypeVar('T_doc', bound=BaseDocWithoutId)
     47 class DocList(
     48     ListAdvancedIndexing[T_doc],
     49     PushPullMixin,
     50     IOMixinDocList,
---> 51     AnyDocArray[T_doc],
     52 ):
     53     """
     54     DocList is a container of Documents.
     55 
   (...)
    121 
    122     """
    124     doc_type: Type[BaseDocWithoutId] = AnyDoc

File ~/mambaforge/envs/langchain/lib/python3.12/site-packages/docarray/array/any_array.py:54, in AnyDocArray.__class_getitem__(cls, item)
     51 @classmethod
     52 def __class_getitem__(cls, item: Union[Type[BaseDocWithoutId], TypeVar, str]):
     53     if not isinstance(item, type):
---> 54         return Generic.__class_getitem__.__func__(cls, item)  # type: ignore
     55     # this do nothing that checking that item is valid type var or str
     56     if not safe_issubclass(item, BaseDocWithoutId):

AttributeError: 'builtin_function_or_method' object has no attribute '__func__'
Here are my package installations:
langchain 0.0.350 pypi_0 pypi
python 3.12.0 h47c9636_0_cpython conda-forge
pydantic 2.5.2 pyhd8ed1ab_0 conda-forge
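This AttributeError looks different from the pydantic ValidationError: the traceback ends in docarray calling Generic.__class_getitem__.__func__, and the environment is Python 3.12. A small stdlib-only check illustrates the interpreter difference. Before 3.12, Generic.__class_getitem__ is a bound Python method that has a __func__ attribute. On 3.12 it is a builtin method without one, which matches the error message. The helper name generic_getitem_has_func is hypothetical, and this thread does not establish which docarray release, if any, handles this.

```python
import sys
from typing import Generic


def generic_getitem_has_func() -> bool:
    """Return True if Generic.__class_getitem__ exposes __func__,
    the attribute the docarray code in the traceback relies on."""
    return hasattr(Generic.__class_getitem__, "__func__")


if __name__ == "__main__":
    # True on Python 3.11 and earlier, False on Python 3.12
    print(sys.version.split()[0], generic_getitem_has_func())
```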