langchain: ValueError: Expected metadata value to be a str, int, or float, got [{'text': 'Git', 'url': '#git'}] which is a <class 'list'> when storing into Chroma vector stores using element mode of UnstructuredMarkdownLoader
System Info
LangChain: 0.0.248
Python: 3.9.17
OS version: Linux 6.1.27-43.48.amzn2023.x86_64
Who can help?
I will submit a PR for a solution to this problem
Information
- The official example notebooks/scripts
- My own modified scripts
Related Components
- LLMs/Chat Models
- Embedding Models
- Prompts / Prompt Templates / Prompt Selectors
- Output Parsers
- Document Loaders
- Vector Stores / Retrievers
- Memory
- Agents / Agent Executors
- Tools / Toolkits
- Chains
- Callbacks/Tracing
- Async
Reproduction
Code:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.document_loaders import UnstructuredMarkdownLoader
from langchain.text_splitter import CharacterTextSplitter


def testElement():
    # "filepath" is a placeholder for a Markdown file that contains at least one link.
    # mode="elements" keeps per-element metadata from unstructured, which can include a list of links.
    loader = UnstructuredMarkdownLoader("filepath", mode="elements")
    documents = loader.load()
    text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=0)
    split_docs = text_splitter.split_documents(documents)
    embeddings = OpenAIEmbeddings()
    # Raises ValueError here: Chroma rejects the list-valued link metadata.
    docsearch = Chroma.from_documents(split_docs, embeddings)


testElement()
The Markdown file being loaded must also contain a link, for example:
- [Google Developer Documentation Style Guide](https://developers.google.com/style)
Error Message:
    138 # isinstance(True, int) evaluates to True, so we need to check for bools separately
    139 if not isinstance(value, (str, int, float)) or isinstance(value, bool):
--> 140     raise ValueError(
    141         f"Expected metadata value to be a str, int, or float, got {value} which is a {type(value)}"
    142     )
ValueError: Expected metadata value to be a str, int, or float, got [{'text': 'Git', 'url': '#git'}] which is a <class 'list'>
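To see the failure without OpenAI or Chroma installed, here is a minimal standalone sketch that mirrors the per-value type check shown in the traceback. The link value is copied from the error message; the key name `links` and the other metadata keys are assumptions about what element mode produces, and `check_chroma_metadata` is a hypothetical name, not a Chroma or LangChain function.

```python
from typing import Any, Dict


def check_chroma_metadata(metadata: Dict[str, Any]) -> None:
    """Mimic the type check from the traceback (hypothetical helper).

    Raises ValueError for any value that is not a str, int, or float.
    bool is rejected explicitly because isinstance(True, int) is True.
    """
    for key, value in metadata.items():
        if not isinstance(value, (str, int, float)) or isinstance(value, bool):
            raise ValueError(
                f"Expected metadata value to be a str, int, or float, "
                f"got {value} which is a {type(value)}"
            )


# Assumed shape of one element's metadata; only the link list comes from the error above.
element_metadata = {
    "source": "filepath",
    "category": "ListItem",
    "links": [{"text": "Git", "url": "#git"}],
}

try:
    check_chroma_metadata(element_metadata)
except ValueError as err:
    # Prints the same message as the traceback, ending in <class 'list'>.
    print(err)
```

The scalar values pass; only the list of link dicts trips the check, which is why the error appears only when the Markdown file contains links.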
Expected behavior
I expect the split documents to be loaded into Chroma; instead, a ValueError is raised because the metadata fails Chroma's type check.
About this issue
- State: open
- Created a year ago
- Reactions: 1
- Comments: 17
Commits related to this issue
- fix issue #8556 by restructure the metadata links — committed to cheeseQI/langchain by cheeseQI a year ago
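The diff of that commit is not reproduced here, so the following is only a hypothetical sketch of what "restructuring the metadata links" could look like: it replaces the list of link dicts with two comma-separated string fields that pass Chroma's check. The helper name `restructure_links` and the keys `links`, `link_texts`, and `link_urls` are assumptions made for illustration.

```python
from typing import Any, Dict, List


def restructure_links(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of metadata with the 'links' list flattened into str fields.

    Hypothetical helper: the key names are illustrative, not part of
    LangChain or unstructured.
    """
    result = dict(metadata)  # shallow copy so the caller's dict is untouched
    links: List[Dict[str, str]] = result.pop("links", None) or []
    if links:
        result["link_texts"] = ", ".join(link.get("text", "") for link in links)
        result["link_urls"] = ", ".join(link.get("url", "") for link in links)
    return result


meta = {"source": "filepath", "links": [{"text": "Git", "url": "#git"}]}
print(restructure_links(meta))
# {'source': 'filepath', 'link_texts': 'Git', 'link_urls': '#git'}
```

Joining with commas is lossy if a link text itself contains a comma; storing `json.dumps(links)` in a single string field would preserve the structure, at the cost of being harder to filter on.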
Something like this should do it:
Seems logical, thank you for this. I actually just switched from Chroma to FAISS, which supports lists of strings; much simpler 😄
Thanks for sharing that! It seems that unsupported metadata value types are a common problem when storing documents in Chroma. Besides my initial solution, I'm considering whether a filter might be a better and more general way to solve this issue, similar to PR #9015.
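A minimal sketch of the filter idea, assuming the filter simply drops any metadata value that Chroma's check would reject rather than converting it. `Doc` is a stand-in for LangChain's `Document` (same `page_content` and `metadata` attributes) so the example runs without LangChain installed, and `filter_unsupported_metadata` is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple, Type

# Types accepted by the Chroma check in the traceback (bool is rejected separately there).
ALLOWED_TYPES: Tuple[Type, ...] = (str, int, float)


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def filter_unsupported_metadata(docs: List[Doc]) -> List[Doc]:
    """Return new Docs whose metadata keeps only str/int/float (non-bool) values."""
    filtered = []
    for doc in docs:
        kept = {
            key: value
            for key, value in doc.metadata.items()
            if isinstance(value, ALLOWED_TYPES) and not isinstance(value, bool)
        }
        filtered.append(Doc(page_content=doc.page_content, metadata=kept))
    return filtered


docs = [Doc("Git", {"source": "filepath", "links": [{"text": "Git", "url": "#git"}]})]
print(filter_unsupported_metadata(docs)[0].metadata)
# {'source': 'filepath'}
```

In the reproduction, such a filter would be applied to `split_docs` before calling `Chroma.from_documents`. Dropping values is more general than restructuring one specific field, but the link information is lost. Later LangChain releases also appear to ship a `filter_complex_metadata` helper in `langchain.vectorstores.utils` for this purpose; check whether your installed version includes it.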