langchain: Bug in Embeddings after the openai update

System Info

AWS Sagemaker DataScience3.0 Image.

Who can help?

No response

Information

  • The official example notebooks/scripts
  • My own modified scripts

Related Components

  • LLMs/Chat Models
  • Embedding Models
  • Prompts / Prompt Templates / Prompt Selectors
  • Output Parsers
  • Document Loaders
  • Vector Stores / Retrievers
  • Memory
  • Agents / Agent Executors
  • Tools / Toolkits
  • Chains
  • Callbacks/Tracing
  • Async

Reproduction

Here is my code; it worked before Nov 7th.

Chroma.from_documents(documents=document, embedding=embeddings,)

Then I get this error: ValueError: Expected EmbeddingFunction.__call__ to have the following signature: odict_keys(['self', 'input']), got odict_keys(['self', 'args', 'kwargs']) Please see https://docs.trychroma.com/embeddings for details of the EmbeddingFunction interface. Please note the recent change to the EmbeddingFunction interface: https://docs.trychroma.com/migration#migration-to-0416---november-7-2023
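A quick way to see why this error fires is to inspect the `__call__` signature the way chromadb >= 0.4.16 roughly does. This is a sketch, not Chroma's actual validation code; `OldStyleEF` and `NewStyleEF` are hypothetical stand-ins:

```python
import inspect

class OldStyleEF:
    """Old-style wrapper: accepts *args/**kwargs, which Chroma now rejects."""
    def __call__(self, *args, **kwargs):
        return [[0.0]]

class NewStyleEF:
    """New-style embedding function: a single `input` argument."""
    def __call__(self, input):
        return [[0.0] for _ in input]

def signature_ok(cls) -> bool:
    # Chroma checks the parameter names on the class's __call__
    params = list(inspect.signature(cls.__call__).parameters)
    return params == ["self", "input"]

print(signature_ok(OldStyleEF))  # False -> triggers the ValueError above
print(signature_ok(NewStyleEF))  # True
```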

Expected behavior

Does anyone know how to fix this?

About this issue

  • Original URL
  • State: closed
  • Created 8 months ago
  • Reactions: 5
  • Comments: 22 (3 by maintainers)


Most upvoted comments

šŸ¤–

The error you’re encountering is due to a recent change in the EmbeddingFunction interface. The EmbeddingFunction.__call__ method now expects exactly two arguments: self and input. However, it seems that in your code the EmbeddingFunction.__call__ method is defined with self, *args, and **kwargs.

To resolve this issue, you need to update the way you’re calling the EmbeddingFunction.__call__ method. Instead of passing args and kwargs, you should pass a single input argument.

Here’s how you can do it:

from typing import List

from langchain.embeddings.base import Embeddings


class SagemakerEndpointEmbeddings(Embeddings):
    # ...

    def embed_query(self, query: str) -> List[float]:
        input = self.prepare_input(query)
        return self.embedding_function(input)

    def embed_documents(self, documents: List[str]) -> List[List[float]]:
        input = self.prepare_input(documents)
        return self.embedding_function(input)

In the prepare_input method, you should prepare the input argument in a way that is compatible with the new EmbeddingFunction.__call__ interface.

Also, you might need to adjust the predict_fn() function within the custom inference.py script to handle batched requests. Change the return line from return {"vectors": sentence_embeddings[0].tolist()} to return {"vectors": sentence_embeddings.tolist()}.
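The batched-return change can be sketched as below. `DummyModel` is a hypothetical stand-in for a sentence-transformers style encoder so the snippet is self-contained; in a real SageMaker `inference.py`, `model` would be the loaded model:

```python
import numpy as np

class DummyModel:
    """Stand-in for a sentence-transformers model (assumption for this sketch)."""
    def encode(self, sentences):
        # Real models return one embedding row per input sentence
        return np.zeros((len(sentences), 3))

def predict_fn(input_data, model):
    sentence_embeddings = model.encode(input_data["inputs"])  # shape: (batch, dim)
    # Old: return {"vectors": sentence_embeddings[0].tolist()}  -> only the first vector
    # New: return every vector so batched requests work
    return {"vectors": sentence_embeddings.tolist()}

out = predict_fn({"inputs": ["a", "b"]}, DummyModel())
print(len(out["vectors"]))  # 2 -> one vector per input sentence
```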

I hope this helps! If you have any further questions or if something is unclear, please let me know.


The problem seems to appear with the chroma update to 0.4.16.

Pinning to the previous version seems to help:

chromadb==0.4.15

Langchain still implements the old interface. See: https://docs.trychroma.com/migration#migration-to-0416---november-7-2023
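Until a fix lands on the LangChain side, another workaround is a thin adapter that exposes the new `(self, input)` signature over an old-style embeddings object. A minimal sketch, assuming the wrapped object provides `embed_documents(List[str]) -> List[List[float]]` as LangChain's Embeddings classes do (the adapter class name here is hypothetical):

```python
import inspect
from typing import List

class LangchainEmbeddingAdapter:
    """Wraps a LangChain-style embeddings object behind the
    chromadb >= 0.4.16 EmbeddingFunction call signature."""

    def __init__(self, embeddings):
        self._embeddings = embeddings

    def __call__(self, input: List[str]) -> List[List[float]]:
        # New interface: one `input` argument holding the batch of texts
        return self._embeddings.embed_documents(list(input))

# Fake embeddings object for demonstration only
class FakeEmbeddings:
    def embed_documents(self, docs):
        return [[float(len(d))] for d in docs]

adapter = LangchainEmbeddingAdapter(FakeEmbeddings())
print(adapter(["ab", "c"]))  # [[2.0], [1.0]]
# The signature now matches what Chroma's check expects:
print(list(inspect.signature(LangchainEmbeddingAdapter.__call__).parameters))  # ['self', 'input']
```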

Langchain 0.0.332 was released with the chroma team’s fix for compatibility with chromadb>=0.4.16.

You can now use the latest of both: pip install -U langchain chromadb
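To confirm which versions are actually installed in the active environment, a small helper sketch (not part of either library) using the standard library:

```python
from importlib.metadata import version, PackageNotFoundError

def installed_versions(packages):
    """Report each package's installed version, or 'not installed' if absent."""
    report = {}
    for pkg in packages:
        try:
            report[pkg] = version(pkg)
        except PackageNotFoundError:
            report[pkg] = "not installed"
    return report

print(installed_versions(["langchain", "chromadb"]))
```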

I still hit the problem even though all the libraries are at the latest version. Here is my code:

from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

loader = TextLoader('/tanglg/lx/code/Qwen/forest_file.txt')
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings(
    openai_api_key="EMPTY",
    openai_api_base="http://localhost:8000/v1",
)

forest_file_store = Chroma.from_documents(documents=texts, embedding=embeddings, collection_name="forest_file")

and here is the error info:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tanglg/lx/code/Qwen/xly_vectordb_test.ipynb cell 3, line 12
      5 texts = text_splitter.split_documents(documents)
      7 embeddings = OpenAIEmbeddings(
      8     openai_api_key="EMPTY",
      9     openai_api_base="http://localhost:8000/v1", 
     10 )
---> 12 forest_file_store = Chroma.from_documents(documents=texts, embedding=embeddings, collection_name="forest_file")

File /tanglg/commonsoft/miniconda3/envs/qwen/lib/python3.10/site-packages/langchain/vectorstores/chroma.py:684, in Chroma.from_documents(cls, documents, embedding, ids, collection_name, persist_directory, client_settings, client, collection_metadata, **kwargs)
    682 texts = [doc.page_content for doc in documents]
    683 metadatas = [doc.metadata for doc in documents]
--> 684 return cls.from_texts(
    685     texts=texts,
    686     embedding=embedding,
    687     metadatas=metadatas,
    688     ids=ids,
    689     collection_name=collection_name,
    690     persist_directory=persist_directory,
    691     client_settings=client_settings,
    692     client=client,
    693     collection_metadata=collection_metadata,
    694     **kwargs,
    695 )
...

ValueError: Expected EmbeddingFunction.__call__ to have the following signature: odict_keys(['self', 'input']), got odict_keys(['self', 'args', 'kwargs'])
Please see https://docs.trychroma.com/embeddings for details of the EmbeddingFunction interface.
Please note the recent change to the EmbeddingFunction interface: https://docs.trychroma.com/migration#migration-to-0416---november-7-2023


Thank you šŸ˜‚


My hero!