langchain: Bug in Embeddings after the openai update

System Info

AWS Sagemaker DataScience3.0 Image.

Who can help?

No response

Information

  • The official example notebooks/scripts
  • My own modified scripts

Related Components

  • LLMs/Chat Models
  • Embedding Models
  • Prompts / Prompt Templates / Prompt Selectors
  • Output Parsers
  • Document Loaders
  • Vector Stores / Retrievers
  • Memory
  • Agents / Agent Executors
  • Tools / Toolkits
  • Chains
  • Callbacks/Tracing
  • Async

Reproduction

Here is my code; it worked before Nov 7th.

Chroma.from_documents(documents=document, embedding=embeddings,)

Then I get this error: ValueError: Expected EmbeddingFunction.__call__ to have the following signature: odict_keys(['self', 'input']), got odict_keys(['self', 'args', 'kwargs']) Please see https://docs.trychroma.com/embeddings for details of the EmbeddingFunction interface. Please note the recent change to the EmbeddingFunction interface: https://docs.trychroma.com/migration#migration-to-0416---november-7-2023
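A quick way to see why this error fires is to inspect the `__call__` signature the way chromadb >= 0.4.16 roughly does. This is a sketch, not Chroma's actual validation code; `OldStyleEF` and `NewStyleEF` are hypothetical stand-ins:

```python
import inspect

class OldStyleEF:
    """Old-style wrapper: accepts *args/**kwargs, which Chroma now rejects."""
    def __call__(self, *args, **kwargs):
        return [[0.0]]

class NewStyleEF:
    """New-style embedding function: a single `input` argument."""
    def __call__(self, input):
        return [[0.0] for _ in input]

def signature_ok(cls) -> bool:
    # Chroma checks the parameter names on the class's __call__
    params = list(inspect.signature(cls.__call__).parameters)
    return params == ["self", "input"]

print(signature_ok(OldStyleEF))  # False -> triggers the ValueError above
print(signature_ok(NewStyleEF))  # True
```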

Expected behavior

Does anyone know how to fix this?

About this issue

  • Original URL
  • State: closed
  • Created 8 months ago
  • Reactions: 5
  • Comments: 22 (3 by maintainers)


Most upvoted comments

šŸ¤–

The error you’re encountering is due to a recent change in the EmbeddingFunction interface. The EmbeddingFunction.__call__ method now expects exactly two arguments: self and input. However, it seems that in your code the EmbeddingFunction.__call__ method is defined with self, *args, and **kwargs.

To resolve this issue, you need to update the way you’re calling the EmbeddingFunction.__call__ method. Instead of passing args and kwargs, you should pass a single input argument.

Here’s how you can do it:

from typing import List

from langchain.embeddings.base import Embeddings


class SagemakerEndpointEmbeddings(Embeddings):
    # ...

    def embed_query(self, query: str) -> List[float]:
        input = self.prepare_input(query)
        return self.embedding_function(input)

    def embed_documents(self, documents: List[str]) -> List[List[float]]:
        input = self.prepare_input(documents)
        return self.embedding_function(input)

In the prepare_input method, you should prepare the input argument in a way that is compatible with the new EmbeddingFunction.__call__ interface.

Also, you might need to adjust the predict_fn() function within the custom inference.py script to handle batched requests. Change the return line from return {"vectors": sentence_embeddings[0].tolist()} to return {"vectors": sentence_embeddings.tolist()}.
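The batched-return change can be sketched as below. `DummyModel` is a hypothetical stand-in for a sentence-transformers style encoder so the snippet is self-contained; in a real SageMaker `inference.py`, `model` would be the loaded model:

```python
import numpy as np

class DummyModel:
    """Stand-in for a sentence-transformers model (assumption for this sketch)."""
    def encode(self, sentences):
        # Real models return one embedding row per input sentence
        return np.zeros((len(sentences), 3))

def predict_fn(input_data, model):
    sentence_embeddings = model.encode(input_data["inputs"])  # shape: (batch, dim)
    # Old: return {"vectors": sentence_embeddings[0].tolist()}  -> only the first vector
    # New: return every vector so batched requests work
    return {"vectors": sentence_embeddings.tolist()}

out = predict_fn({"inputs": ["a", "b"]}, DummyModel())
print(len(out["vectors"]))  # 2 -> one vector per input sentence
```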

I hope this helps! If you have any further questions or if something is unclear, please let me know.


The problem seems to appear with the chroma update to 0.4.16.

Pinning to the previous version seems to help:

chromadb==0.4.15

Langchain still implements the old interface. See: https://docs.trychroma.com/migration#migration-to-0416---november-7-2023
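Until a fix lands on the LangChain side, another workaround is a thin adapter that exposes the new `(self, input)` signature over an old-style embeddings object. A minimal sketch, assuming the wrapped object provides `embed_documents(List[str]) -> List[List[float]]` as LangChain's Embeddings classes do (the adapter class name here is hypothetical):

```python
import inspect
from typing import List

class LangchainEmbeddingAdapter:
    """Wraps a LangChain-style embeddings object behind the
    chromadb >= 0.4.16 EmbeddingFunction call signature."""

    def __init__(self, embeddings):
        self._embeddings = embeddings

    def __call__(self, input: List[str]) -> List[List[float]]:
        # New interface: one `input` argument holding the batch of texts
        return self._embeddings.embed_documents(list(input))

# Fake embeddings object for demonstration only
class FakeEmbeddings:
    def embed_documents(self, docs):
        return [[float(len(d))] for d in docs]

adapter = LangchainEmbeddingAdapter(FakeEmbeddings())
print(adapter(["ab", "c"]))  # [[2.0], [1.0]]
# The signature now matches what Chroma's check expects:
print(list(inspect.signature(LangchainEmbeddingAdapter.__call__).parameters))  # ['self', 'input']
```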

Langchain 0.0.332 was released with the chroma team’s fix for compatibility with chromadb>=0.4.16.

You can now use the latest of both: pip install -U langchain chromadb
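To confirm which versions are actually installed in the active environment, a small helper sketch (not part of either library) using the standard library:

```python
from importlib.metadata import version, PackageNotFoundError

def installed_versions(packages):
    """Report each package's installed version, or 'not installed' if absent."""
    report = {}
    for pkg in packages:
        try:
            report[pkg] = version(pkg)
        except PackageNotFoundError:
            report[pkg] = "not installed"
    return report

print(installed_versions(["langchain", "chromadb"]))
```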

I still hit the problem even though all the libraries are at the latest version. Here is my code:

from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

loader = TextLoader('/tanglg/lx/code/Qwen/forest_file.txt')
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings(
    openai_api_key="EMPTY",
    openai_api_base="http://localhost:8000/v1",
)

forest_file_store = Chroma.from_documents(documents=texts, embedding=embeddings, collection_name="forest_file")

and here is the error info:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tanglg/lx/code/Qwen/xly_vectordb_test.ipynb cell 3, line 12
      5 texts = text_splitter.split_documents(documents)
      7 embeddings = OpenAIEmbeddings(
      8     openai_api_key="EMPTY",
      9     openai_api_base="http://localhost:8000/v1", 
     10 )
---> 12 forest_file_store = Chroma.from_documents(documents=texts, embedding=embeddings, collection_name="forest_file")

File /tanglg/commonsoft/miniconda3/envs/qwen/lib/python3.10/site-packages/langchain/vectorstores/chroma.py:684, in Chroma.from_documents(cls, documents, embedding, ids, collection_name, persist_directory, client_settings, client, collection_metadata, **kwargs)
    682 texts = [doc.page_content for doc in documents]
    683 metadatas = [doc.metadata for doc in documents]
--> 684 return cls.from_texts(
    685     texts=texts,
    686     embedding=embedding,
    687     metadatas=metadatas,
    688     ids=ids,
    689     collection_name=collection_name,
    690     persist_directory=persist_directory,
    691     client_settings=client_settings,
    692     client=client,
    693     collection_metadata=collection_metadata,
    694     **kwargs,
    695 )
...

ValueError: Expected EmbeddingFunction.__call__ to have the following signature: odict_keys(['self', 'input']), got odict_keys(['self', 'args', 'kwargs'])
Please see https://docs.trychroma.com/embeddings for details of the EmbeddingFunction interface.
Please note the recent change to the EmbeddingFunction interface: https://docs.trychroma.com/migration#migration-to-0416---november-7-2023


Thank you šŸ˜‚


My hero!