langchain: ChromaDB 0.4+ is no longer compatible with client config

System Info

  • Python 3.9.13

  • langchain-0.0.235-py3-none-any.whl

  • chromadb-0.4.0-py3-none-any.whl

Who can help?

No response

Reproduction

Steps to reproduce:

  1. Create a Chroma store which is locally persisted:
store = Chroma.from_texts(
    texts=docs, embedding=embeddings, metadatas=metadatas, persist_directory=environ["DB_DIR"]
)
  2. Get the error: "You are using a deprecated configuration of Chroma. Please pip install chroma-migrate and run chroma-migrate to upgrade your configuration. See https://docs.trychroma.com/migration for more information or join our discord at https://discord.gg/8g5FESbj for help!"
  3. Suffer

Expected behavior

  1. Create locally persisted Chroma store
  2. Use Chroma store

The issue:

Starting with chromadb 0.4.0, chroma_db_impl is no longer a supported parameter; chromadb uses sqlite instead.

Removing the line chroma_db_impl="duckdb+parquet", from langchain/vectorstores/chroma.py solves the issue, but the earlier DB cannot be used or migrated.
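
Since the earlier DB cannot be opened after the change, it may help to check which format a persist directory holds before pointing a new client at it. The following is a minimal sketch using only the standard library. The helper name detect_store_format is hypothetical, and the marker file names (chroma.sqlite3 for 0.4+, chroma-collections.parquet and chroma-embeddings.parquet for the older duckdb+parquet layout) are assumptions that should be verified against your own data directory.

```python
from pathlib import Path

# Assumed marker files for each on-disk layout; check them against your own store.
LEGACY_MARKERS = ("chroma-collections.parquet", "chroma-embeddings.parquet")
SQLITE_MARKER = "chroma.sqlite3"


def detect_store_format(persist_directory: str) -> str:
    """Return 'sqlite', 'legacy', or 'empty' for a Chroma persist directory."""
    root = Path(persist_directory)
    # A sqlite file is taken to mean the directory was written by chromadb 0.4+.
    if (root / SQLITE_MARKER).is_file():
        return "sqlite"
    # Parquet files are taken to mean the older duckdb+parquet layout.
    if any((root / name).is_file() for name in LEGACY_MARKERS):
        return "legacy"
    # Nothing recognizable (or the directory does not exist yet).
    return "empty"
```

A result of "legacy" would suggest running the migration described at https://docs.trychroma.com/migration before opening the directory with chromadb 0.4+.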

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 16
  • Comments: 51 (17 by maintainers)

Most upvoted comments

Thanks for taking quick action on this issue.

I was trying to figure out how to use the Chroma class with this new update, and found out that this code works:

import chromadb
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.vectorstores import Chroma

client = chromadb.PersistentClient(path="./db")
embeddings = SentenceTransformerEmbeddings(
    model_name="all-mpnet-base-v2",
)
vector_db = Chroma(
    client=client,
    collection_name="your-collection-name",
    embedding_function=embeddings,
)
print(f"Documents Loaded: {vector_db._collection.count()}")

Sharing in case someone else needs it.

  • chromadb 0.4.4
  • langchain 0.0.249

This is what works for me

import chromadb
from chromadb.config import Settings
from langchain.vectorstores import Chroma

path = "./db"
settings = Settings(
    persist_directory=path,
    anonymized_telemetry=False,
)

client = chromadb.PersistentClient(path=path, settings=settings)

Create new db

db = Chroma.from_documents(
    client=client,
    documents=texts,
    embedding=embeddings,
)

Read existing db

db = Chroma(
    client=client,
    embedding_function=embeddings,
)
db.add_documents(texts)

Retriever from existing db

db = Chroma(
    client=client,
    embedding_function=embeddings,
)
retriever = db.as_retriever( ... )

Answer generated by a 🤖

Answer

Thank you for bringing this issue to our attention. It seems that the latest version of ChromaDB is no longer compatible with the client config, causing an error when creating a locally persisted Chroma store.

Based on your description, it appears that you have already identified a potential solution by removing the line chroma_db_impl="duckdb+parquet" from the langchain/vectorstores/chroma.py file. This workaround seems to resolve the issue, but it also means that the earlier DB cannot be used or migrated.

We appreciate your effort in finding a solution to this problem. If you would like to contribute your proposed fix to the LangChain project, we encourage you to create a pull request (PR) with your changes. The maintainers will review your contribution and decide if it should be merged into the project.

Thank you for your support and for helping to improve LangChain!


This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

@kesavan1994 you can pass a chroma client to langchain Chroma init

import chromadb
from langchain.vectorstores import Chroma

client = chromadb.PersistentClient(path='./db')
langchain_chroma = Chroma(client=client, collection_name='langchain_collection')

You can set path and collection_name to be whatever you like.

Does this answer your question?

@gpapp thanks again for making the backwards compatibility PR!

Hey @kevinknights29, this is just to notify you that in the end the issue has been fixed! The below code suffices, as it is backwards compatible as stated by @jeffchuber.

from langchain.vectorstores import Chroma

output_dir = "./db_metadata_v5"
db = Chroma.from_documents(chunks, embeddings, persist_directory=output_dir)

The difference is that now the embeddings, the text and various metadata are being stored in sqlite3. Querying works just as it used to. I believe that the issue can now finally be closed.

Note: I am on chromadb v0.4.2 and langchain v0.0.237.

Hi everyone, https://github.com/hwchase17/langchain/releases/tag/v0.0.236 was just released this morning and is backwards compatible. @kevinknights29 @Maipengfei please upgrade and if you have any issues, please ask in our discord - https://discord.gg/MMeYNTmh3x - we are responding very quickly right now!
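
The version pairs reported in this thread can be turned into a quick check. This is a sketch, not an official compatibility matrix: the helper names are hypothetical, and the two thresholds are taken only from what the comments here report (chromadb 0.4.0 introduced the new configuration, and langchain 0.0.236 was announced as backwards compatible).

```python
import re

# Thresholds as reported in this thread (assumptions, not an official matrix).
CHROMA_NEW_CONFIG = (0, 4, 0)
LANGCHAIN_FIXED = (0, 0, 236)


def parse_version(version: str) -> tuple:
    """Turn '0.4.15' into (0, 4, 15), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def needs_langchain_upgrade(langchain_version: str, chromadb_version: str) -> bool:
    """True when chromadb uses the new config but langchain predates the fix."""
    return (
        parse_version(chromadb_version) >= CHROMA_NEW_CONFIG
        and parse_version(langchain_version) < LANGCHAIN_FIXED
    )
```

The installed versions can be read with importlib.metadata.version("langchain") and importlib.metadata.version("chromadb") and passed to needs_langchain_upgrade.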

@aevedis vector_db = Chroma.from_documents(documents=chunks, embedding=embeddings, persist_directory=output_dir) should now be db = vector_db.from_documents(documents=chunks, embedding=embeddings, persist_directory=output_dir) instead, otherwise you are just overwriting the vector_db variable. Also, vector_db.persist() is no longer needed and has been removed. It should be implicitly called when you call from_documents.

@rtomaf, I'm sorry, I'm not able to reproduce this.

Thank you!!!

Well, this was a crazy ride, but the final answer is:

langchain 0.0.228 with chromadb 0.4.15 has the "deprecated old config" issue, whereas langchain 0.0.325 with chromadb 0.4.15 does not have any issues.

So the issue was langchain all along, even though the error is coming from chroma. I guess the implementation in 0.0.228 had the problem.

It's unclear to me why pip install langchain (maybe earlier in the week, when I re-created the virtualenv) would pull the 0.0.228 version from July instead of any of the October versions.

Nonetheless, thank you for the time. The version lines were the most critical; I guess next time I'll start there first. pip install langchain --upgrade

solves the issue and

vectordb = Chroma.from_documents(
    documents=splits,
    embedding=embedding,
    persist_directory="./chroma_db"
)

vectordb.persist()

now works without a problem, in the same way as in the deeplearning.ai tutorial.

@victorlee0505

Thank you so much… PrivateGPT's ingest.py is working now… ^_^

I believe you should be able to provide a pull request over there. At least one more file needs to be modified accordingly: privateGPT.py…

Thanks…

Hey everyone, I guess this will help https://docs.trychroma.com/migration Check out the end of this page, you’ll find the reason for the error here.

Edit: Adding more context here: with chroma version 0.4.4 there have been some changes with respect to how .persist() and .reset() work. Verify how you are creating your client for the vector db.

That fixed the error for me.
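
Because the comments above disagree on whether .persist() is still needed, code that has to run against several langchain and chromadb combinations could guard the call instead of assuming it exists. A minimal sketch follows; persist_if_supported is a hypothetical helper and not part of langchain or chromadb.

```python
def persist_if_supported(store) -> bool:
    """Call store.persist() when the method exists; report whether it was called."""
    persist = getattr(store, "persist", None)
    if callable(persist):
        persist()
        return True
    # No persist() method: assume the store writes to disk on its own.
    return False
```

With a store created as in the snippets in this thread, persist_if_supported(vectordb) can replace a bare vectordb.persist() call.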

@vincentwang79 this worked for me… https://gist.github.com/jeffchuber/a9ebc0ad5c7b053b8d1c50449c07f893

chromadb==0.4.2 langchain==0.0.237

Yes, your code also works in my environment. I must have got something wrong. Thanks!

@jeffchuber, Thanks for the heads-up!

I also ran into this problem, so the only way I could solve it was to install the older version (chromadb==0.3.21).

Original message from Andrey E., Wed, Jul 19, 2023, Re: [hwchase17/langchain] ChromaDB 0.4+ is no longer compatible with client config (Issue #7887):

db = vector_db.from_documents(documents=chunks, embedding=embeddings, persist_directory=output_dir)

Eh, but it still does not work. I get TypeError: 'NoneType' object is not iterable.


Answer generated by a 🤖

Answer

I understand that you’re encountering an issue with ChromaDB 0.4+ compatibility in the LangChain Python framework. The error message you’re receiving suggests that the ChromaDB configuration you’re using is deprecated and needs to be upgraded. You’ve found a workaround by removing the line chroma_db_impl="duckdb+parquet" from langchain/vectorstores/chroma.py, but this prevents you from using or migrating the earlier database.

To address this issue, one potential solution could be to use a conditional statement to check the version of ChromaDB being used and then apply the appropriate configuration. Here’s how you could implement this:

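import chromadb
from chromadb.config import Settings

persist_directory = "./db"  # example path; point this at your own store

# Get the (major, minor) version of ChromaDB as integers.
# float(chromadb.__version__) would raise ValueError on a string such as "0.4.0".
major, minor = (int(part) for part in chromadb.__version__.split(".")[:2])

# Check if the version is 0.4 or later
if (major, minor) >= (0, 4):
    # Use the new configuration ("new_configuration" is a placeholder)
    _client_settings = Settings(
        chroma_db_impl="new_configuration",
        persist_directory=persist_directory,
    )
else:
    # Use the old configuration
    _client_settings = Settings(
        chroma_db_impl="duckdb+parquet",
        persist_directory=persist_directory,
    )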

In this code, "new_configuration" should be replaced with the appropriate configuration for ChromaDB 0.4+.

This solution should allow you to use ChromaDB 0.4+ in a backwards compatible way. However, please note that you may still need to handle the migration process as suggested by the error message. This might involve using the chroma-migrate tool or a similar method to upgrade your existing data to the new configuration.

This is just one potential solution and it might need to be adjusted based on your specific code and requirements. If you find this solution helpful and believe it’s not covered by existing documentation, I encourage you to make a pull request to update the documentation for the benefit of other users.
