langchain: ChromaDB 0.4+ is no longer compatible with client config

System Info

Python 3.9.13
langchain-0.0.235-py3-none-any.whl
chromadb-0.4.0-py3-none-any.whl

Who can help?

No response

Information

The official example notebooks/scripts
My own modified scripts

Related Components

LLMs/Chat Models
Embedding Models
Prompts / Prompt Templates / Prompt Selectors
Output Parsers
Document Loaders
Vector Stores / Retrievers
Memory
Agents / Agent Executors
Tools / Toolkits
Chains
Callbacks/Tracing
Async

Reproduction

Steps to reproduce:

Create a Chroma store which is locally persisted

store = Chroma.from_texts(
    texts=docs, embedding=embeddings, metadatas=metadatas, persist_directory=environ["DB_DIR"]
)

Get the error You are using a deprecated configuration of Chroma. Please pip install chroma-migrate and run chroma-migrate to upgrade your configuration. See https://docs.trychroma.com/migration for more information or join our discord at https://discord.gg/8g5FESbj for help!
Suffer

Expected behavior

Create locally persisted Chroma store
Use Chroma store

The issue:

Starting with chromadb 0.4.0, chroma_db_impl is no longer a supported parameter; Chroma uses sqlite instead.

Removing the line chroma_db_impl="duckdb+parquet", from langchain/vectorstores/chroma.py solves the issue, but the earlier DB can then no longer be used or migrated.
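A version-gated choice of settings, rather than deleting the line outright, could look roughly like the sketch below. It only decides whether chroma_db_impl is passed at all; parse_version and client_settings_kwargs are hypothetical helper names, not part of langchain or chromadb, and any other settings that 0.4+ may need are out of scope here.

```python
def parse_version(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version string such as "0.4.0"."""
    major, minor, *_ = version.split(".")
    return int(major), int(minor)


def client_settings_kwargs(version: str, persist_directory: str) -> dict:
    """Pick Settings keyword arguments for the given chromadb version."""
    kwargs = {"persist_directory": persist_directory}
    if parse_version(version) < (0, 4):
        # Releases before 0.4 select the storage backend explicitly.
        kwargs["chroma_db_impl"] = "duckdb+parquet"
    # From 0.4.0 on, chroma_db_impl is rejected, so it is left out.
    return kwargs


print(client_settings_kwargs("0.3.21", "./db"))
print(client_settings_kwargs("0.4.0", "./db"))
```

In real code the version string would come from chromadb.__version__, and the resulting dict would be unpacked into chromadb.config.Settings.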

About this issue

State: closed
Created a year ago
Reactions: 16
Comments: 51 (17 by maintainers)

Commits related to this issue

Fix for chromadb 0.40+ #7887 — committed to gpapp/langchain by gpapp a year ago
Gpapp/chromadb (#7891) - Description: version check to make sure chromadb >=0.4.0 does not throw an error, and uses the default sqlite persistence engine when the directory is set, - Issue: the ... — committed to langchain-ai/langchain by gpapp a year ago
Gpapp/chromadb (#7891) - Description: version check to make sure chromadb >=0.4.0 does not throw an error, and uses the default sqlite persistence engine when the directory is set, - Issue: the ... — committed to aerrober/langchain-fork by gpapp a year ago

Most upvoted comments

Thanks for taking quick action on this issue.

I was trying to figure out how to use the Chroma class with this new update, and found out that this code works:

import chromadb
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.vectorstores import Chroma

client = chromadb.PersistentClient(path="./db")
embeddings = SentenceTransformerEmbeddings(
    model_name="all-mpnet-base-v2",
)
vector_db = Chroma(
    client=client,
    collection_name="your-collection-name",
    embedding_function=embeddings,
)
print(f"Documents Loaded: {vector_db._collection.count()}")

Sharing in case someone else needs it.

kevinknights29 on Jul 19, 2023

chromadb 0.4.4
langchain 0.0.249

This is what works for me

import chromadb
from chromadb.config import Settings
from langchain.vectorstores import Chroma

path = "./db"
settings = Settings(
    persist_directory=path,
    anonymized_telemetry=False,
)

client = chromadb.PersistentClient(settings=settings, path=path)

Create new db

db = Chroma.from_documents(
    client=client,
    documents=texts,
    embedding=embeddings,
)

Read existing db

db = Chroma(
    client=client,
    embedding_function=embeddings,
)
db.add_documents(texts)

Retriever from existing db

db = Chroma(
    client=client,
    embedding_function=embeddings,
)
retriever = db.as_retriever( ... )

victorlee0505 on Aug 3, 2023

@vincentwang79 this worked for me… https://gist.github.com/jeffchuber/a9ebc0ad5c7b053b8d1c50449c07f893

chroma==0.4.2 langchain==0.0.237

jeffchuber on Jul 21, 2023

Answer generated by a 🤖

Answer

Thank you for bringing this issue to our attention. It seems that the latest version of ChromaDB is no longer compatible with the client config, causing an error when creating a locally persisted Chroma store.

Based on your description, it appears that you have already identified a potential solution by removing the line chroma_db_impl="duckdb+parquet" from the langchain.vectorstores/chroma.py file. This workaround seems to resolve the issue, but it also means that the earlier DB cannot be used or migrated.

We appreciate your effort in finding a solution to this problem. If you would like to contribute your proposed fix to the LangChain project, we encourage you to create a pull request (PR) with your changes. The maintainers will review your contribution and decide if it should be merged into the project.

Thank you for your support and for helping to improve LangChain!

This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

dosubot[bot] on Jul 18, 2023

@kesavan1994 you can pass a chroma client to langchain Chroma init

import chromadb
from langchain.vectorstores import Chroma

client = chromadb.PersistentClient(path='./db')
langchain_chroma = Chroma(client=client, collection_name='langchain_collection')

You can set path and collection_name to be whatever you like.

Does this answer your question?

jeffchuber on Jul 29, 2023

@gpapp thanks again for making the backwards compatibility PR!

jeffchuber on Jul 20, 2023

Hey @kevinknights29, this is just to notify you that in the end the issue has been fixed! The below code suffices, as it is backwards compatible as stated by @jeffchuber.

from langchain.vectorstores import Chroma

output_dir = "./db_metadata_v5"
db = Chroma.from_documents(chunks, embeddings, persist_directory=output_dir)

The difference is that now the embeddings, the text and various metadata are being stored in sqlite3. Querying works just as it used to. I believe that the issue can now finally be closed.

Note: I am on chromadb v0.4.2 and langchain v0.0.237.
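To see which storage format an existing persist directory holds before pointing a newer chromadb at it, a check along these lines may help. This is a rough sketch: the file names are assumptions (0.4+ is believed to write chroma.sqlite3, and 0.3.x to write .parquet files) and are not confirmed anywhere in this thread; detect_store_format is a hypothetical helper.

```python
import tempfile
from pathlib import Path

# Assumed file name, not taken from this thread: chromadb 0.4+ is believed
# to keep its data in "chroma.sqlite3" inside the persist directory.
SQLITE_FILE = "chroma.sqlite3"


def detect_store_format(persist_directory: str) -> str:
    """Return "sqlite", "duckdb+parquet", or "unknown" for a directory."""
    root = Path(persist_directory)
    if not root.is_dir():
        return "unknown"
    if (root / SQLITE_FILE).is_file():
        return "sqlite"
    # Assumption: the 0.3.x duckdb+parquet backend left *.parquet files here.
    if any(root.glob("*.parquet")):
        return "duckdb+parquet"
    return "unknown"


# Demo on a throwaway directory instead of a real database.
demo_dir = Path(tempfile.mkdtemp())
print(detect_store_format(str(demo_dir)))
(demo_dir / SQLITE_FILE).touch()
print(detect_store_format(str(demo_dir)))
```

A result of "duckdb+parquet" would suggest the directory still needs the chroma-migrate step mentioned in the error message.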

aevedis on Jul 20, 2023

Hi everyone, https://github.com/hwchase17/langchain/releases/tag/v0.0.236 was just released this morning and is backwards compatible. @kevinknights29 @Maipengfei please upgrade and if you have any issues, please ask in our discord - https://discord.gg/MMeYNTmh3x - we are responding very quickly right now!

jeffchuber on Jul 19, 2023

@aevedis vector_db = Chroma.from_documents(documents=chunks, embedding=embeddings, persist_directory=output_dir) should now be db = vector_db.from_documents(documents=chunks, embedding=embeddings, persist_directory=output_dir) instead; otherwise you are just overwriting the vector_db variable. Also, vector_db.persist() is no longer needed and has been removed. It should be called implicitly when you call from_documents.

maheeppartap on Jul 19, 2023

@rtomaf, I'm sorry, I'm not able to reproduce this.

jeffchuber on Nov 3, 2023

Thank you!!!

Well, this was a crazy ride, but the final answer is:

langchain 0.0.228 with chromadb 0.4.15 has the “deprecated old config” issue, whereas langchain 0.0.325 with chromadb 0.4.15 does not have any issues.

So the issue was langchain all along, even though the error is coming from chroma. I guess the implementation in 0.0.228 had the problem.

It’s unclear to me why pip install langchain (maybe earlier in the week, when I re-created the virtualenv) would pull the 0.0.228 version from July instead of any of the October versions.

Nonetheless, thank you for your time. Checking the imported versions was the most critical step; I guess next time I’ll start there first. pip install langchain --upgrade

solves the issue, and

vectordb = Chroma.from_documents(
    documents=splits,
    embedding=embedding,
    persist_directory="./chroma_db"
)

vectordb.persist()

now works without a problem, in the same way as in the deeplearning.ai tutorial.
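Checking the installed versions first, as described above, can be scripted. A minimal sketch using only the standard library; installed_version is a hypothetical helper name, not part of langchain or chromadb:

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the two packages this thread is about.
for name in ("langchain", "chromadb"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Running this inside the virtualenv that raises the error shows whether an old langchain release is paired with a chromadb 0.4+ release.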

mrinterestfull on Oct 27, 2023

@victorlee0505

Thank you so much… It’s working now… ^_^ PrivateGPT’s ingest.py.

I believe you should be able to provide a pull request over there. At least one more file needs to be modified accordingly: privateGPT.py…

Thanks…

jiapei100 on Aug 3, 2023

Hey everyone, I guess this will help: https://docs.trychroma.com/migration. Check out the end of that page; you’ll find the reason for the error there.

Edit: Adding more context here: with chroma 0.4.4 there have been some changes to how .persist() and .reset() work. Verify how you are creating your client for the vector db.

That fixed the error for me.

abhitatachar2000 on Jul 21, 2023

@vincentwang79 this worked for me… https://gist.github.com/jeffchuber/a9ebc0ad5c7b053b8d1c50449c07f893

chroma==0.4.2 langchain==0.0.237

Yes, your code also works in my environment. I must have got something wrong. Thanks!

vincentwang79 on Jul 21, 2023

@jeffchuber, Thanks for the heads-up!

kevinknights29 on Jul 19, 2023

I also ran into this problem, so I could only install the older version (chromadb==0.3.21) to solve it.

Quoting Andrey E. (Wed, Jul 19, 2023), Re: [hwchase17/langchain] ChromaDB 0.4+ is no longer compatible with client config (Issue #7887):

db = vector_db.from_documents(documents=chunks, embedding=embeddings, persist_directory=output_dir)

Eh, but it still does not work. I get TypeError: ‘NoneType’ object is not iterable .

Maipengfei on Jul 19, 2023

Answer generated by a 🤖

Answer

I understand that you’re encountering an issue with ChromaDB 0.4+ compatibility in the LangChain Python framework. The error message you’re receiving suggests that the ChromaDB configuration you’re using is deprecated and needs to be upgraded. You’ve found a workaround by removing the line chroma_db_impl="duckdb+parquet" from langchain.vectorstores/chroma.py, but this prevents you from using or migrating the earlier database.

To address this issue, one potential solution could be to use a conditional statement to check the version of ChromaDB being used and then apply the appropriate configuration. Here’s how you could implement this:

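import chromadb

# Example persistence path; replace with your own directory
persist_directory = "./db"

# chromadb.__version__ is a string such as "0.4.0", so compare (major, minor)
# as integers instead of calling float() on it
major, minor = (int(part) for part in chromadb.__version__.split(".")[:2])

if (major, minor) >= (0, 4):
    # 0.4+ rejects chroma_db_impl and persists to sqlite.
    # Assumption: is_persistent is the Settings flag that enables on-disk storage.
    _client_settings = chromadb.config.Settings(
        is_persistent=True,
        persist_directory=persist_directory,
    )
else:
    # Use the old configuration
    _client_settings = chromadb.config.Settings(
        chroma_db_impl="duckdb+parquet",
        persist_directory=persist_directory,
    )

In this code, the 0.4+ branch omits chroma_db_impl because that parameter is no longer supported. The is_persistent=True setting is an assumption about the 0.4+ Settings API and should be checked against the Chroma documentation.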

This solution should allow you to use ChromaDB 0.4+ in a backwards compatible way. However, please note that you may still need to handle the migration process as suggested by the error message. This might involve using the chroma-migrate tool or a similar method to upgrade your existing data to the new configuration.

This is just one potential solution and it might need to be adjusted based on your specific code and requirements. If you find this solution helpful and believe it’s not covered by existing documentation, I encourage you to make a pull request to update the documentation for the benefit of other users.

Sources

langchain/vectorstores/chroma.py

dosubot[bot] on Jul 18, 2023