langchain: ChromaDB 0.4+ is no longer compatible with client config
System Info
- Python 3.9.13
- langchain-0.0.235-py3-none-any.whl
- chromadb-0.4.0-py3-none-any.whl
Who can help?
No response
Information
- The official example notebooks/scripts
- My own modified scripts
Related Components
- LLMs/Chat Models
- Embedding Models
- Prompts / Prompt Templates / Prompt Selectors
- Output Parsers
- Document Loaders
- Vector Stores / Retrievers
- Memory
- Agents / Agent Executors
- Tools / Toolkits
- Chains
- Callbacks/Tracing
- Async
Reproduction
Steps to reproduce:
- Create a Chroma store which is locally persisted
store = Chroma.from_texts(
    texts=docs, embedding=embeddings, metadatas=metadatas, persist_directory=environ["DB_DIR"]
)
- Get the error
You are using a deprecated configuration of Chroma. Please pip install chroma-migrate and run chroma-migrate to upgrade your configuration. See https://docs.trychroma.com/migration for more information or join our discord at https://discord.gg/8g5FESbj for help!
- Suffer
Expected behavior
- Create locally persisted Chroma store
- Use Chroma store
The issue:
Starting with chromadb 0.4.0, chroma_db_impl is no longer a supported parameter; persistence uses sqlite instead.
Removing the line
chroma_db_impl="duckdb+parquet",
from langchain/vectorstores/chroma.py solves the issue, but the earlier DB cannot be used or migrated.
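Before touching an existing persist directory, it can help to check which on-disk format it holds. The following is a minimal sketch using only the standard library. The file names are assumptions based on how chromadb 0.3.x (duckdb+parquet) and 0.4+ (sqlite) are generally understood to lay out their data, and detect_chroma_layout is a hypothetical helper, not part of either library.

```python
from pathlib import Path

# Assumed file names: chromadb 0.3.x (duckdb+parquet) wrote parquet files,
# while 0.4+ writes a single SQLite file. Adjust if your layout differs.
LEGACY_FILES = ("chroma-collections.parquet", "chroma-embeddings.parquet")
SQLITE_FILE = "chroma.sqlite3"


def detect_chroma_layout(persist_directory: str) -> str:
    """Return 'legacy', 'sqlite', 'mixed', or 'empty' for a persist directory."""
    root = Path(persist_directory)
    has_legacy = any((root / name).exists() for name in LEGACY_FILES)
    has_sqlite = (root / SQLITE_FILE).exists()
    if has_legacy and has_sqlite:
        return "mixed"
    if has_legacy:
        return "legacy"
    if has_sqlite:
        return "sqlite"
    return "empty"
```

A 'legacy' result suggests the directory was written by a pre-0.4 chromadb and would need the chroma-migrate tool mentioned in the error message before a 0.4+ client can read it.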
About this issue
- Original URL
- State: closed
- Created a year ago
- Reactions: 16
- Comments: 51 (17 by maintainers)
Commits related to this issue
- Fix for chromadb 0.40+ #7887 — committed to gpapp/langchain by gpapp a year ago
- Gpapp/chromadb (#7891) - Description: version check to make sure chromadb >=0.4.0 does not throw an error, and uses the default sqlite persistence engine when the directory is set, - Issue: the ... — committed to langchain-ai/langchain by gpapp a year ago
- Gpapp/chromadb (#7891) - Description: version check to make sure chromadb >=0.4.0 does not throw an error, and uses the default sqlite persistence engine when the directory is set, - Issue: the ... — committed to aerrober/langchain-fork by gpapp a year ago
Thanks for taking quick action on this issue.
I was trying to figure out how to use the Chroma class with this new update, and found out that this code works:
Sharing in case someone else needs it.
This is what works for me
Create new db
Read existing db
Retriever from existing db
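A rough sketch of those three steps, assuming langchain >= 0.0.236 and chromadb 0.4.x. The function names and the k parameter are illustrative and not taken from the comment above; embeddings is any LangChain embeddings object you already have.

```python
def create_db(texts, embeddings, persist_directory):
    """Create a new locally persisted store from raw texts."""
    # Imported lazily so this module loads even without langchain installed.
    from langchain.vectorstores import Chroma

    return Chroma.from_texts(
        texts=texts, embedding=embeddings, persist_directory=persist_directory
    )


def load_db(embeddings, persist_directory):
    """Open an existing store; nothing is re-embedded."""
    from langchain.vectorstores import Chroma

    return Chroma(persist_directory=persist_directory, embedding_function=embeddings)


def retriever_from_db(embeddings, persist_directory, k=4):
    """Build a retriever on top of an existing store (k is an assumed default)."""
    return load_db(embeddings, persist_directory).as_retriever(search_kwargs={"k": k})
```

The same embeddings model should be used for creating and reading the store, otherwise query vectors will not be comparable with the stored ones.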
@vincentwang79 this worked for me… https://gist.github.com/jeffchuber/a9ebc0ad5c7b053b8d1c50449c07f893
chroma==0.4.2 langchain==0.0.237
Answer generated by a 🤖
Answer
Thank you for bringing this issue to our attention. It seems that the latest version of ChromaDB is no longer compatible with the client config, causing an error when creating a locally persisted Chroma store.
Based on your description, it appears that you have already identified a potential solution by removing the line chroma_db_impl="duckdb+parquet" from the langchain.vectorstores/chroma.py file. This workaround seems to resolve the issue, but it also means that the earlier DB cannot be used or migrated.
We appreciate your effort in finding a solution to this problem. If you would like to contribute your proposed fix to the LangChain project, we encourage you to create a pull request (PR) with your changes. The maintainers will review your contribution and decide if it should be merged into the project.
Thank you for your support and for helping to improve LangChain!
This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
@kesavan1994 you can pass a chroma client to the langchain Chroma init.
You can set path and collection_name to be whatever you like.
Does this answer your question?
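A minimal sketch of passing a client to the Chroma init, assuming chromadb 0.4.x (which provides chromadb.PersistentClient) and a langchain version whose Chroma wrapper accepts a client argument. The function name and the default path and collection name are placeholders.

```python
def chroma_from_client(embeddings, path="./chroma_db", collection_name="my_collection"):
    """Wrap a chromadb persistent client in a LangChain Chroma vector store."""
    # Imported lazily so this module loads even without the libraries installed.
    import chromadb
    from langchain.vectorstores import Chroma

    # The client owns the on-disk location; LangChain only receives the handle.
    client = chromadb.PersistentClient(path=path)
    return Chroma(
        client=client,
        collection_name=collection_name,
        embedding_function=embeddings,
    )
```

Because the client is created outside LangChain, the same client can also be used directly, for example to list or delete collections.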
@gpapp thanks again for making the backwards compatibility PR!
Hey @kevinknights29, this is just to notify you that in the end the issue has been fixed! The below code suffices, as it is backwards compatible as stated by @jeffchuber.
The difference is that now the embeddings, the text and various metadata are being stored in sqlite3. Querying works just as it used to. I believe that the issue can now finally be closed.
Note: I am on chromadb v0.4.2 and langchain v0.0.237.
Hi everyone, https://github.com/hwchase17/langchain/releases/tag/v0.0.236 was just released this morning and is backwards compatible. @kevinknights29 @Maipengfei please upgrade and if you have any issues, please ask in our discord - https://discord.gg/MMeYNTmh3x - we are responding very quickly right now!
@aevedis
vector_db = Chroma.from_documents(documents=chunks, embedding=embeddings, persist_directory=output_dir)
should now be
db = vector_db.from_documents(documents=chunks, embedding=embeddings, persist_directory=output_dir)
instead, otherwise you are just overwriting the vector_db variable. Also, vector_db.persist() is no longer needed and has been removed. It should be implicitly called when you call from_documents.
@rtomaf, I'm sorry, I'm not able to reproduce this.
Thank you!!!
Well, this was a crazy ride, but the final answer is:
langchain 0.0.228 with chromadb 0.4.15 has the "deprecated old config" issue, whereas langchain 0.0.325 with chromadb 0.4.15
does not have any issues.
So the issue was langchain all along, even though the error is coming from chroma. I guess the implementation in 0.0.228 had the problem.
It's unclear to me why pip install langchain (maybe earlier in the week, when I re-created the virtualenv) would pull the 0.0.228 version from July instead of any of the October versions.
Nonetheless,
thank you for the time. The version lines for the imports were the most critical; I guess next time I'll start there first. pip install langchain --upgrade
solves the issue and
now works without a problem, in the same way as the deeplearning.ai tutorial.
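Checking the installed versions first, as the comment above suggests, can be sketched with the standard library alone. The 0.0.236 threshold comes from the release announcement earlier in this thread; the helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# First langchain release this thread describes as backwards compatible.
MIN_LANGCHAIN = (0, 0, 236)


def parse_version(text: str) -> tuple:
    """Turn '0.0.325' into (0, 0, 325); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def langchain_is_new_enough(installed: str) -> bool:
    """True if the given langchain version is at or above MIN_LANGCHAIN."""
    return parse_version(installed) >= MIN_LANGCHAIN


def report_versions(packages=("langchain", "chromadb")) -> dict:
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found
```

Calling report_versions() inside the same virtualenv that raises the error shows whether pip actually resolved a recent langchain release.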
@victorlee0505
Thank you so much… It’s working now… ^_^ PrivateGPT’s ingest.py .
I believe you should be able to provide a pull request over there. At least one more file needs to be modified accordingly: privateGPT.py.
Thanks…
Hey everyone, I guess this will help https://docs.trychroma.com/migration Check out the end of this page, you’ll find the reason for the error here.
Edit: Adding more context here: With chroma version 0.4.4 there have been some changes with respect to how .persist() and .reset() work. Verify how you are creating your client for the vector db.
That fixed the error for me.
Yes, your code also works in my environment. I must have got something wrong. Thanks!
@jeffchuber, Thanks for the heads-up!
I also ran into this problem, and so far I could only solve it by installing the older version (chromadb==0.3.21).
db = vector_db.from_documents(documents=chunks, embedding=embeddings, persist_directory=output_dir)
Eh, but it still does not work. I get TypeError: ‘NoneType’ object is not iterable .
Answer generated by a 🤖
Answer
I understand that you’re encountering an issue with ChromaDB 0.4+ compatibility in the LangChain Python framework. The error message you’re receiving suggests that the ChromaDB configuration you’re using is deprecated and needs to be upgraded. You’ve found a workaround by removing the line chroma_db_impl="duckdb+parquet" from langchain.vectorstores/chroma.py, but this prevents you from using or migrating the earlier database.
To address this issue, one potential solution could be to use a conditional statement to check the version of ChromaDB being used and then apply the appropriate configuration. Here’s how you could implement this:
In this code, "new_configuration" should be replaced with the appropriate configuration for ChromaDB 0.4+.
This solution should allow you to use ChromaDB 0.4+ in a backwards compatible way. However, please note that you may still need to handle the migration process as suggested by the error message. This might involve using the chroma-migrate tool or a similar method to upgrade your existing data to the new configuration.
This is just one potential solution and it might need to be adjusted based on your specific code and requirements. If you find this solution helpful and believe it’s not covered by existing documentation, I encourage you to make a pull request to update the documentation for the benefit of other users.
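The version check described in this answer could look roughly like the sketch below. It only builds the keyword arguments one might pass to chromadb.config.Settings and does not import chromadb itself. The is_persistent flag for 0.4+ is an assumption about the newer settings, and chroma_settings_kwargs is a hypothetical helper, not LangChain's actual implementation.

```python
def chroma_settings_kwargs(chromadb_version: str, persist_directory: str) -> dict:
    """Pick persistence settings based on the installed chromadb version."""
    major, minor = (int(part) for part in chromadb_version.split(".")[:2])
    if (major, minor) < (0, 4):
        # Pre-0.4 releases selected the storage backend explicitly.
        return {
            "chroma_db_impl": "duckdb+parquet",
            "persist_directory": persist_directory,
        }
    # 0.4+ dropped chroma_db_impl; sqlite is the default engine.
    # Assumption: is_persistent is the flag that enables on-disk storage.
    return {"is_persistent": True, "persist_directory": persist_directory}
```

In practice the version string would come from chromadb.__version__, and the resulting dictionary would be unpacked into the settings object. This selects a configuration only; it does not migrate data written by an older version.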