chroma: [Bug]: Upsert with Duplicated IDs fails
What happened?
Tried to upsert with duplicate IDs
Versions
Chroma v0.3.23 Python 3.10.2
Relevant log output
File "/root/modal_cluster.py", line 72, in create_embeddings
collection.upsert(
File "/usr/local/lib/python3.10/site-packages/chromadb/api/models/Collection.py", line 295, in upsert
ids, embeddings, metadatas, documents = self._validate_embedding_set(
File "/usr/local/lib/python3.10/site-packages/chromadb/api/models/Collection.py", line 347, in _validate_embedding_set
ids = validate_ids(maybe_cast_one_to_many(ids))
File "/usr/local/lib/python3.10/site-packages/chromadb/api/types.py", line 117, in validate_ids
raise errors.DuplicateIDError(
chromadb.errors.DuplicateIDError: Expected IDs to be unique, found duplicates for: {....}
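The traceback shows that the error is raised by validate_ids while Collection.upsert validates its arguments. A single repeated ID therefore rejects the whole call. Below is a minimal sketch of one workaround. It assumes "last write wins" is acceptable for your data and collapses duplicate IDs before calling collection.upsert. dedupe_for_upsert is a hypothetical helper and is not part of Chroma.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


def dedupe_for_upsert(
    ids: Sequence[str],
    documents: Sequence[Any],
    metadatas: Optional[Sequence[Dict[str, Any]]] = None,
) -> Tuple[List[str], List[Any], Optional[List[Dict[str, Any]]]]:
    """Hypothetical helper: keep only the LAST row for each repeated ID.

    The parallel lists (documents, metadatas) are filtered with the same
    indices, so every row stays aligned with its ID.
    """
    if len(ids) != len(documents):
        raise ValueError("ids and documents must have the same length")
    if metadatas is not None and len(metadatas) != len(ids):
        raise ValueError("ids and metadatas must have the same length")

    # Re-assigning an existing key updates its value but not its position,
    # so this maps each ID (in first-seen order) to its last index.
    last_index: Dict[str, int] = {}
    for i, id_ in enumerate(ids):
        last_index[id_] = i
    keep = list(last_index.values())

    new_ids = [ids[i] for i in keep]
    new_docs = [documents[i] for i in keep]
    new_metas = [metadatas[i] for i in keep] if metadatas is not None else None
    return new_ids, new_docs, new_metas


if __name__ == "__main__":
    ids, docs, metas = dedupe_for_upsert(["a", "b", "a"], ["first a", "b", "second a"])
    print(ids, docs)  # ['a', 'b'] ['second a', 'b']
    # The cleaned lists could then be passed on, e.g.
    # collection.upsert(ids=ids, documents=docs)
```

If the duplicates are unexpected rather than intentional, you may prefer to raise or log the repeated IDs instead of silently dropping the earlier rows.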
About this issue
- State: closed
- Created a year ago
- Comments: 15 (7 by maintainers)
@shaktiman101 @gururise thanks for the response here. I think what is likely happening is that max_query_size is being hit inside ClickHouse. We are removing ClickHouse tomorrow (and providing a seamless upgrade path), and all of this should go away. Sorry for the troubles.
I've also experienced this using Python with a large dataset (without realizing it was a bug).
For me, upsert works on fewer than 7-8 documents (each about 1500 tokens), but it fails for anything more than that.
This is what I get when trying to insert more than 7-8 documents:
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/api/v1/collections/16bed3bb-10ab-4e55-9f21-dadd6ef214dd/upsert
I'm using Chroma's client-server architecture, and this is how I instantiated the Chroma client:
import chromadb
from chromadb.config import Settings
chroma_client = chromadb.Client(Settings(chroma_api_impl="rest", chroma_server_host="localhost", chroma_server_http_port=8000))
Tough to reproduce; the error only happened with >15k inputs.
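The maintainer's suspicion is that a query-size limit inside ClickHouse (max_query_size) is being hit, and users report that the failures appear only beyond a certain batch size. If so, a plausible stopgap until the ClickHouse removal is to send the upsert in smaller chunks. This is a sketch under that assumption. upsert_in_batches is a hypothetical helper, and the default batch_size of 5 is an arbitrary guess based on the "fails above 7-8 documents" report, not a documented limit. The upsert argument is any callable that accepts ids/documents/metadatas keyword arguments, such as collection.upsert.

```python
from typing import Any, Callable, Dict, Optional, Sequence


def upsert_in_batches(
    upsert: Callable[..., Any],
    ids: Sequence[str],
    documents: Sequence[str],
    metadatas: Optional[Sequence[Dict[str, Any]]] = None,
    batch_size: int = 5,  # assumption: tune to whatever your server accepts
) -> int:
    """Hypothetical helper: call `upsert` once per chunk of at most
    `batch_size` rows and return the number of calls made."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    if len(ids) != len(documents):
        raise ValueError("ids and documents must have the same length")

    calls = 0
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        kwargs: Dict[str, Any] = {
            "ids": list(ids[start:end]),
            "documents": list(documents[start:end]),
        }
        # Only pass metadatas when the caller supplied them.
        if metadatas is not None:
            kwargs["metadatas"] = list(metadatas[start:end])
        upsert(**kwargs)
        calls += 1
    return calls


if __name__ == "__main__":
    # Stand-in for collection.upsert so the sketch runs without a server.
    def fake_upsert(**kwargs: Any) -> None:
        print(len(kwargs["ids"]), "rows")

    upsert_in_batches(fake_upsert, [str(i) for i in range(12)],
                      [f"doc {i}" for i in range(12)], batch_size=5)
```

With a real collection, the call would look like upsert_in_batches(collection.upsert, ids, documents). Smaller batches mean more HTTP round trips, so this trades throughput for staying under the suspected size limit.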