langchain: [BUG] Chroma DB - similarity_search - chromadb.errors.NotEnoughElementsException

Within the Chroma DB, similarity_search has a default “k” of 4. However, if there are fewer than 4 results available for the query, it crashes instead of returning whatever is available.

Run similarity search with Chroma.

Parameters
query (str) – Query text to search for.

k (int) – Number of results to return. Defaults to 4.

filter (Optional[Dict[str, str]]) – Filter by metadata. Defaults to None.

Returns
List of documents most similar to the query text.

Return type
List[Document]
  raise NotEnoughElementsException(
chromadb.errors.NotEnoughElementsException: Number of requested results 4 cannot be greater than number of elements in index 1

I believe a check is needed to return whatever is available whenever the results are < k.

Edit: You can set k=1 for the search, but then you are artificially limiting yourself without knowing what is available. It really should check whether the number of available results is fewer than k and, if so, return whatever is available instead of crashing.
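The check proposed above can be sketched on the caller's side as a small wrapper that clamps k to the number of indexed elements before searching. This is only a minimal sketch, not how LangChain or Chroma handle it: `clamp_k` and `safe_similarity_search` are hypothetical helper names, and the caller has to supply the element count. It also does not account for metadata filters, which may match fewer elements than the index holds.

```python
from typing import Any, Callable, List


def clamp_k(requested_k: int, available: int) -> int:
    """Return the requested k, capped at the number of indexed elements."""
    if requested_k < 1:
        raise ValueError("k must be at least 1")
    return min(requested_k, available)


def safe_similarity_search(
    search: Callable[..., List[Any]],
    available: int,
    query: str,
    k: int = 4,
    **kwargs: Any,
) -> List[Any]:
    """Call `search` with k capped at `available` (hypothetical helper).

    `search` is any callable with a similarity_search-like signature,
    and `available` is the number of elements currently in the index.
    """
    effective_k = clamp_k(k, available)
    if effective_k == 0:
        # Empty index: nothing to return, so skip the search entirely.
        return []
    return search(query, k=effective_k, **kwargs)
```

With the LangChain wrapper, the count could presumably be read from the underlying collection, for example `safe_similarity_search(db.similarity_search, db._collection.count(), "my query")`. Note that `_collection` is a private attribute of the wrapper, so relying on it is an assumption that may not hold across versions.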

About this issue

  • State: closed
  • Created a year ago
  • Comments: 16 (2 by maintainers)

Most upvoted comments

Hi everyone, we are tracking this issue over on Chroma’s end https://github.com/chroma-core/chroma/issues/225

hope to have it fixed up soon!

Reduce the chunk size or use a bigger document; at least 4 chunks are required.

@laveshnk-crypto smaller chunks will create more documents - same thing.

@geg00 Until this is fixed, you can simply pass “k=1” explicitly:

print(file_db.similarity_search_by_vector(vector1, k=1))

I’ve just opted for not using a vector store if my text takes up less than 4 docs. If the text is split and makes up 4 or more docs, I use chroma vector store. Ideally this should be handled behind the scenes as this bug gave me some headache to figure out.