chroma: [Bug]: Unable to add document to a collection when using TransformersEmbeddingFunction

What happened?

When I try to use TransformersEmbeddingFunction as the embedder, the document is not added to ChromaDB. I am using a ChromaDB Docker image.

```js
import { ChromaClient, TransformersEmbeddingFunction } from "chromadb";

// Client pointing at the ChromaDB Docker instance (URL assumed)
const client = new ChromaClient({ path: "http://localhost:8000" });
const embedder = new TransformersEmbeddingFunction({ model: "Xenova/all-MiniLM-L6-v2" });

const collection = await client.getOrCreateCollection({ name: "test_collection", embeddingFunction: embedder, metadata: { description: "testing" } });

// ids, metadatas, and documents are defined elsewhere
let addResponse = await collection.add({ ids: [...ids], metadatas: [...metadatas], documents: [...documents] });
```

I get an internal server error from the Chroma Docker container (`"POST /api/v1/collections/0417fbea-6b90-44b2-aa7b-584108d3a4ab/add HTTP/1.1" 500`), and the document is not added.

Everything works fine if I use OpenAI's text-embedding-ada-002 instead.
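For reference, this is roughly how the working OpenAI setup looks; the constructor options below are based on the chromadb JS client docs and may differ slightly by version, and the API key is a placeholder:

```js
import { OpenAIEmbeddingFunction } from "chromadb";

// Placeholder key; openai_model shown explicitly for clarity
const openaiEmbedder = new OpenAIEmbeddingFunction({
  openai_api_key: "sk-...",
  openai_model: "text-embedding-ada-002",
});

const collection = await client.getOrCreateCollection({
  name: "test_collection",
  embeddingFunction: openaiEmbedder,
  metadata: { description: "testing" },
});
```

With this embedder, the same `collection.add()` call succeeds.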

Can you please help?

Versions

Chroma 3.9, Node.js v18.14.0, Transformers.js

Relevant log output

Document adding response:

```
{
  error: Response {
    [Symbol(realm)]: null,
    [Symbol(state)]: {
      aborted: false,
      rangeRequested: false,
      timingAllowPassed: true,
      requestIncludesCredentials: true,
      type: 'default',
      status: 500,
      timingInfo: [Object],
      cacheState: '',
      statusText: 'Internal Server Error',
      headersList: [HeadersList],
      urlList: [Array],
      body: [Object]
    },
    [Symbol(headers)]: HeadersList {
      [Symbol(headers map)]: [Map],
      [Symbol(headers map sorted)]: null
    }
  }
}
```

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 15 (6 by maintainers)

Most upvoted comments

@mohannadelqadi one thing to note is that every model has an input token limit (see the model card), so if your documents are too long you need to chunk them before adding. There is a Python example here: https://gist.github.com/tazarov/e66c1d3ae298c424dc4ffc8a9a916a4a — since LangChain also works in JS, it should be easy to adapt (see the sketch below).
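A rough JS adaptation of that idea, assuming LangChain's `RecursiveCharacterTextSplitter`; the chunk sizes and the `longDocument` variable are illustrative, not taken from the gist:

```js
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";

// Split long text into chunks that stay under the model's input token limit.
// chunkSize/chunkOverlap are illustrative values; tune them for your model.
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 500,
  chunkOverlap: 50,
});

const chunks = await splitter.splitText(longDocument);

// Add each chunk as its own document, deriving ids and metadata from the source doc.
await collection.add({
  ids: chunks.map((_, i) => `doc1-chunk-${i}`),
  metadatas: chunks.map(() => ({ source: "doc1" })),
  documents: chunks,
});
```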

@mohannadelqadi thanks for this. I’ll reproduce and let you know next steps