chroma: [Bug]: Unable to add document to a collection when using TransformersEmbeddingFunction
What happened?
When I try to use TransformersEmbeddingFunction as the embedder, the document is not added to ChromaDB. I am running Chroma via the Docker image.
```js
const embedder = new TransformersEmbeddingFunction({ model: "Xenova/all-MiniLM-L6-v2" });
const collection = await client.getOrCreateCollection({ name: "test_collection", embeddingFunction: embedder, metadata: { description: "testing" } });
let addResponse = await collection.add({ ids: [...ids], metadatas: [...metadatas], documents: [...documents] });
```
The Chroma Docker container returns an internal server error, "POST /api/v1/collections/0417fbea-6b90-44b2-aa7b-584108d3a4ab/add HTTP/1.1" 500, and the document is not added.
Everything works fine if I use the OpenAI text-embedding-ada-002 embedding function instead.
Can you please help?
Versions
Chroma 3.9, Node.js v18.14.0, Transformers.js
Relevant log output
Response from the add call:
```
{ error: Response { [Symbol(realm)]: null, [Symbol(state)]: { aborted: false, rangeRequested: false, timingAllowPassed: true, requestIncludesCredentials: true, type: 'default', status: 500, timingInfo: [Object], cacheState: '', statusText: 'Internal Server Error', headersList: [HeadersList], urlList: [Array], body: [Object] }, [Symbol(headers)]: HeadersList { [Symbol(headers map)]: [Map], [Symbol(headers map sorted)]: null } } }
```
About this issue
- State: closed
- Created a year ago
- Comments: 15 (6 by maintainers)
@mohannadelqadi one thing to note is that all models have input token limits (check the model card), so if your docs are too long, chunk your texts. See a Python example here: https://gist.github.com/tazarov/e66c1d3ae298c424dc4ffc8a9a916a4a (LangChain also works in JS, so the Python example should be easy to adapt).
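For reference, a rough JavaScript sketch of that chunking idea (the gist above is Python/LangChain). This is only an illustration: chunkText and addChunked are made-up helper names, not Chroma or Transformers.js APIs, and the character-based chunk size is a crude stand-in for the model's token limit, so tune it to your data.

```js
// Sketch: naive character-based chunking so no single document exceeds the
// embedding model's input limit (all-MiniLM-L6-v2 truncates long inputs).
// chunkText/addChunked are illustrative helpers, not part of the Chroma API.
function chunkText(text, chunkSize = 800, overlap = 100) {
  const chunks = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
  }
  return chunks;
}

async function addChunked(collection, id, text, metadata = {}) {
  const chunks = chunkText(text);
  // Each chunk becomes its own document with a derived id and shared metadata.
  await collection.add({
    ids: chunks.map((_, i) => `${id}-chunk-${i}`),
    documents: chunks,
    metadatas: chunks.map(() => ({ ...metadata, parentId: id })),
  });
}
```

Usage would mirror the add() call in the report, e.g. `await addChunked(collection, "doc1", longText, { description: "testing" })`.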
@mohannadelqadi thanks for this. I’ll reproduce and let you know next steps