langchainjs: LlamaCppEmbeddings does not work

Currently, we are using llama.cpp via LangChain as documented at https://js.langchain.com/docs/integrations/llms/llama_cpp.

(version: 0.0.204)

It looks like the embeddings produced via the LlamaCppEmbeddings class (located at langchain/src/embeddings/llama_cpp.ts) still do not work for us, as the resulting vector’s length appears to be randomized, for instance: [ 1824, 349, 418, 274, 1402, 28709, 334, 8236, 28828, 28804 ]

Note that different gguf models, such as llama-2 or zephyr-β, produce the same result.

Any tips would help us a lot in this case.

Thank you in advance!

About this issue

  • Original URL
  • State: open
  • Created 7 months ago
  • Comments: 19 (6 by maintainers)

Most upvoted comments

I haven’t added support for embedding in node-llama-cpp yet, but it is planned for the next version that I’m currently working on.

The current implementation of this in LangChain is wrong and won’t work.

I’ve opened an issue for the embedding feature on node-llama-cpp so you can track its progress (https://github.com/withcatai/node-llama-cpp/issues/123). I’ll also update this issue after I implement the embedding support so we can fix the code on Langchain.

🤖

Hello @lanluu! 🖐️

I’m Dosu, a friendly bot here to help you out while we wait for a human maintainer. I specialize in solving bugs, answering questions, and even guiding you on your journey to becoming a contributor.

Remember, I’m here to assist you. Let’s work together to sort this out! 😊

The vector length appears to be randomized because the LlamaCppEmbeddings class in the langchainjs framework generates embeddings based on the input text. The length of the resulting vector is dependent on the number of tokens in the input text, not on the specific model used (like llama-2 or zephyr-β).

In the embedDocuments and embedQuery methods, the input text is tokenized using the _context.encode(text) function. This function breaks down the input text into tokens, which are then converted into numerical representations. The number of tokens, and thus the length of the resulting vector, depends on the content and structure of the input text.

Here is the relevant code:

const encodings = await this.caller.call(
  () =>
    new Promise((resolve) => {
      resolve(this._context.encode(text));
    })
);

for (let i = 0; i < encodings.length; i += 1) {
  const token: number = +encodings[i];
  tokens.push(token);
}

In this code, encodings.length determines the length of the resulting vector. This length is not fixed and will vary depending on the input text.

If you need more control over the length of the resulting vector, you might consider implementing a padding mechanism to ensure a consistent vector length, or truncating the vector to a fixed length. However, please note that both of these approaches may impact the quality of the embeddings.
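If you want to experiment with that, here is a minimal sketch (the helper name toFixedLength and the target length of 384 are made up for illustration; as later comments explain, padding or truncating token IDs does not turn them into real embeddings):

// Hypothetical helper, not part of langchain: pad or truncate a numeric
// vector to a fixed target length.
function toFixedLength(vector: number[], targetLength: number, padValue = 0): number[] {
  if (vector.length >= targetLength) {
    return vector.slice(0, targetLength); // truncate
  }
  // pad with padValue up to targetLength
  return [...vector, ...new Array(targetLength - vector.length).fill(padValue)];
}

const padded = toFixedLength([1824, 349, 418], 384);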

For more details, you can refer to the source code of the LlamaCppEmbeddings class in the langchainjs repository.

I hope this helps! If you have any more questions, feel free to ask.

Best regards, Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

I also used node-llama-cpp (3.0.0-beta.14) earlier, but ran into a new error:

file:///Users/panjunlin/work/github/chat-mind/node_modules/node-llama-cpp/dist/evaluator/LlamaModel.js:22
constructor({ modelPath, gpuLayers, vocabOnly, useMmap, useMlock, onLoadProgress, loadSignal }, { _llama }) {
                                                                                                ^

TypeError: Cannot destructure property '_llama' of 'undefined' as it is undefined.
    at LlamaModel (/Users/panjunlin/work/github/chat-mind/node_modules/node-llama-cpp/src/evaluator/LlamaModel.ts:63:9)

@lanluu I predict it’ll take about a month to finish the major features that contain breaking changes for the v3 beta, so the stable release will come a bit after that.

This is not a bug. The context encode and decode functions are meant to turn text into the tokens that are actually sent to the LLM (the raw text itself is not sent), so their output length will vary with the length of the input text. If your example above happened to match, it’s because one or more words used more than one token; the mapping is not 1:1. encode is useful for determining whether your input text will exceed the model’s context length. For embedding vectors, you will want to use a model designed for generating semantic search vectors. You can get these from Hugging Face. In JS you can do something like:

import { HuggingFaceTransformersEmbeddings } from "langchain/embeddings/hf_transformers";

const modelName = "Xenova/all-MiniLM-L12-v2";
const model = new HuggingFaceTransformersEmbeddings({ modelName });

const prompt = "query or prompt to transform into a semantic vector";
// embedDocuments is async and returns one vector per input text
const [vector] = await model.embedDocuments([prompt]);

In the case of the Xenova model above, you will always get a vector of dimension 384. Then, for each of your document chunks being stored in a vector DB, you use the same embed call, so that all vectors are comparable. The vector DB can take it from there.
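For example, a rough sketch of that flow (assuming langchain’s HNSWLib vector store; the sample texts are just placeholders):

import { HNSWLib } from "langchain/vectorstores/hnswlib";
import { HuggingFaceTransformersEmbeddings } from "langchain/embeddings/hf_transformers";

// One embeddings model for both the document chunks and the query,
// so every vector has the same dimension (384 for this model).
const embeddings = new HuggingFaceTransformersEmbeddings({
  modelName: "Xenova/all-MiniLM-L12-v2",
});

const store = await HNSWLib.fromTexts(
  ["hello nice world", "Hello world", "Bye bye"], // document chunks
  [{ id: 1 }, { id: 2 }, { id: 3 }],              // metadata per chunk
  embeddings
);

const results = await store.similaritySearch("a friendly greeting", 2);
console.log(results.map((doc) => doc.pageContent));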

@giladgd thanks for the update, I’ll await the node-llama-cpp v3 release then update accordingly.

@awijshoff correct, I am just passing the results from node-llama-cpp through. I’ve started a discussion with node-llama-cpp’s maintainer on where the fix is best applied.

I think #3509 is related to this. When changing the order of the texts (moving “Bye bye” to the first position), HNSWLib returns:

this.index.addPoint(vectors[i], docstoreSize + i);
                       ^

Error: Invalid the given array length (expected 4, but got 2).

HNSWLib internally sets this.args.numDimensions to the length of the first embedding in the array it is passed:

    async initIndex(vectors) {
        if (!this._index) {
            if (this.args.numDimensions === undefined) {
                this.args.numDimensions = vectors[0].length;
            }

Therefore this never fails for the first element, but in this case it fails for the second, which has an array length of 2. I think the other vector stores work very similarly, for example Faiss:

const dv = vectors[0].length;
if (!this._index) {
    const { IndexFlatL2 } = await FaissStore.importFaiss();
    this._index = new IndexFlatL2(dv);
}
const d = this.index.getDimension();
if (dv !== d) {
    throw new Error(`Vectors must have the same length as the number of dimensions (${d})`);
}

Therefore I assume that the vectors/embeddings have to be of equal length for this to work? But that is just my wild guess.

This article supports my theory that every vector/embedding has to have the same length (MongoDB article).
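As a quick sanity check (a hypothetical helper, not something from langchain), you could assert that a batch of embeddings all share one dimension before handing them to HNSWLib or Faiss:

// Hypothetical guard: verify every embedding in a batch has the same dimension.
function assertSameDimension(vectors: number[][]): number {
  const dim = vectors[0]?.length ?? 0;
  for (const v of vectors) {
    if (v.length !== dim) {
      throw new Error(`Expected every vector to have dimension ${dim}, but got ${v.length}`);
    }
  }
  return dim;
}

Real embedding models satisfy this by construction; the token-ID arrays returned by encode() generally do not.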

If that is indeed the case, the issue seems to be in node-llama-cpp, because in the following code example (using node-llama-cpp directly), val1, val2, and val3 are arrays of unequal length, whereas they should all have the same length, namely the embedding dimension of the model?

import { LlamaModel, LlamaContext } from "node-llama-cpp";

const llamaPath = "models/mistral-7b-v0.1.Q5_K_M.gguf"; // english

const model = new LlamaModel({
  modelPath: llamaPath,
});

const context = new LlamaContext({ model });

// encode() returns token IDs, so each array's length is the token count of its input text
const val1 = context.encode("hello nice world");
const val2 = context.encode("Hello world");
const val3 = context.encode("Bye bye");
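Logging the lengths makes the mismatch visible (a small addition for illustration; the exact counts depend on the tokenizer):

console.log(val1.length, val2.length, val3.length); // token counts per input, not a fixed embedding dimension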

Ok, got it, so the issue is the embedding dimension, not the specific array length.