LLamaSharp: Kernel Memory is broken with the latest NuGet packages
Using the 0.8 release of LlamaSharp with Kernel Memory and the samples, there is an error because LlamaSharpTextEmbeddingGeneration doesn’t implement the Attributes property.
I took the source, created my own implementation, and added this so it wouldn’t error:

```csharp
// Satisfies the Attributes member required by the interface; no metadata is needed here.
public IReadOnlyDictionary<string, string> Attributes => new Dictionary<string, string>();
```
But no matter what model I use, I get “INFO NOT FOUND.” (I’ve tried kai-7b-instruct.Q5_K_M.gguf, llama-2-7b-32k-instruct.Q6_K.gguf, llama-2-7b-chat.Q6_K.gguf, and a few others.)
I’ve tried loading plain text, an HTML file, and a web page, to no avail.
Update: LLamaSharp 0.8.1 is now integrated into KernelMemory, here’s an example: https://github.com/microsoft/kernel-memory/blob/main/examples/105-dotnet-serverless-llamasharp/Program.cs
There’s probably some work to do for users, e.g. customizing prompts for LLama and identifying which model works best. KM should be sufficiently configurable to allow that.
KernelMemory author here, let me know if there’s something I can do to make the integration better, more powerful, easier, etc 😃
Thanks for the feedback; we merged a PR today that allows configuring and/or replacing the search logic, e.g. defining token limits.
And this PR https://github.com/microsoft/kernel-memory/pull/189 allows customizing token settings and tokenization logic. I’d appreciate it if someone could take a look and let us know if it helps.
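For anyone who wants to try it, here is a minimal sketch of what that configuration could look like. SearchClientConfig and WithSearchClientConfig are my reading of those PRs, so treat the names and properties as assumptions and check the repo for the final API:

```csharp
using Microsoft.KernelMemory;

// Assumed shape of the search/token configuration exposed by the PRs above.
var searchConfig = new SearchClientConfig
{
    MaxAskPromptSize = 2048,        // keep the RAG prompt inside the model's context window
    MaxMatchesCount = 5,            // how many memory records are injected into the prompt
    AnswerTokens = 300,             // token budget reserved for the generated answer
    EmptyAnswer = "INFO NOT FOUND"  // fallback text when no grounded answer is produced
};

var memory = new KernelMemoryBuilder()
    .WithSearchClientConfig(searchConfig)
    // ... text generation / embedding setup (e.g. LLamaSharp, see the next snippet) ...
    .Build<MemoryServerless>();
```

If I’m reading the defaults correctly, “INFO NOT FOUND” is the search client’s empty-answer fallback, which may be exactly what’s being reported at the top of this issue.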
This snippet shows how we could add LLama to KernelMemory:
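Roughly, and based on the linked 105-dotnet-serverless-llamasharp example, assuming the LLamaSharp.kernel-memory helper package (its LLamaSharpConfig type and WithLLamaSharpDefaults extension); exact builder method names depend on the KernelMemory version:

```csharp
using System;
using LLamaSharp.KernelMemory;
using Microsoft.KernelMemory;

// Point this at any local GGUF model, e.g. one of the models mentioned above.
var config = new LLamaSharpConfig(@"C:\models\llama-2-7b-chat.Q6_K.gguf")
{
    ContextSize = 4096 // keep in line with the model's trained context length
};

var memory = new KernelMemoryBuilder()
    .WithLLamaSharpDefaults(config) // uses LLama for both text generation and embeddings
    .Build<MemoryServerless>();

await memory.ImportTextAsync("LLamaSharp 0.8.1 is integrated into Kernel Memory.", documentId: "doc1");

var answer = await memory.AskAsync("What is integrated into Kernel Memory?");
Console.WriteLine(answer.Result);
```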
dluc: Generally I see this when:
Generally you want the context_length to be the same as the model’s trained context length. And you want max_tokens either to be short but long enough for the answer you’re expecting (because LLaMA has a bad habit of repeating itself), or to satisfy: system message + all user messages + assistant responses, including max_tokens, <= context_length.
I use TikSharp to calculate the number of tokens for all prompts, add 10 or so just to be safe, subtract that from context_length, and use the result as the max token length. I then set the antiprompts to AntiPrompts = ["\n\n\n\n", "\t\t\t\t"], which gets rid of two of the cases where the model repeats instead of ending (especially when generating JSON with a grammar file).
This technique also works when using ChatGPT 3.5+: asking it to produce more than the context_length is hard-refused and still costs you money, so you have to do this math or risk it blowing up and running up your bill.
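As a concrete sketch of that budget math with LLamaSharp’s InferenceParams (CountTokens here is a hypothetical placeholder; in practice it would wrap TikSharp or the model’s own tokenizer):

```csharp
using System.Collections.Generic;
using LLama.Common;

// Hypothetical token counter: replace with TikSharp / the model's tokenizer.
static int CountTokens(string text) => text.Length / 4; // rough placeholder estimate

const int contextLength = 4096; // should match the model's context_length

string systemMessage = "You are a helpful assistant.";
string userMessage = "Summarize the imported document.";

// Prompt budget: all messages plus ~10 tokens of safety margin,
// so that promptTokens + MaxTokens <= contextLength.
int promptTokens = CountTokens(systemMessage) + CountTokens(userMessage) + 10;

var inferenceParams = new InferenceParams
{
    MaxTokens = contextLength - promptTokens,
    // Stop on runs of blank lines / tabs instead of letting the model repeat itself.
    AntiPrompts = new List<string> { "\n\n\n\n", "\t\t\t\t" }
};
```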
This can only be resolved on the Kernel Memory side. I have already submitted an issue (https://github.com/microsoft/kernel-memory/issues/164) and am waiting for further updates.
👍 Good idea, I think I have a solution for issue #289. 😃 Thank you for your suggestion.