LocalAI: LLama backend embeddings not working
Discussed in https://github.com/mudler/LocalAI/discussions/1615
Originally posted by fdewes, January 20, 2024

Hi everyone,
first of all: thank you for this great piece of software!
I am trying to create embeddings with GGUF models such as Phi or Mistral. However, all my attempts to serve them via LocalAI and the llama backend fail. I am using the localai/localai:v2.5.1-cublas-cuda12 Docker image.
Here is the HTTP error response:
InternalServerError: Error code: 500 - {'error': {'code': 500, 'message': 'rpc error: code = Unimplemented desc = ', 'type': ''}}
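The request itself is a plain embeddings call. Roughly what I am running (a minimal sketch, not my exact script; the base URL is just wherever my LocalAI instance listens, and the key is a dummy since LocalAI doesn't check it):

```python
# Sketch of the failing request, using the openai v1 Python client pointed
# at LocalAI's OpenAI-compatible API (base URL and key are placeholders).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-dummy")

resp = client.embeddings.create(
    model="phi-embeddings",  # the model name from the YAML config below
    input="Please create python script for doing basic data science on entire pandas df",
)
print(resp.data[0].embedding[:8])  # never reached; the call raises InternalServerError
```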
This is the debug output from the LocalAI Docker image:
11:25AM DBG Request received:
11:25AM DBG Parameter Config: &{PredictionOptions:{Model:phi-2.Q8_0.gguf Language: N:0 TopP:0 TopK:0 Temperature:0 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:phi-embeddings F16:false Threads:4 Debug:true Roles:map[] Embeddings:true Backend:llama TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[Please create python script for doing basic data science on entire pandas df] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:0 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false DownloadFiles:[] Description: Usage:}
11:25AM INF Loading model 'phi-2.Q8_0.gguf' with backend llama
11:25AM DBG llama-cpp is an alias of llama-cpp
11:25AM DBG Model already loaded in memory: phi-2.Q8_0.gguf
[10.10.1.20]:62886 500 - POST /embeddings
This is the configuration file for the model.
https://raw.githubusercontent.com/fdewes/model_gallery/main/phi_embeddings.yaml
name: "phi-embeddings"
license: "Apache 2.0"
urls:
- https://huggingface.co/TheBloke/phi-2-GGUF
description: |
Phi model that can be used for embeddings
config_file: |
parameters:
model: phi-2.Q8_0.gguf
backend: llama
embeddings: true
files:
- filename: "phi-2.Q8_0.gguf"
sha256: "26a44c5a2bc22f33a1271cdf1accb689028141a6cb12e97671740a9803d23c63"
uri: "https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q8_0.gguf"
Am I doing anything wrong, or is this a bug? With the same models I can create embeddings locally using the llama-cpp-python bindings without problems (see the sketch below). Any help solving this problem would be greatly appreciated.
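Roughly what works for me locally (a minimal sketch; the model path is a placeholder for wherever the GGUF file lives):

```python
# Sketch: embeddings from the same GGUF file via llama-cpp-python,
# which works without problems (model path is a placeholder).
from llama_cpp import Llama

llm = Llama(model_path="./models/phi-2.Q8_0.gguf", embedding=True)

out = llm.create_embedding(
    "Please create python script for doing basic data science on entire pandas df"
)
print(len(out["data"][0]["embedding"]))  # dimensionality of the returned vector
```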
It doesn't download them on startup; it downloads them the first time it's invoked for a specific model of interest.
Also, what you have there (the mount to models) doesn't download the embedding models there. I'm a bit unsure why, and am still looking into that myself. There would be the need for two -v mounts.
Gotcha, yeah, I'm not sure about that document. I can give you some code below showing how I handled it:
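Something along these lines (a trimmed-down sketch rather than my exact script; endpoint, key, and paths are placeholders, and it assumes sentence-t5-large is already configured as an embeddings model in LocalAI):

```python
# Sketch: LangChain embeddings through LocalAI, persisted to a Chroma
# vector store (endpoint, key, and paths are placeholders).
from langchain_community.embeddings import LocalAIEmbeddings
from langchain_community.vectorstores import Chroma

# Point LangChain at LocalAI's OpenAI-compatible API; the key is a dummy.
embeddings = LocalAIEmbeddings(
    openai_api_base="http://localhost:8080",
    openai_api_key="sk-dummy",
    model="sentence-t5-large",
)

# Embed a handful of documents and persist them locally.
db = Chroma.from_texts(
    ["first document", "second document", "third document"],
    embedding=embeddings,
    persist_directory="./chroma_db",
)

# Queries are run through the same embedding model.
print(db.similarity_search("which document is first?", k=1))
```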
The above uses LangChain to handle the embeddings. The more important part is the embedding model, "sentence-t5-large" (https://huggingface.co/sentence-transformers/sentence-t5-large). My accuracy isn't quite where I want it to be in that vector space, but it does call the embeddings and save to the database.
I've run into similar situations using models that weren't really good with embeddings. I'm checking https://huggingface.co/models?other=embeddings&sort=trending and https://huggingface.co/spaces/mteb/leaderboard
Are you sure that this model can do embeddings? I'm not seeing anything on either of the above links, nor on TheBloke's page, nor on the Microsoft Phi page, nor in any Google searches.