langchain: Kernel dies for SVMRetriever with Huggingface Embeddings Model
System Info
langchain==0.0.245 python==3.9
Who can help?
Information
- The official example notebooks/scripts
- My own modified scripts
Related Components
- LLMs/Chat Models
- Embedding Models
- Prompts / Prompt Templates / Prompt Selectors
- Output Parsers
- Document Loaders
- Vector Stores / Retrievers
- Memory
- Agents / Agent Executors
- Tools / Toolkits
- Chains
- Callbacks/Tracing
- Async
Reproduction
I am trying to use a HuggingFace embeddings model with SVMRetriever as follows:
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.retrievers import SVMRetriever
embedding = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
retriever = SVMRetriever.from_texts(
    ["foo", "bar", "world", "hello", "foo bar"],
    embedding,
    k=4,
    relevancy_threshold=.25,
)
The kernel looks busy for a while and then dies when I run the snippet above.
Expected behavior
I expect this to run smoothly and, once it completes, to be able to retrieve similar documents with:
result = retriever.get_relevant_documents("foo")
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 16
🤖
Sorry, I don’t know the answer to that.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
🤖
Sure, I can help with that. Here’s how you can use the SVMRetriever with your document embeddings:
Once you have the svm_retriever instance, you can use it to retrieve similar documents. However, please note that the _get_relevant_documents method is a private method, which means it’s not intended to be directly accessed outside of the class. It’s usually better to use public methods for this purpose. But based on the provided context, it seems like this is the method you need to use to retrieve similar documents.
Here’s how you can use it:
This will return a list of similar documents to your query.
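The public entry point on the retriever is get_relevant_documents, which is also what the expected-behavior snippet in the issue calls. Below is a minimal sketch, assuming langchain 0.0.245 with sentence-transformers and scikit-learn installed. The helper name retrieve_similar is hypothetical, and the imports are kept inside the function so that defining it does not load the model:

```python
from typing import List


def retrieve_similar(query: str, texts: List[str], k: int = 4) -> List[str]:
    """Build an SVMRetriever over `texts` and return the page contents ranked for `query`."""
    # Imported lazily: these pull in heavy dependencies and may download the model.
    from langchain.embeddings import HuggingFaceEmbeddings
    from langchain.retrievers import SVMRetriever

    embedding = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )
    retriever = SVMRetriever.from_texts(texts, embedding, k=k)
    # Public method; as far as I can tell it delegates to _get_relevant_documents.
    docs = retriever.get_relevant_documents(query)
    return [doc.page_content for doc in docs]
```

Calling retrieve_similar("foo", ["foo", "bar", "world", "hello", "foo bar"]) goes through the same from_texts path as the reproduction, so on an affected setup it would be expected to hit the same crash.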
Please note that the from_texts method of SVMRetriever takes two parameters: texts and embeddings. The texts parameter is a list of your original documents, and the embeddings parameter is the corresponding document embeddings.
I hope this helps! Let me know if you have any other questions.
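For comparison, in the reproduction at the top of the issue the second argument to from_texts is an embeddings object rather than a list of precomputed vectors. In LangChain such objects expose embed_documents and embed_query. The sketch below shows that shape with a hypothetical stand-in class (ToyEmbeddings is not part of LangChain and its vectors are meaningless). It only illustrates the two methods involved:

```python
from typing import List


class ToyEmbeddings:
    """Hypothetical stand-in with the two methods an embeddings object exposes."""

    def embed_query(self, text: str) -> List[float]:
        # Toy 2-d vector: text length and number of vowels. Not a real embedding.
        vowels = sum(ch in "aeiou" for ch in text.lower())
        return [float(len(text)), float(vowels)]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # One vector per input text, in the same order.
        return [self.embed_query(t) for t in texts]


toy = ToyEmbeddings()
vectors = toy.embed_documents(["foo", "bar", "hello"])
print(vectors)
```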
🤖
Hello,
Thank you for reaching out with your question. Based on the context provided, it seems you want to use the HuggingFaceEmbeddings class in LangChain with the feature-extraction task without using the HuggingFaceHub API.
The HuggingFaceEmbeddings class in LangChain uses the sentence_transformers package to compute embeddings. This package is specifically designed for computing sentence or document embeddings and does not support all models and tasks available in the transformers package. In particular, it does not support the feature-extraction task.
However, you can still compute embeddings for your documents and queries using the HuggingFaceEmbeddings class and then pass these embeddings to the SVMRetriever method. Here is an example of how you can do this:
Please note that the SVMRetriever method is not shown in the provided context, so you’ll need to refer to the LangChain documentation or source code for the exact usage.
Regarding the HuggingFaceHubEmbeddings class, it does support the feature-extraction task as shown in the VALID_TASKS tuple and the task parameter of the HuggingFaceHubEmbeddings class. However, this class uses the HuggingFaceHub API, which you mentioned you want to avoid.
I hope this helps! If you have any further questions, please don’t hesitate to ask.
Best, Dosu
🤖
Hello,
Thank you for reaching out and providing detailed information about your issue. Based on the information you’ve provided, it seems like your kernel is dying when trying to use the HuggingFace Embeddings model with the SVMRetriever method in LangChain.
There could be several reasons for this:
Unsupported Model: The HuggingFace model you’re trying to use might not be supported. As per the LangChain code, only models that start with “sentence-transformers” are supported. If you’re using a different model, it might cause the kernel to crash.
Unsupported Task: The task you’re trying to perform might not be supported. The only valid task as per the LangChain code is “feature-extraction”. If you’re trying to perform a different task, it might cause the kernel to crash.
Missing Dependencies: The HuggingFace Hub Python package or the sentence_transformers python package might not be installed. If either of these packages is not installed, an ImportError will be raised, which might cause the kernel to crash.
Missing API Token: The HuggingFace API token might not be set. If the API token is not set, the InferenceApi cannot be initialized, which might cause the kernel to crash.
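The missing-dependency case is straightforward to rule out before anything else runs. Here is a small sketch using only the standard library. missing_packages is a hypothetical helper name, and the two module names are the import names of the packages mentioned above:

```python
import importlib.util
from typing import Iterable, List


def missing_packages(module_names: Iterable[str]) -> List[str]:
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Import names for the HuggingFace Hub client and sentence-transformers.
missing = missing_packages(["huggingface_hub", "sentence_transformers"])
print("missing:", missing or "none")
```

A missing package would normally surface as an ImportError in the notebook rather than a dead kernel, so an empty result here points away from this cause.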
To troubleshoot this issue, I would recommend the following steps:
Check the Model: Ensure that the model you’re using starts with “sentence-transformers”. If not, try using a model that does.
Check the Task: Ensure that the task you’re trying to perform is “feature-extraction”. If not, try performing this task instead.
Check Dependencies: Ensure that the HuggingFace Hub Python package and the sentence_transformers Python package are installed. You can install them using pip: pip install huggingface_hub sentence_transformers
Check API Token: Ensure that the HuggingFace API token is set. You can set it as the environment variable HUGGINGFACEHUB_API_TOKEN.
If you’ve checked all of these and the issue still persists, it would be helpful if you could provide more information about your environment, such as the operating system you’re using, the amount of RAM available, and any error messages you’re seeing when the kernel dies.
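When a kernel dies without a Python traceback, the standard-library faulthandler module can sometimes record where the interpreter was at the moment of a hard crash such as a segfault. The sketch below assumes the failing code is run afterwards in the same process. The log file name is arbitrary:

```python
import faulthandler
import os
import tempfile

# Arbitrary location for the crash log; any writable path works.
log_path = os.path.join(tempfile.gettempdir(), "svm_retriever_crash.log")

# Keep the file object referenced for the life of the process:
# faulthandler writes to its file descriptor if the interpreter crashes.
crash_log = open(log_path, "w")
faulthandler.enable(file=crash_log, all_threads=True)

print("fault handler enabled:", faulthandler.is_enabled(), "->", log_path)
# Run the SVMRetriever.from_texts(...) call after this point, then inspect the log.
```

This does not help if the operating system kills the process for running out of memory, since that signal cannot be handled. An empty log after a crash is therefore itself a hint to look at RAM usage.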
I hope this helps! Let me know if you have any other questions or if there’s anything else I can do to assist you.
Best, Dosu
I faced the same issue using HuggingFaceEmbeddings and SVMRetriever. In my case the issue was with create_index in svm.py.
The ThreadPoolExecutor was somehow failing on executor.map(embeddings.embed_query, contexts), so I made my own without a thread pool (I know, I know, I have yet to implement multithreading on it).
But the idea is to build the index yourself so you can just do
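A minimal sketch of that idea, not the commenter's actual code: embed each text in the calling thread, one at a time, instead of going through executor.map. The function name build_index_sequentially and the stand-in embedder are hypothetical. With the real model, embedding.embed_query would be passed instead.

```python
from typing import Callable, List, Sequence


def build_index_sequentially(
    texts: Sequence[str],
    embed_query: Callable[[str], List[float]],
) -> List[List[float]]:
    """Embed texts one by one in the current thread, preserving order."""
    return [list(embed_query(text)) for text in texts]


def toy_embed(text: str) -> List[float]:
    # Hypothetical stand-in so the sketch runs without downloading a model.
    return [float(len(text)), float(text.count("o"))]


index = build_index_sequentially(["foo", "bar", "foo bar"], toy_embed)
print(index)
```

In langchain 0.0.245 the retriever appears to keep the index as a NumPy array alongside its embeddings and texts fields, so the result would presumably be wrapped in numpy.array(...) and handed to the SVMRetriever constructor directly rather than through from_texts. Treat that as an assumption and check svm.py in the installed version.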
@AjinkyaBankar Got it. Not sure what the issue is then 🤷‍♂️ The only suspect lines in the SVMRetriever code are in this coroutine code
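The pattern quoted earlier in the thread, executor.map(embeddings.embed_query, contexts), can be exercised on its own to see whether the thread pool is what hangs. This is a sketch with a hypothetical stand-in embedder. Swapping in the real embed_query and comparing against a plain loop would show whether only the threaded path fails. Capping max_workers at 1 is an extra diagnostic step, not something taken from the LangChain source:

```python
import concurrent.futures
from typing import Callable, List, Sequence


def embed_with_pool(
    texts: Sequence[str],
    embed_query: Callable[[str], List[float]],
    max_workers: int = 1,
) -> List[List[float]]:
    """Map embed_query over texts through a thread pool; results keep input order."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(embed_query, texts))


def toy_embed(text: str) -> List[float]:
    # Hypothetical stand-in; replace with the real embed_query to test the model.
    return [float(len(text))]


texts = ["foo", "bar", "world", "hello", "foo bar"]
threaded = embed_with_pool(texts, toy_embed)
sequential = [toy_embed(t) for t in texts]
print("same result:", threaded == sequential)
```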