langchain: SupabaseVectorStore.from_documents is not working

System Info

langchain==0.0.300 supabase==1.1.1

Who can help?

@hwaking @eyurtsev @agola11 @hwchase17

Information

  • The official example notebooks/scripts
  • My own modified scripts

Related Components

  • LLMs/Chat Models
  • Embedding Models
  • Prompts / Prompt Templates / Prompt Selectors
  • Output Parsers
  • Document Loaders
  • Vector Stores / Retrievers
  • Memory
  • Agents / Agent Executors
  • Tools / Toolkits
  • Chains
  • Callbacks/Tracing
  • Async

Reproduction

Creation of Supabase client

import os

from supabase import create_client

supabase_url: str = os.environ.get("SUPABASE_URL")  # type: ignore
supabase_key: str = os.environ.get("SUPABASE_SERVICE_KEY")  # type: ignore
supabase_client = create_client(supabase_url, supabase_key)

Text Splitter creation

from langchain.text_splitter import CharacterTextSplitter

text_splitter = CharacterTextSplitter(
    chunk_size=800,
    chunk_overlap=0,
)

Embeddings

from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

Loading the document

from langchain.document_loaders import PyPDFLoader

loader = PyPDFLoader("Alice_in_wonderland2.pdf")
pages = loader.load_and_split()
docs = text_splitter.split_documents(pages)

Save values to Supabase

from langchain.vectorstores import SupabaseVectorStore

vector_store = SupabaseVectorStore.from_documents(documents=docs, embedding=embeddings, client=supabase_client)

Error encountered

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\VSCode\Python\langchain project\supabase-try\test.py", line 34, in <module>
    vector_store = SupabaseVectorStore.from_documents(
  File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\langchain\vectorstores\base.py", line 417, in from_documents
    return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)
  File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\langchain\vectorstores\supabase.py", line 147, in from_texts
    cls._add_vectors(client, table_name, embeddings, docs, ids)
  File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\langchain\vectorstores\supabase.py", line 323, in _add_vectors
    result = client.from_(table_name).upsert(chunk).execute()  # type: ignore
  File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\postgrest\_sync\request_builder.py", line 57, in execute
    r = self.session.request(
  File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpx\_client.py", line 814, in request
    return self.send(request, auth=auth, follow_redirects=follow_redirects)
  File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpx\_client.py", line 901, in send
    response = self._send_handling_auth(
  File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpx\_client.py", line 929, in _send_handling_auth
    response = self._send_handling_redirects(
  File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpx\_client.py", line 966, in _send_handling_redirects
    response = self._send_single_request(request)
  File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpx\_client.py", line 1002, in _send_single_request
    response = transport.handle_request(request)
  File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpx\_transports\default.py", line 218, in handle_request
    resp = self._pool.handle_request(req)
  File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\contextlib.py", line 135, in __exit__
    self.gen.throw(type, value, traceback)
  File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpx\_transports\default.py", line 77, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.WriteTimeout: The write operation timed out

I tried changing the code according to the LangChain docs as follows:

vector_store = SupabaseVectorStore.from_documents(
    docs,
    embeddings,
    client=supabase_client,
    table_name="documents",
    query_name="match_documents",
)

Then I encountered the following error

2023-10-05 10:33:29,879:INFO - HTTP Request: POST https://scptrclvtrvcwjdunlrn.supabase.co/rest/v1/documents "HTTP/1.1 404 Not Found"
Traceback (most recent call last):
  File "D:\VSCode\Python\langchain project\supabase-try\test.py", line 34, in <module>
    vector_store = SupabaseVectorStore.from_documents(

I didn't create the documents table in Supabase manually, because I need it to be created automatically by the code. If I do need to create it manually, I also need to know the steps to create it and how to integrate it. Please help me as soon as possible.

Expected behavior

SupabaseVectorStore.from_documents works fine and stores all the embeddings in the vector store.

Most upvoted comments

I encountered the same WriteTimeout issue with SupabaseVectorStore.from_documents. Reducing the chunk size didn’t help. However, setting postgrest_client_timeout to None in ClientOptions while creating the Supabase client resolved it. Here’s the snippet:

from supabase.lib.client_options import ClientOptions
client_options = ClientOptions(postgrest_client_timeout=None)
supabase = create_client(supabase_url, supabase_key, options=client_options)

Hope it helps
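
For reference, here is a minimal end-to-end sketch that combines the timeout workaround above with explicit table_name and query_name arguments. This is only a sketch: it assumes the documents table and match_documents function from the standard LangChain/Supabase setup already exist in your project, and reuses the environment variables and file name from the original report.

import os

from langchain.document_loaders import PyPDFLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import SupabaseVectorStore
from supabase import create_client
from supabase.lib.client_options import ClientOptions

# Disable the PostgREST client timeout so a large batch upsert is not cut off mid-write.
options = ClientOptions(postgrest_client_timeout=None)
supabase_client = create_client(
    os.environ["SUPABASE_URL"], os.environ["SUPABASE_SERVICE_KEY"], options=options
)

# Load and split the PDF exactly as in the reproduction above.
loader = PyPDFLoader("Alice_in_wonderland2.pdf")
text_splitter = CharacterTextSplitter(chunk_size=800, chunk_overlap=0)
docs = text_splitter.split_documents(loader.load_and_split())

# Embed and upsert into the existing "documents" table.
vector_store = SupabaseVectorStore.from_documents(
    docs,
    OpenAIEmbeddings(),
    client=supabase_client,
    table_name="documents",
    query_name="match_documents",
)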

🤖

Hello,

Thank you for providing detailed information about the issue you’re facing. Based on the context provided, there are a few potential solutions that might help resolve the issue.

Firstly, it seems like you’re using an older version of LangChain (langchain==0.0.300). The latest version is v0.0.308. It’s possible that the issue you’re facing has been resolved in a newer version. I would recommend updating LangChain to the latest version and see if the issue persists.

Secondly, the error message httpx.WriteTimeout: The write operation timed out suggests that the data you’re trying to write is too large, causing the operation to timeout. You could try reducing the size of the data you’re writing to the database. This can be done by modifying the chunk size in the _add_vectors method of the SupabaseVectorStore class. Here’s how you can do it:

# According to the SupabaseVectorStore JS implementation, the best chunk size
# is 500
chunk_size = 500  # You can modify this value to a smaller number
id_list: List[str] = []
for i in range(0, len(rows), chunk_size):
    chunk = rows[i : i + chunk_size]

    result = client.from_(table_name).upsert(chunk).execute()  # type: ignore

    if len(result.data) == 0:
        raise Exception("Error inserting: No rows added")

    # VectorStore.add_vectors returns ids as strings
    ids = [str(i.get("id")) for i in result.data if i.get("id")]

    id_list.extend(ids)

By reducing the chunk_size value, you can decrease the amount of data processed at once, which might help avoid the timeout error. However, please note that this will also increase the number of requests made to the Supabase server, which might affect performance. This solution was suggested in a similar issue #7823.
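
If you would rather not edit library code, the same idea can be sketched outside the library: precompute the rows yourself and upsert them in smaller batches, so each HTTP request stays well below the write timeout. The upsert_in_batches helper below is a made-up name for illustration; it is built on the same client.from_(table_name).upsert(chunk).execute() call shown in the excerpt above, and assumes rows is a list of dicts shaped like the table columns (id, content, metadata, embedding).

from typing import Any, Dict, List

def upsert_in_batches(client, table_name: str, rows: List[Dict[str, Any]], batch_size: int = 100) -> List[str]:
    # Upsert the rows in small batches and collect the ids the server reports back.
    ids: List[str] = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start : start + batch_size]
        result = client.from_(table_name).upsert(batch).execute()
        if len(result.data) == 0:
            raise Exception(f"Error inserting: no rows added for batch starting at {start}")
        ids.extend(str(row["id"]) for row in result.data if row.get("id"))
    return ids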

Lastly, the error message HTTP/1.1 404 Not Found suggests that the table you’re trying to write to does not exist in your Supabase database. Please ensure that the table name you’re trying to write to exists in your Supabase database. If the table doesn’t exist, it could lead to a 404 Not Found error. In your case, the table name should be “documents”.
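
On the 404 specifically: the vector store does not create the table for you. If you have never run the setup SQL, neither the documents table nor the match_documents function exists, and every POST to /rest/v1/documents will return 404. Below is a sketch of one way to run that setup from Python with psycopg2. The SQL follows the pgvector schema described in the LangChain Supabase docs (1536-dimension vectors for OpenAI embeddings), SUPABASE_DB_URL is a made-up environment variable name for your project's direct Postgres connection string (not the REST SUPABASE_URL), and you can just as well paste the SQL into the Supabase SQL editor instead.

import os

import psycopg2  # direct Postgres connection, not the REST client

SETUP_SQL = """
-- enable pgvector
create extension if not exists vector;

-- table the vector store writes to
create table if not exists documents (
    id uuid primary key,
    content text,
    metadata jsonb,
    embedding vector(1536)  -- 1536 dimensions for OpenAI embeddings
);

-- similarity-search function the vector store calls via RPC
create or replace function match_documents (
    query_embedding vector(1536),
    match_count int default null,
    filter jsonb default '{}'
) returns table (id uuid, content text, metadata jsonb, similarity float)
language plpgsql as $$
begin
    return query
    select
        documents.id,
        documents.content,
        documents.metadata,
        1 - (documents.embedding <=> query_embedding) as similarity
    from documents
    where documents.metadata @> filter
    order by documents.embedding <=> query_embedding
    limit match_count;
end;
$$;
"""

# Hypothetical env var holding the Postgres connection string from the project's database settings.
conn = psycopg2.connect(os.environ["SUPABASE_DB_URL"])
with conn, conn.cursor() as cur:
    cur.execute(SETUP_SQL)
conn.close()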

I hope this helps! If you’re still facing issues, please provide more details about your implementation and the exact error message you’re receiving.


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

Followed up in the PR with the LangChain maintainers to get their opinion on the rename.