haystack: FAISSDocumentStore: number of documents present in the SQL database does not match the number of embeddings in FAISS
Describe the bug
After creating a FAISSDocumentStore once, I cannot create it again: the second attempt fails with an error saying that the number of documents in the SQL database does not match the number of embeddings in FAISS.
Error message
ValueError: The number of documents present in the SQL database does not match the number of embeddings in FAISS. Make sure your FAISS configuration file correctly points to the same database that was used when creating the original index.
Expected behavior
The documents saved in the SQL DB are loaded and a FAISSDocumentStore object is created.
Additional context
None
To Reproduce
# Run this script twice, not once: the error appears on the second run.
import argparse

from haystack.document_stores import FAISSDocumentStore
from haystack.nodes import EmbeddingRetriever
from haystack.pipelines import DocumentSearchPipeline


def load_data():
    return [{'content': 'test sent 1'},
            {'content': 'test sent 2'},
            {'content': 'test sent 3'}]


def build_pipeline(configs):
    dicts = load_data()
    # No faiss_index_path is given, so a new, empty FAISS index is created on every run.
    document_store = FAISSDocumentStore(vector_dim=configs.hidden_dims,
                                        faiss_index_factory_str=configs.faiss_index)
    retriever = EmbeddingRetriever(document_store=document_store,
                                   embedding_model=configs.embedding_model,
                                   model_format=configs.model_format)
    document_store.write_documents(dicts)
    document_store.update_embeddings(retriever)
    return DocumentSearchPipeline(retriever)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--hidden_dims', type=int, default=768)
    parser.add_argument('--faiss_index', default='Flat')
    parser.add_argument('--embedding_model', default='distilbert-base-uncased')
    parser.add_argument('--model_format', default='transformers')
    configs = parser.parse_args()
    pipe = build_pipeline(configs)
    print(pipe.run('test sent 1'))
FAQ Check
- Have you had a look at our new FAQ page?
System:
- OS: MacOS
- GPU/CPU: CPU
- Haystack version (commit or version number): 1.0.0
- DocumentStore: FAISSDocumentStore
- Reader: X
- Retriever: EmbeddingRetriever
About this issue
- State: closed
- Created 3 years ago
- Comments: 16 (8 by maintainers)
Hey @Taekyoon, FAISSDocumentStore is a little special here. Under the hood it contains a SQL database and a FAISS index. The SQL database is always backed by file storage, so it is persisted automatically. The FAISS index, on the other hand, lives entirely in memory and is never saved to disk automatically. That’s why FAISSDocumentStore has the save() method, which writes the FAISS index to disk. You also have to load the previously stored FAISS index explicitly, either by calling the load() class method or by passing faiss_index_path to the constructor. So you would basically first have to check whether there is a stored FAISS index on disk and then decide whether to create a new FAISSDocumentStore or use an existing one.
@bogdankostic how do you do that? When creating a FAISS document store, a ‘faiss_document_store.db’ file is created. How can I create a different one? TIA
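The check described in this thread, combined with a custom database file, might look roughly like the sketch below. It assumes Haystack 1.x, where FAISSDocumentStore accepts a sql_url argument and provides save() and the load() class method. The file names and the helpers sqlite_url and get_document_store are made up for illustration and are not part of Haystack.

```python
from pathlib import Path

# Hypothetical file names, chosen for this sketch only.
INDEX_PATH = Path("my_index.faiss")
DB_PATH = Path("my_document_store.db")


def sqlite_url(db_path: Path) -> str:
    """Build the SQLAlchemy-style URL that points at a SQLite file."""
    return f"sqlite:///{db_path.as_posix()}"


def get_document_store(index_path: Path = INDEX_PATH, db_path: Path = DB_PATH):
    """Load the saved store if an index file exists, otherwise create a new one."""
    # Imported here so that sqlite_url() stays usable without Haystack installed.
    from haystack.document_stores import FAISSDocumentStore

    if index_path.exists():
        # load() restores the index together with the config saved alongside it
        # (assumption: save() was called earlier with the same index path).
        return FAISSDocumentStore.load(index_path=str(index_path))

    # First run: sql_url selects a database file other than the default one.
    return FAISSDocumentStore(
        sql_url=sqlite_url(db_path),
        faiss_index_factory_str="Flat",
    )
```

After write_documents() and update_embeddings(), calling document_store.save(str(INDEX_PATH)) should persist the index so that the next run takes the load() branch instead of creating an empty index against a populated database.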
Done deepset-ai/haystack-tutorials#38
Sounds good! let me try 😃 Thanks!
Sorry, I couldn’t get the message from the comment above. @tstadel I was thinking of a function or option that removes the SQL file when the FAISSDocumentStore object is removed, so that we don’t have to ask “Does the db file exist?” before recreating the FAISSDocumentStore.
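Until such an option exists, the cleanup can be done by hand before the store is recreated. The sketch below is not part of Haystack: remove_stale_db is a hypothetical helper, and it assumes that the documents in the old database can be thrown away whenever no saved FAISS index accompanies them.

```python
from pathlib import Path


def remove_stale_db(db_path: Path, index_path: Path) -> bool:
    """Delete the SQLite file if no saved FAISS index accompanies it.

    Returns True when a file was deleted, False otherwise.
    """
    if index_path.exists():
        # A saved index exists, so the database is still needed for loading.
        return False
    if db_path.exists():
        # Leftover documents without an index would fail the count check.
        db_path.unlink()
        return True
    return False
```

Calling remove_stale_db(Path("faiss_document_store.db"), Path("my_index.faiss")) before constructing FAISSDocumentStore would give a clean start when the index was never saved; the index file name here is only an example.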