usearch: Bug: Slow index.add
Describe the bug
Creating an index, adding vectors and searching in a loop is roughly 83x slower with USearch than the identical loop with faiss (94.7 ms vs. 1.14 ms per loop).
Reproduced in Google Colab.
Steps to reproduce
import faiss
from usearch.index import Index
import numpy as np
# Set the seed for reproducibility
np.random.seed(42)
# Generate 500 random vectors of length 1024 with values between 0 and 1
vectors = np.random.rand(500, 1024)
# Generate a single query vector of the same length
vector = np.random.rand(1024)
%%timeit
# FAISS
indexf = faiss.IndexFlatL2(vectors.shape[-1])
indexf.add(vectors.astype(np.float32))
D, I = indexf.search(np.array([vector], dtype=np.float32), 50)
# 1.14 ms ± 4.34 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%%timeit
index = Index(
    ndim=vectors.shape[-1],  # Define the number of dimensions in input vectors
    metric='l2sq',  # Choose 'cos', 'l2sq', 'haversine' or other metric, default = 'ip'
)
index.add(labels=np.arange(len(vectors)), vectors=vectors)
matches, distances, count = index.search(vector, 50, exact=True)
# 94.7 ms ± 20.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
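For reference, `faiss.IndexFlatL2` performs exhaustive search: it computes the squared L2 distance from the query to every stored vector and keeps the `k` smallest, which is also the work an exact search over these 500 vectors has to do. Below is a minimal numpy sketch of that computation, an illustration rather than faiss's actual implementation; `flat_l2_search` is a hypothetical name, and it returns `(D, I)` shaped like faiss's output.

```python
import numpy as np


def flat_l2_search(database, queries, k):
    """Exhaustive k-nearest-neighbour search under squared L2 distance.

    Returns (D, I), both of shape (num_queries, k), sorted by ascending
    distance, mirroring the layout of IndexFlatL2.search results.
    """
    db = np.asarray(database, dtype=np.float32)
    qs = np.atleast_2d(np.asarray(queries, dtype=np.float32))
    # ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2 for every (query, vector) pair
    d2 = (
        np.sum(qs * qs, axis=1, keepdims=True)
        - 2.0 * (qs @ db.T)
        + np.sum(db * db, axis=1)
    )
    np.maximum(d2, 0.0, out=d2)  # clamp small negatives caused by rounding
    # Pick the k smallest per row without a full sort, then order just those k
    idx = np.argpartition(d2, k - 1, axis=1)[:, :k]
    part = np.take_along_axis(d2, idx, axis=1)
    order = np.argsort(part, axis=1)
    return np.take_along_axis(part, order, axis=1), np.take_along_axis(idx, order, axis=1)


np.random.seed(42)
vectors = np.random.rand(500, 1024)
vector = np.random.rand(1024)
D, I = flat_l2_search(vectors, vector, 50)
print(D.shape, I.shape)  # (1, 50) (1, 50)
```

Using `np.argpartition` avoids sorting all 500 distances when only 50 are needed; only the selected `k` are sorted afterwards.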
Expected behavior
n/a
USearch version
latest pip
Operating System
Ubuntu
Hardware architecture
x86
Which interface are you using?
Python bindings
About this issue
- State: closed
- Created a year ago
- Comments: 18 (11 by maintainers)
Commits related to this issue
- Add: Exact search shortcut Closes #176 — committed to ashvardanian/usearch by ashvardanian a year ago
- Build: Released 0.23.0 [skip ci] # [0.23.0](https://github.com/unum-cloud/usearch/compare/v0.22.3...v0.23.0) (2023-08-05) ### Add * `Matches` and `BatchMatches` simple API ([1b40f13](https://github... — committed to unum-cloud/usearch by semantic-release-bot a year ago
Thanks for looking into it.
This is the typical retrieval scenario in LLM-based web search, where a small set of candidate vectors is indexed and searched for each query.
I am already passing exact=True to search, as you can see in my test code.
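With `exact=True`, only the search is exhaustive: `index.add` still inserts every vector into USearch's HNSW graph, whereas `faiss.IndexFlatL2.add` simply stores the vectors. The `%%timeit` cells above measure construction, insertion and search together, so they cannot separate these costs. A minimal stdlib sketch that times each phase on its own; `time_phases` and the stand-in phases are hypothetical and are not part of either library.

```python
import time
from typing import Callable, Dict


def time_phases(phases: Dict[str, Callable[[], object]], repeat: int = 7) -> Dict[str, float]:
    """Run the phases in insertion order, `repeat` rounds in total.

    Phases run in sequence within each round, so a later phase may rely on
    state set up by an earlier one. Returns the fastest time per phase, in ms.
    """
    best = {name: float("inf") for name in phases}
    for _ in range(repeat):
        for name, fn in phases.items():
            start = time.perf_counter()
            fn()
            elapsed_ms = (time.perf_counter() - start) * 1e3
            best[name] = min(best[name], elapsed_ms)
    return best


# Hypothetical stand-in phases; replace their bodies with the
# Index(...), index.add(...) and index.search(...) calls from the report.
state = {}


def build():
    state["rows"] = []


def add():
    state["rows"].extend(range(50_000))


def search():
    return sorted(state["rows"])[:50]


timings = time_phases({"build": build, "add": add, "search": search})
for name, ms in timings.items():
    print(f"{name:>6}: {ms:.3f} ms")
```

Binding the three phases to the USearch calls from the report (and likewise to the faiss calls) should show whether the gap comes from building the index or from the query itself.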
I can indeed confirm vastly better performance on my setup (by a factor of 20). I can only say fantastic work! 🚀
Question: How does one get the distances?
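The 0.23.0 release listed above introduced a `Matches` API for search results, so depending on the installed version the distances may already be exposed on the returned object; the release's documentation is the place to confirm this. Independently of the API, when the keys are row indices (as with `labels=np.arange(len(vectors))` in the report), the squared L2 distances for the returned keys can be recomputed directly and should agree with what an `l2sq` index reports, up to floating-point rounding. A minimal numpy sketch; `l2sq_to_keys` is a hypothetical helper, and the keys below are computed by brute force as a stand-in for real search results.

```python
import numpy as np


def l2sq_to_keys(vectors, query, keys):
    """Squared L2 distance from `query` to the rows of `vectors` picked by `keys`.

    Assumes each key is a row index into `vectors`, which holds when the
    index was filled with labels=np.arange(len(vectors)) as in the report.
    """
    rows = vectors[np.asarray(keys, dtype=np.int64)]
    diff = rows - np.asarray(query)  # broadcast the query over every selected row
    return np.einsum("ij,ij->i", diff, diff)  # row-wise sum of squares


np.random.seed(42)
vectors = np.random.rand(500, 1024)
vector = np.random.rand(1024)

# Hypothetical keys standing in for the ones a search would return
keys = np.argsort(((vectors - vector) ** 2).sum(axis=1))[:50]
distances = l2sq_to_keys(vectors, vector, keys)
```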