usearch: Bug: Slow index.add

Describe the bug

A loop that creates an index, adds vectors, and runs one search is about 83x slower with USearch than the equivalent loop with FAISS (94.7 ms vs. 1.14 ms per loop in the timings below).

Reproduced in Google Colab.

Steps to reproduce

import faiss
from usearch.index import Index

import numpy as np

# Set the seed for reproducibility
np.random.seed(42)

# Generate 500 random vectors of length 1024 with values between 0 and 1
vectors = np.random.rand(500, 1024)
vector = np.random.rand(1024)

%%timeit
# FAISS

indexf = faiss.IndexFlatL2(vectors.shape[-1])
indexf.add(np.array(vectors).astype(np.float32))
D, I = indexf.search(np.array([vector]), 50)
# 1.14 ms ± 4.34 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


%%timeit
# USearch

index = Index(
    ndim=vectors.shape[-1],  # Define the number of dimensions in input vectors
    metric='l2_sq',  # Choose 'cos', 'l2sq', 'haversine' or other metric, default = 'ip'
)
index.add(labels=np.arange(len(vectors)), vectors=vectors)
matches, distances, count = index.search(vector, 50, exact=True)
# 94.7 ms ± 20.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
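
For reference, both cells above request the 50 nearest neighbours of one query under squared L2 distance, found by comparing the query against every stored vector. The sketch below is a dependency-free illustration of that computation, not the implementation used by either library. The helper names `l2sq` and `exact_search` and the reduced dimensionality (16 instead of 1024) are assumptions made for the example. It can serve as a ground truth when checking that two libraries return the same neighbours.

```python
import heapq
import random


def l2sq(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_search(vectors, query, k):
    """Return the k closest (distance, label) pairs, nearest first.

    Every stored vector is compared against the query, so the result is
    exact, at a cost linear in the number of stored vectors.
    """
    scored = ((l2sq(v, query), label) for label, v in enumerate(vectors))
    return heapq.nsmallest(k, scored)


# Hypothetical small dataset: 500 vectors of 16 dimensions, values in [0, 1).
rng = random.Random(42)
vectors = [[rng.random() for _ in range(16)] for _ in range(500)]
query = [rng.random() for _ in range(16)]

top = exact_search(vectors, query, 50)
labels = [label for _, label in top]      # positions of the nearest vectors
distances = [dist for dist, _ in top]     # squared L2 distances, ascending
```

Labels here are simply positions in the input list, mirroring the `np.arange(len(vectors))` labels used in the USearch cell.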

Expected behavior

n/a

USearch version

latest pip

Operating System

Ubuntu

Hardware architecture

x86

Which interface are you using?

Python bindings

About this issue

State: closed
Created a year ago
Comments: 18 (11 by maintainers)

Commits related to this issue

Add: Exact search shortcut. Closes #176 (committed to ashvardanian/usearch by ashvardanian a year ago)
Build: Released 0.23.0 [skip ci]. [0.23.0](https://github.com/unum-cloud/usearch/compare/v0.22.3...v0.23.0) (2023-08-05). Add: `Matches` and `BatchMatches` simple API ([1b40f13](https://github... (committed to unum-cloud/usearch by semantic-release-bot a year ago)

Most upvoted comments

Thanks for looking into it.

Vector dimensions: 512-1024
Metric: l2sq
Interface: Python

This is the typical retrieval scenario in web search when LLMs are used.

I am passing exact=True to search, as you can see in my test code.

vprelovac on Aug 1, 2023

I can indeed confirm vastly better performance on my setup (by a factor of 20). I can only say fantastic work! 🚀

Question: How does one get the distances?

vprelovac on Aug 6, 2023