usearch: Bug: Slow index.add

Describe the bug

A loop that creates an index, adds vectors, and runs one search is roughly 80x slower with USearch than the identical loop with FAISS (94.7 ms vs. 1.14 ms per loop).

Reproduced in Google Colab.

Steps to reproduce

import faiss
from usearch.index import Index

import numpy as np

# Set the seed for reproducibility
np.random.seed(42)

# Generate 500 random vectors of length 1024 with values between 0 and 1
vectors = np.random.rand(500, 1024)
vector = np.random.rand(1024)

%%timeit
# FAISS

indexf = faiss.IndexFlatL2(vectors.shape[-1])
indexf.add(np.array(vectors).astype(np.float32))
D, I = indexf.search(np.array([vector]), 50)
# 1.14 ms ± 4.34 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


%%timeit
# USearch

index = Index(
    ndim=vectors.shape[-1],  # Define the number of dimensions in input vectors
    metric='l2_sq',  # Choose 'cos', 'l2sq', 'haversine' or other metric, default = 'ip'
)
index.add(labels=np.arange(len(vectors)), vectors=vectors)
matches, distances, count = index.search(vector, 50, exact=True)
# 94.7 ms ± 20.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
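
The two timed cells above each measure index construction, insertion, and one search together, so the totals do not show which step dominates. A minimal sketch of timing the phases separately follows. It uses a NumPy brute-force search as a stand-in so that it runs without either library; the helper names `l2sq_top_k` and `time_phase` are hypothetical and are not part of FAISS or USearch.

```python
import time

import numpy as np


def l2sq_top_k(vectors, query, k):
    """Brute-force exact search: squared L2 distance to every row, then the k smallest."""
    diffs = vectors - query
    distances = np.einsum("ij,ij->i", diffs, diffs)
    nearest = np.argsort(distances)[:k]
    return nearest, distances[nearest]


def time_phase(fn, repeats=5):
    """Run fn() several times; return (best wall-clock seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        best = min(best, time.perf_counter() - start)
    return best, result


rng = np.random.default_rng(42)
vectors = rng.random((500, 1024), dtype=np.float32)
query = rng.random(1024, dtype=np.float32)

# Phase 1: data preparation, standing in for "create index + add".
prep_s, prepared = time_phase(
    lambda: np.ascontiguousarray(vectors, dtype=np.float32)
)

# Phase 2: search only, on the already prepared data.
search_s, (keys, dists) = time_phase(lambda: l2sq_top_k(prepared, query, 50))

print(f"prepare: {prep_s * 1e3:.3f} ms, search: {search_s * 1e3:.3f} ms")
```

To apply the same idea to the libraries in this report, the two lambdas would be replaced by the real construction-plus-add call and the real search call for each library.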

Expected behavior

n/a

USearch version

latest pip

Operating System

ubuntu

Hardware architecture

x86

Which interface are you using?

Python bindings

Contact Details

No response

Is there an existing issue for this?

  • I have searched the existing issues

Code of Conduct

  • I agree to follow this project’s Code of Conduct

About this issue

  • State: closed
  • Created a year ago
  • Comments: 18 (11 by maintainers)

Most upvoted comments

Thanks for looking into it.

  1. 512-1024
  2. l2sq
  3. Python

This is the typical retrieval scenario in web search when LLMs are used.

As you can see in my test code, I am passing exact=True to search.

I can indeed confirm vastly better performance on my setup (by a factor of 20). I can only say fantastic work! 🚀

Question: How does one get the distances?
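
In the reproduction code of this report, the distances are the second value unpacked from `index.search(...)`; whether that still holds for the version being asked about is not shown in this thread. As a library-independent check, the distances can be recomputed from the returned keys. The sketch below assumes the keys are row positions in the original array (as with `labels=np.arange(len(vectors))` in the report) and that the metric is squared L2; `l2sq_distances` and `example_keys` are hypothetical names used only for illustration.

```python
import numpy as np


def l2sq_distances(vectors, query, keys):
    """Squared L2 distance from query to each stored vector selected by keys."""
    diffs = vectors[np.asarray(keys)] - query
    return np.einsum("ij,ij->i", diffs, diffs)


rng = np.random.default_rng(0)
stored = rng.random((500, 1024))
q = rng.random(1024)

# Hypothetical keys, standing in for what a search call returned.
example_keys = [3, 17, 256]

recomputed = l2sq_distances(stored, q, example_keys)
print(recomputed)
```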