SPTAG: When the number of vectors is too large, the index build fails to complete

When there are too many vectors, for example about 5 million (n = 1024×1024×5), the index build fails to complete. The program stops at the message “Save Data To xxxxx\vectors.bin”, and I noticed that vectors.bin had grown to 300G. That is not normal: the correct file size should be 10G.
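The 10G figure can be checked with a few lines of arithmetic. This is only a sketch, assuming vectors.bin holds little more than the raw float32 payload (n × Dimension × 4 bytes). The helper name `expected_payload_bytes` is hypothetical and not part of SPTAG.

```python
def expected_payload_bytes(num_vectors: int, dimension: int, bytes_per_value: int = 4) -> int:
    """Raw size of num_vectors vectors of float32 values, ignoring any file header."""
    return num_vectors * dimension * bytes_per_value


GIB = 1024 ** 3  # "G" in this report is read as GiB (assumption)

n = 1024 * 1024 * 5
dimension = 512
size = expected_payload_bytes(n, dimension)
print(f"{size} bytes = {size / GIB:.0f} GiB")  # 10737418240 bytes = 10 GiB
```

A file of 300G is therefore roughly 30 times larger than the payload itself, which suggests something other than the vector data is being written or the size is being computed incorrectly.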

The code that caused the error is as follows:

import SPTAG
import numpy as np
n = 1024*1024*5  # this size causes vectors.bin to grow to 300G!
k = 3
r = 3
Dimension = 512  # expected size of vectors.bin is n×Dimension×4 bytes, e.g. 1024×1024×512×4 = 2G when n = 1024×1024
def testBuild(algo, distmethod, x, out):
   i = SPTAG.AnnIndex(algo, 'Float', x.shape[1])
   i.SetBuildParam("NumberOfThreads", '4')
   i.SetBuildParam("DistCalcMethod", distmethod)
   ret = i.Build(x.tobytes(), x.shape[0])
   i.Save(out)
def Test(algo, distmethod):
   x = np.ones((n, Dimension), dtype=np.float32) * np.reshape(np.arange(n, dtype=np.float32), (n, 1))
   q = np.ones((r, Dimension), dtype=np.float32) * np.reshape(np.arange(r, dtype=np.float32), (r, 1)) * 2
   print ("Build.............................")
   testBuild(algo, distmethod, x, 'testindices')

if __name__ == '__main__':
   Test('BKT', 'L2')

How can I solve this error? Please help me!

About this issue

  • State: open
  • Created 5 years ago
  • Comments: 15 (3 by maintainers)

Most upvoted comments

I have tried your Python code, and the index build finishes successfully. What is your runtime environment?
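One way to answer the runtime environment question is to paste the output of a short script like the one below. It is a minimal sketch using only the standard library plus an optional NumPy import. The helper name `describe_environment` is hypothetical, and the SPTAG build or commit is not detected here, so it would need to be added by hand.

```python
import platform
import struct
import sys


def describe_environment() -> dict:
    """Collect basic facts about the machine and the Python runtime."""
    info = {
        "os": platform.platform(),
        "machine": platform.machine(),
        "python": sys.version.split()[0],
        # 32 or 64: the pointer width of this Python build
        "pointer_bits": struct.calcsize("P") * 8,
    }
    try:
        import numpy as np
        info["numpy"] = np.__version__
    except ImportError:
        info["numpy"] = "not installed"
    return info


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

The pointer width is included because buffers larger than 4 GiB, such as the 10G array in this report, cannot be addressed by a 32-bit process.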