annoy: Memory Leakage

Hi there,

annoy (great library!) is our go-to solution for NN search. I create multiple annoy indices, add multiple samples to each, then save and unload each one:

import memory  # self-implemented helper (memory.py) that reads /proc/pid/status
from annoy import AnnoyIndex
import numpy as np

dim = 49152
nsamples = 209

def vectors_to_add():
    v = nsamples * [None]
    for i in range(len(v)):
        v[i] = np.zeros(dim)
    return v

def build_index(count):
    knn = AnnoyIndex(dim, 'euclidean')
    v = vectors_to_add()
    for i,vector in enumerate(v):
        knn.add_item(i,vector)
    knn.save('knn-%d.aix' %count)
    knn.unload()
    del knn # to make sure (?)

for count in range(100):
    build_index(count)
    # Parenthesized print with a float divisor runs on both Python 2 and Python 3.
    print('Added index no. %d : total memory -> %s GB' % (count, memory.memory() / 1024.0**3))

When running on a (virtualized) Debian 9, I experience massive memory leakage:

Added index no. 0 : total memory -> 0.459384918213 GB
Added index no. 1 : total memory -> 0.769920349121 GB
Added index no. 2 : total memory -> 1.003074646 GB
Added index no. 3 : total memory -> 1.2364730835 GB
Added index no. 4 : total memory -> 1.46962738037 GB
…

When running on an Ubuntu 16.04, things are fine:

Added index no. 0 : total memory -> 0.232261657715 GB
Added index no. 1 : total memory -> 0.231666564941 GB
Added index no. 2 : total memory -> 0.231666564941 GB
Added index no. 3 : total memory -> 0.231666564941 GB
Added index no. 4 : total memory -> 0.231666564941 GB
Added index no. 5 : total memory -> 0.231666564941 GB
…
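
To put a number on the difference, the readings quoted above can be turned into per-index growth. This is only a small sketch: the values are copied from the two outputs, and the helper name `per_step_growth` is made up for illustration.

```python
# Readings in GB, copied from the Debian 9 and Ubuntu 16.04 outputs above.
debian_gb = [0.459384918213, 0.769920349121, 1.003074646,
             1.2364730835, 1.46962738037]
ubuntu_gb = [0.232261657715, 0.231666564941, 0.231666564941,
             0.231666564941, 0.231666564941, 0.231666564941]


def per_step_growth(readings):
    """Return the difference between each reading and the one before it."""
    return [later - earlier for earlier, later in zip(readings, readings[1:])]


debian_growth = per_step_growth(debian_gb)
ubuntu_growth = per_step_growth(ubuntu_gb)

for name, growth in (("Debian 9", debian_growth), ("Ubuntu 16.04", ubuntu_growth)):
    for step, delta in enumerate(growth, start=1):
        print("%s: index %d added %+.4f GB" % (name, step, delta))
```

On Debian every additional index adds a similar, non-trivial amount, while on Ubuntu the footprint stays flat after the first index, which is what one would expect if the memory of each index were released on unload().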

Remarks:

  • I’m not even running build() on the index structure, just adding samples.
  • I looked at the C-code in annoylib.h and cannot find any leak.
  • I wonder if memory mapping (mmap/munmap) is the issue (?) or anything with the C-Python-interface (?)
  • The memory measurement (memory.py) is self-implemented and drawn from here (reads from /proc/pid/status, should be OK, I can share if it helps).
  • I tried with annoy installed from pip (1.9.3) and from source (same issue).
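
For reference, a measurement along the lines of memory.py could look roughly like the sketch below. This is not the actual memory.py: the function names and the choice of the `VmRSS` field are assumptions (the real helper may read a different field such as `VmSize`), and it only works on Linux.

```python
def parse_status_field(status_text, field="VmRSS"):
    """Return the value of a /proc/<pid>/status field in bytes, or None if absent.

    Lines in that file look like 'VmRSS:     240000 kB', where kB means 1024 bytes.
    """
    for line in status_text.splitlines():
        if line.startswith(field + ":"):
            parts = line.split()  # e.g. ['VmRSS:', '240000', 'kB']
            return int(parts[1]) * 1024
    return None


def memory(pid=None, field="VmRSS"):
    """Memory usage of a process in bytes; defaults to the current process."""
    path = "/proc/%s/status" % ("self" if pid is None else pid)
    with open(path) as f:
        return parse_status_field(f.read(), field)
```

Called as `memory() / 1024.0**3`, this gives a value in GB comparable to the numbers printed by the script, under the assumption that resident set size is the quantity being tracked.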

My questions:

  1. Is there any hint as to what’s going on? (I’m aware this may be a platform issue rather than an issue with annoy itself. Still, any pointers are greatly appreciated 😉)
  2. What system libraries can I check/update?
  3. Should I check if MAP_POPULATE (annoylib.h) is set? (and if yes, how do I do that?)

Kind regards (and thank you so much!), Adrian

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 15

Most upvoted comments

Thanks for all the help identifying the bug and nailing down the problematic version!