astropy: Memory leak in Table indices
Description
Repeatedly accessing an indexed Table causes memory use to grow in an unexpected and undesired way. In a real-world application on a large table, this was causing memory use to exceed 18 GB. Removing the table index and repeating the access code kept memory use below 1 GB. We used memray to see memory climbing continuously during a loop which repeatedly accessed elements of an indexed table.
Expected behavior
Memory use should remain approximately constant after the first access.
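One way to turn this expectation into a check is to measure how much traced heap memory remains after a warm-up call, using the standard-library `tracemalloc` module. The sketch below is a hypothetical helper (`heap_growth` is not part of astropy). It only sees allocations that are reported to `tracemalloc`, so growth that appears solely in resident memory may not register.

```python
import tracemalloc


def heap_growth(fn, iterations=100, warmup=1):
    """Return how many bytes of traced heap memory remain after calling fn repeatedly.

    Hypothetical helper: fn is called `warmup` times before the baseline is
    taken, so one-off caches built on the first access are not counted.
    """
    for _ in range(warmup):
        fn()
    # Only start/stop tracing if nobody else is already using tracemalloc.
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()  # (current, peak)
        for _ in range(iterations):
            fn()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        if started_here:
            tracemalloc.stop()
    return after - before
```

For the reproduction below, `fn` could be a closure such as `lambda: t[t["idx"] == idx]` for a fixed `idx`. A result that keeps growing with `iterations` would indicate the leak, while a result near zero would match the expected behavior.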
How to Reproduce
The following should reproduce the problem. You can use a package like memray to monitor memory, or just watch a process monitor for the memory use of the Python process. For me, this starts at about 180 MB of memory after the first table `t` is created. After running this, memory use is around 1 GB, while I would expect something under 400 MB.
```python
import numpy as np
from astropy.table import MaskedColumn
from astropy.table.table_helpers import simple_table
from astropy.time import Time
from tqdm import tqdm  # third-party progress bar

size = 250000
t = simple_table(size=size, cols=26)
idxs = Time(np.random.randint(0, size // 20, size=size), format="cxcsec").isot
t["idx"] = MaskedColumn(idxs)  # THIS IS THE PROBLEM
t.add_index(["idx"])

idxs = np.random.choice(t["idx"], size=100, replace=False)
for idx in tqdm(idxs):
    star_obs = t[t["idx"] == idx]
```
Versions
```python
import platform; print(platform.platform())
import sys; print("Python", sys.version)
import astropy; print("astropy", astropy.__version__)
import numpy; print("Numpy", numpy.__version__)
import erfa; print("pyerfa", erfa.__version__)
try:
    import scipy
    print("Scipy", scipy.__version__)
except ImportError:
    print("Scipy not installed")
try:
    import matplotlib
    print("Matplotlib", matplotlib.__version__)
except ImportError:
    print("Matplotlib not installed")
```
```
macOS-14.2.1-x86_64-i386-64bit
Python 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:27:35) [Clang 14.0.6 ]
astropy 5.3.1
Numpy 1.23.5
pyerfa 2.0.0.1
Scipy 1.10.0
Matplotlib 3.6.3
```
About this issue
- State: open
- Created 4 months ago
- Comments: 24 (22 by maintainers)
👋 Hey, memray author and Python core dev here. I ran your example with native symbols and debug info, and it looks like most of the memory (2.1 GB) is allocated in `astropy/table/column.py:529`. You can easily get the same view using Docker.

If you want to give the refcycle theory a go, you can use https://docs.python.org/3/library/gc.html#gc.set_debug with https://docs.python.org/3/library/gc.html#gc.DEBUG_LEAK to confirm which objects are in cycles.
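A minimal sketch of that suggestion, using only the standard-library `gc` module: `cycle_summary` (a hypothetical helper) runs a callable once and counts, by type name, the objects that ended up in unreachable reference cycles. It sets `gc.DEBUG_SAVEALL`, one of the flags that `gc.DEBUG_LEAK` combines, so collected objects are kept in `gc.garbage` for inspection without the per-object stderr output that `DEBUG_LEAK` also turns on.

```python
import gc
from collections import Counter


def cycle_summary(fn):
    """Run fn once and count, by type name, the objects it left in reference cycles."""
    gc.collect()  # free garbage created before fn runs
    gc.garbage.clear()
    old_flags = gc.get_debug()
    # DEBUG_SAVEALL keeps unreachable objects in gc.garbage instead of freeing them.
    gc.set_debug(old_flags | gc.DEBUG_SAVEALL)
    try:
        fn()
        gc.collect()  # full collection: everything fn left in cycles lands in gc.garbage
        return Counter(type(obj).__name__ for obj in gc.garbage)
    finally:
        # Restore normal behavior and drop our references so the cycles can be freed.
        gc.set_debug(old_flags)
        gc.garbage.clear()
```

Wrapping the loop body from the MWE in a function and passing it in would show which types, if any, indexed lookups leave behind in cycles. An empty result would not rule out a leak through references that stay reachable.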
@neutrinoceros - assigning a `MaskedArray` to a table gives a `MaskedColumn`, so no joy there…

Ohhhh, this produces a beautiful leak.
It’s been seen in the wild on both Linux and Mac.
@neutrinoceros - one thing I just thought about… you might try using the `SCEngine` sorting engine (instead of the default sorted array) and see if the leak persists. That might help localize the problem. See: https://docs.astropy.org/en/stable/table/indexing.html#engines

I can reproduce this (on macOS too), so that’s a start. First, I attempted to run garbage collection every 10th iteration of the loop (just because it’s easy to test): no change, so it seems safe to conclude that the problem isn’t trivial (maybe some unreachable reference cycles are generated, so garbage collection isn’t completely out of the picture).
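The periodic-collection experiment can be made a little more informative by recording what each forced collection finds: `gc.collect()` returns the number of unreachable objects it found, so consistently non-zero counts would indicate that the loop body creates reference cycles even if collecting them does not bring memory down. The sketch below uses a hypothetical helper, `periodic_collect`; unlike the experiment described above, it also pauses automatic collection (an assumption of this sketch) so that each count covers only the preceding `every` calls.

```python
import gc


def periodic_collect(fn, iterations=100, every=10):
    """Call fn repeatedly, forcing a full collection every `every` iterations.

    Returns the number of unreachable objects found by each forced collection.
    """
    was_enabled = gc.isenabled()
    gc.disable()  # assumption: pause automatic GC so counts are attributable to fn
    gc.collect()  # start from a clean slate
    found = []
    try:
        for i in range(1, iterations + 1):
            fn()
            if i % every == 0:
                found.append(gc.collect())
    finally:
        if was_enabled:
            gc.enable()
    return found
```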
Memray indeed helps visualise a slow but steady growth in resident memory, while the heap size stays in check. I found that with memray’s default behaviour, you don’t get much more detail (most of the allocations are not traced). Enabling the `--native` flag allows tracing allocations from C/C++ extensions, which seems to be what we want here; however, there are a number of known limitations for this on macOS (see https://bloomberg.github.io/memray/native_mode.html).
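To watch both quantities memray plots (resident memory and traced heap) without memray, the hypothetical helper below samples them while a callable runs in a loop. It uses `tracemalloc` for the heap and `resource.getrusage` for peak resident set size. The `resource` module is Unix-only, and `ru_maxrss` is reported in kilobytes on Linux but in bytes on macOS. Peak RSS can only increase, so it shows growth but not recovery; a steadily climbing RSS next to a flat heap would match the pattern described above.

```python
import resource
import sys
import tracemalloc


def peak_rss_bytes():
    """Peak resident set size of this process, in bytes (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is in bytes on macOS but in kilobytes on Linux.
    return peak if sys.platform == "darwin" else peak * 1024


def sample_memory(fn, iterations=100, every=10):
    """Call fn repeatedly; every `every` calls record (iteration, traced heap, peak RSS)."""
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start()
    samples = []
    try:
        for i in range(1, iterations + 1):
            fn()
            if i % every == 0:
                heap, _ = tracemalloc.get_traced_memory()
                samples.append((i, heap, peak_rss_bytes()))
    finally:
        if started_here:
            tracemalloc.stop()
    return samples
```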
While trying to use it, I was also unfortunate enough to stumble upon what now looks like a CPython bug (reported and discussed at https://github.com/bloomberg/memray/issues/553). Switching to Python 3.12.2 resolved the problem, so I’m now able to get a first view of native allocations. The profiling script I’m using runs `t.py`, which contains the MWE from @taldcroft.

This took me long enough to figure out, which is why I’m reporting at an early stage. I will now try to actually inspect the profile and see if it contains enough information to find the bug (or get a sense of where to look more closely).
If that doesn’t suffice, running with CPython, NumPy, and astropy all compiled with debug symbols would be necessary; however, I know that for NumPy this is significantly simpler on Linux (macOS is supposed to be supported too, but I could never get anything from it). I was planning to set up a Linux VM at some point, and this may be the excuse I’ve been waiting for. Before I do that, though, I want to ask @MridulS whether he happens to be in a better starting position to try this.