ipython: Memory leak even when cache_size = 0 and history_length = 0 or history_length = 1
I’m using git master IPython, Python 2.7.5, Arch Linux 64-bit.
Using the following code as a starting point:

```python
from pandas import DataFrame, concat
from numpy.random import randint, randn
from string import ascii_letters as letters

ncols = 16
nrows = int(1e7)  # size arguments must be integers
letters = list(letters)

df = DataFrame(randn(nrows, ncols), columns=letters[:ncols])
col = DataFrame(randint(1, size=nrows), columns=[letters[ncols]])
big = concat([df, col], axis=1)
```
If you now repeatedly evaluate `big.values` (by hand, not in a loop), memory keeps growing (open `htop` or `top` to watch it in action). The same is true when I’ve set the output cache size to zero and the history length to 0 or 1.
Furthermore, this doesn’t happen in vanilla Python, which has just `_` for history; and if I, e.g., evaluate `big.values` n times, then I need to execute n other statements (e.g., `x = 1`) to reclaim the memory.

See pydata/pandas#3629 for a long discussion about this. Am I missing some feature/quirk of the history or caching system?
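The behaviour above is consistent with IPython pinning each result through extra references (its output cache and the `_`/`__`/`___` variables) rather than a true leak. A minimal stdlib-only sketch of the mechanism; the `output_cache` list below is a stand-in for IPython's machinery, not its real implementation:

```python
import gc
import weakref

class Big(object):
    """Stand-in for a large object such as ``big.values``."""

big_values = Big()
ref = weakref.ref(big_values)  # watch the object without keeping it alive

# Simulate IPython's output cache: the displayed result is stored in a slot
# (what IPython exposes as _ / Out[n]) in addition to any user variables.
output_cache = [big_values]

del big_values                 # the user's own name goes away...
gc.collect()
assert ref() is not None       # ...but the cache still pins the object

output_cache.pop()             # once the cache drops its reference...
gc.collect()
assert ref() is None           # ...the memory can finally be reclaimed
```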
About this issue
- Original URL
- State: closed
- Created 11 years ago
- Reactions: 1
- Comments: 18 (10 by maintainers)
@mhsekhavat: that setting is a kernel (or standalone interactive console) option and does not apply to notebooks, so it’s not surprising that `jupyter_notebook_config.py` does not affect it. You’ll want to put that in a `.py` file inside the `~/.ipython/profile_default/startup/` folder (there’s a README there to help you).

For @wjakob:
That specific setting works only when using `ipython` in a terminal. The one you want to use in the notebook is the one that applies to kernels.

@wjakob: I’ve just stumbled across this issue. Even with the cache disabled, the Jupyter notebook will not free objects referenced in cells. For instance, when writing `A+B` (where `A` and `B` could be NumPy arrays or tensors on a GPU), Jupyter will hold on to `A+B` for the rest of the session even though the object should be freed right after printing/visualizing it. This is not a problem when working with small objects, but it does get rather annoying in interactive sessions that involve big data on a resource-constrained device (GPU memory…).
Ah, the last three outputs are cached as `_`, `__`, `___`, even if you set the output cache size to zero. I think that’s intentional, but I agree that it’s confusing. Maybe we should disable those as well if the output cache is disabled, or just fall back to the standard Python `_`.
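For reference, the terminal-side setting discussed above is the `cache_size` traitlet; a hedged configuration sketch (in `~/.ipython/profile_default/ipython_config.py`), with the caveat from this thread that `_`, `__`, `___` are currently kept regardless:

```python
# ipython_config.py (sketch)
c = get_config()

# Disable the Out[n] output cache; note that, as discussed above, the last
# three results are still kept alive as _, __ and ___ regardless.
c.InteractiveShell.cache_size = 0
```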