IPython: memory leak even when cache_size = 0 and history_length = 0 or history_length = 1

I’m using git master IPython, Python 2.7.5, Arch Linux 64-bit.

Using the following code as a starting point:

from pandas import DataFrame, concat
from numpy.random import randint, randn
from string import ascii_letters as letters

ncols = 16
nrows = int(1e7)  # array dimensions must be integers
letters = list(letters)

# ~1.2 GB of random floats, plus one all-zeros integer column
df = DataFrame(randn(nrows, ncols), columns=letters[:ncols])
col = DataFrame(randint(1, size=nrows), columns=[letters[ncols]])
big = concat([df, col], axis=1)

If you now repeatedly evaluate big.values (by hand, not in a loop), memory keeps growing (open htop or top to watch it in action). The same is true after I set the output cache size to zero and the history length to 0 or 1.
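As an aside, you can also watch memory from inside the session instead of htop; a minimal sketch, assuming psutil is installed (rss_mib is just a throwaway helper name, not part of the original report):

import os
import psutil

def rss_mib():
    # resident set size of the current process, in MiB
    return psutil.Process(os.getpid()).memory_info().rss / 2 ** 20

Calling rss_mib() before and after each big.values evaluation makes the growth obvious.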

Furthermore, this doesn’t happen in vanilla Python, which keeps only _ as output history. In IPython, if I evaluate big.values n times, I have to execute some other statement (e.g., x = 1) n times to reclaim the memory.
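A concrete way to see the difference is with a weakref; this is a sketch meant to be typed line by line into each interpreter:

import weakref
from numpy import ones

x = ones(int(1e6))
ref = weakref.ref(x)
x                      # displaying the result binds it to _ (and, in IPython, to Out[n])
del x                  # drop our own reference
print(ref() is None)   # False: _ still keeps the array alive
1 + 1                  # another displayed expression rebinds _
print(ref() is None)   # vanilla Python: True; IPython: still False (__ and Out hold it)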

See pydata/pandas#3629 for a long discussion of this. Am I missing some feature/quirk of the history or caching system?

About this issue

  • State: closed
  • Created 11 years ago
  • Reactions: 1
  • Comments: 18 (10 by maintainers)

Most upvoted comments

@mhsekhavat: that setting is a kernel or standalone interactive console option and does not apply to notebooks, so it’s not surprising that jupyter_notebook_config.py doesn’t affect it. You’ll want to put that in a .py file inside the ~/.ipython/profile_default/startup/ folder (there’s a README there to help you).
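For example, a minimal startup file could look like this (the file name is arbitrary; any .py file in that folder runs at startup):

# ~/.ipython/profile_default/startup/00-disable-output-cache.py
from IPython import get_ipython

ip = get_ipython()
if ip is not None:
    # same effect as typing %config ZMQInteractiveShell.cache_size = 0
    ip.run_line_magic('config', 'ZMQInteractiveShell.cache_size = 0')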

For @wjakob:

%config TerminalInteractiveShell.cache_size = 0

That specific setting only works when using IPython in a terminal. The one you want in the notebook is the one that applies to kernels:

%config ZMQInteractiveShell.cache_size = 0
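If you want this to stick across sessions instead of running the magic each time, the same trait can go in the kernel config file, along these lines:

# ~/.ipython/profile_default/ipython_kernel_config.py
c = get_config()  # injected by IPython when it loads this file
c.ZMQInteractiveShell.cache_size = 0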

I’ve just stumbled across this issue. Even with

%config TerminalInteractiveShell.cache_size = 0

Jupyter notebook will not free objects referenced in cells. For instance, when writing

A + B

where A and B could be NumPy arrays or tensors on a GPU, Jupyter will hold on to A + B for the rest of the session, even though the object should be freed right after it is printed/visualized.

This is not a problem when working with small objects, but it does get rather annoying in interactive sessions that involve big data on a resource-constrained device (GPU memory…)
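One workaround is to purge the references by hand with IPython’s built-in magics:

%reset -f out    # flush the whole output cache (Out, _, __, ___)
%xdel big        # or: delete one named variable plus IPython’s internal references to it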

Ah, the last three outputs are cached as _, __, ___, even if you set the output cache size to zero. I think that’s intentional, but I agree that it’s confusing. Maybe we should disable those as well if the output cache is disabled, or just fall back to the standard Python _.
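Concretely, even with cache_size = 0 (so Out[n] isn’t also holding it), it takes three more displayed results before a big output can be collected:

big.values   # the array is now referenced by _
1 + 1        # ...now by __
2 + 2        # ...now by ___
3 + 3        # only now does the array become unreferenced and collectable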