ipython: Memory leak with %matplotlib inline

Hey everyone,

I’ve found a memory leak. Run the code below in a notebook and watch the kernel's memory usage. Then delete the “%matplotlib inline” line and run it again to compare.

import matplotlib
import matplotlib.pyplot as plt
import matplotlib.ticker

%matplotlib inline

import os
import sys
import StringIO
import urllib, base64

from matplotlib import rcParams

rcParams['figure.figsize'] = (24, 6)
rcParams['figure.dpi'] = 150

OUTPUT_FILENAME = "Asd"

def printHTML(html):
    with open(OUTPUT_FILENAME, "a") as outputFile:
        outputFile.write(html if type(html) == str else html.encode('utf8'))

def friendlyPlot():
    # Note: this Figure is never used; the plot below goes to pyplot's current figure.
    figure = plt.Figure()
    ax = plt.subplot2grid((1,2), (0,0))

    ax.plot(range(1000), range(1000))

    #plt.show()
    fig = plt.gcf()

    # Python 2 only: StringIO.StringIO and str.encode('base64')
    imgdata = StringIO.StringIO()
    fig.savefig(imgdata, format='png')
    imgdata.seek(0)  # rewind the data
    image = imgdata.buf.encode('base64').replace('\n', '')
    printHTML('<img src="data:image/png;base64,{0}" /><br />'.format(image))
    plt.close('all')
    imgdata.close()

open(OUTPUT_FILENAME, 'w').close()

for i in range(500):
    friendlyPlot()
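
For anyone trying this on Python 3: the StringIO module and str.encode('base64') used above no longer exist there. Below is a minimal sketch of the same embedding step with io.BytesIO and base64.b64encode. Note that png_to_img_tag is a hypothetical helper name, and the placeholder bytes only stand in for what fig.savefig(buffer, format='png') would write.

```python
import base64
import io


def png_to_img_tag(png_bytes):
    """Return an HTML <img> tag embedding the PNG bytes as a base64 data URI."""
    # b64encode emits no line breaks, so there are no newlines to strip.
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return '<img src="data:image/png;base64,{0}" /><br />'.format(encoded)


# Placeholder payload: in the reproducer, fig.savefig(buffer, format='png')
# would fill the buffer instead.
buffer = io.BytesIO()
buffer.write(b"\x89PNG\r\n\x1a\n")  # only the 8-byte PNG signature, not a full image
tag = png_to_img_tag(buffer.getvalue())
buffer.close()
print(tag)
```

This only covers the encoding step; whether the leak behaves the same on Python 3 is not something the sketch shows.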

About this issue

  • State: open
  • Created 10 years ago
  • Reactions: 4
  • Comments: 23 (12 by maintainers)

Most upvoted comments

I’ll second that a fix on this issue would be appreciated.

Hi, I believe I have found part of the culprit and a way to significantly, but not completely, reduce this problem!

After reading through the ipykernel/pylab/backend_inline.py code, I have a hunch that interactive mode stores a lot of plot-related state. I don’t understand the code completely, though, so I can't pinpoint the exact cause with certainty.

Here is the code to verify this (based on @tacaswell’s snippet above), useful for anyone trying to implement a fix.

Initialization:

import matplotlib
import matplotlib.pyplot as plt
import matplotlib.ticker

%matplotlib inline

matplotlib.rcParams['figure.figsize'] = (24, 6)
matplotlib.rcParams['figure.dpi'] = 150

from resource import getrusage
from resource import RUSAGE_SELF

def friendlyPlot():
    fig, ax = plt.subplots()
    ax.plot(range(1000))
    fig.savefig('tmp.png')
    plt.close('all')

Actual test:

print("before any:  {:7d} kB".format(getrusage(RUSAGE_SELF).ru_maxrss))
friendlyPlot()
print("before loop: {:7d} kB".format(getrusage(RUSAGE_SELF).ru_maxrss))
for i in range(50):
    friendlyPlot()
print("after loop:  {:7d} kB".format(getrusage(RUSAGE_SELF).ru_maxrss))
import gc
gc.collect(2)
print("after gc:    {:7d} kB".format(getrusage(RUSAGE_SELF).ru_maxrss))

Running it for 50 iterations of the loop, I get:

before any:    87708 kB
before loop:  106772 kB
after loop:   786668 kB
after gc:     786668 kB

Running it for 200 iterations of the loop, I get:

before any:    87708 kB
before loop:  100492 kB
after loop:  2824316 kB
after gc:    2824540 kB

This shows an almost linear increase in memory with the number of iterations.
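
One caveat when reading these numbers: ru_maxrss is the peak resident set size (a high-water mark), so the "after gc" line can never drop below "after loop", even if memory was actually released. Here is a rough sketch for checking the current RSS as well, assuming Linux with /proc available. The names peak_rss_kb and current_rss_kb are hypothetical helpers, not part of the snippet above.

```python
import gc
import resource
import sys


def peak_rss_kb():
    """Peak resident set size so far, in kB (this value never decreases)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports ru_maxrss in kilobytes, macOS in bytes.
    return peak // 1024 if sys.platform == "darwin" else peak


def current_rss_kb():
    """Current resident set size in kB, or None if /proc is unavailable."""
    try:
        with open("/proc/self/status") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except OSError:
        pass
    return None


peak_before = peak_rss_kb()
block = b"\x01" * (50 * 1024 * 1024)  # about 50 MB of memory that is really written
peak_with_block = peak_rss_kb()
current_with_block = current_rss_kb()

del block
gc.collect()

print("peak before:      ", peak_before, "kB")
print("peak with block:  ", peak_with_block, "kB")
print("peak after free:  ", peak_rss_kb(), "kB")
print("current with block:", current_with_block, "kB")
print("current after free:", current_rss_kb(), "kB")
```

If the current RSS keeps climbing across loop iterations while nothing is being held on the Python side, that points to a real leak and not just a high peak.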

Now to the fix/workaround: call matplotlib.interactive(False) before the test-snippet, and then run it.
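
A minimal sketch of how the workaround could be wrapped so the previous mode is restored afterwards. Assumptions: the Agg backend is selected only so the sketch runs outside a notebook (in a notebook you would keep %matplotlib inline and drop that line), the figure is saved to an in-memory buffer instead of tmp.png, and plot_without_interactive is a hypothetical helper name.

```python
import io

import matplotlib

matplotlib.use("Agg")  # assumption: headless backend so this runs outside a notebook
import matplotlib.pyplot as plt


def plot_without_interactive(n_figures):
    """Render n_figures plots with interactive mode off, then restore the old mode."""
    was_interactive = matplotlib.is_interactive()
    matplotlib.interactive(False)  # the workaround described above
    try:
        for _ in range(n_figures):
            fig, ax = plt.subplots()
            ax.plot(range(1000))
            buffer = io.BytesIO()
            fig.savefig(buffer, format="png")
            buffer.close()
            plt.close(fig)
    finally:
        # Put interactive mode back the way it was found.
        matplotlib.interactive(was_interactive)


plot_without_interactive(3)
print("open figures:", plt.get_fignums())
```

Restoring the mode in a finally block keeps the rest of the notebook session unaffected even if plotting raises.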

With 50 iterations:

before any:    87048 kB
before loop:  104992 kB
after loop:   241604 kB
after gc:     241604 kB

And with 200 iterations:

before any:    87536 kB
before loop:  103104 kB
after loop:   239276 kB
after gc:     239276 kB

This confirms that only a constant increase (independent of the number of iterations) is left.

Using these numbers, I make a rough estimate of the leak size per iteration:

(786668-(241604 - 104992))/50   = 13001.12
(2824316-(241604 - 104992))/200 = 13438.52

And for a single iteration of the loop, I get 13560. Since these values are in kB, the leak is roughly 13 MB per iteration, which is larger than the image itself, be it raw (>3MB) or png-compressed (54KB).
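
The estimate above can be reproduced in a few lines. All measurements are the kB values printed by the runs above. The RGBA buffer size at the end is my own assumption for comparison (a 24 x 6 inch figure at 150 dpi, 4 bytes per pixel), not something measured in this thread.

```python
# Peak RSS values in kB, copied from the runs above.
after_loop_50 = 786668
after_loop_200 = 2824316
workaround_before_loop = 104992
workaround_after_loop = 241604

# Constant overhead that remains even with the workaround (50-iteration run).
constant_overhead = workaround_after_loop - workaround_before_loop

# Same rough formula as above: subtract the constant part, divide by iterations.
leak_50 = (after_loop_50 - constant_overhead) / 50
leak_200 = (after_loop_200 - constant_overhead) / 200

# Assumption: one RGBA render buffer for a 24 x 6 inch figure at 150 dpi.
width_px, height_px = 24 * 150, 6 * 150
rgba_buffer_kb = width_px * height_px * 4 / 1024

print("leak per iteration (50 runs):  {:.2f} kB".format(leak_50))
print("leak per iteration (200 runs): {:.2f} kB".format(leak_200))
print("one RGBA buffer (assumed):     {:.2f} kB".format(rgba_buffer_kb))
```

Under that assumption, the per-iteration growth is in the same range as one full-size RGBA buffer, which might be a hint about what is being kept alive, though it proves nothing by itself.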

Also, strangely, running a small-scale test (only a few iterations) repeatedly in the same cell without restarting the kernel gives much less consistent results. I have not been able to understand this or determine a pattern.

I hope someone with more knowledge of the internals can take it from here, as I lack the time and knowledge to dive deeper into it right now.

BTW, I’m still hitting this issue from time to time on the latest matplotlib, pandas, jupyter, and ipython. If anyone knows of a debugger that can help troubleshoot this multi-process communication, please let me know.