tensorflow: Memory leak

I have a memory leak with TensorFlow. I referred to https://stackoverflow.com/questions/35695183/tensorflow-memory-leak-even-while-closing-session to address my issue, and I followed the advice in the answer, which seemed to have solved the problem for others. However, it does not work here.

In order to reproduce the memory leak, I have created a simple example. First, I use this function (which I got here: https://stackoverflow.com/questions/276052/how-to-get-current-cpu-and-ram-usage-in-python) to check the memory usage of the Python process:

def memory():
    import os
    import psutil
    pid = os.getpid()
    py = psutil.Process(pid)
    memoryUse = py.memory_info().rss / 2.**30  # resident set size, in GiB
    print('memory use:', memoryUse)

Then, every time I call the build_model function, memory usage increases.

Here is the build_model function that has the memory leak:

def build_model():

    '''Model'''

    tf.reset_default_graph()

    with tf.Graph().as_default(), tf.Session() as sess:
        tf.contrib.keras.backend.set_session(sess)

        labels = tf.placeholder(tf.float32, shape=(None, 1))
        input = tf.placeholder(tf.float32, shape=(None, 1))

        x = tf.contrib.keras.layers.Dense(30, activation='relu', name='dense1')(input)
        x1 = tf.contrib.keras.layers.Dropout(0.5)(x)
        x2 = tf.contrib.keras.layers.Dense(30, activation='relu', name='dense2')(x1)
        y = tf.contrib.keras.layers.Dense(1, activation='sigmoid', name='dense3')(x2)

        loss = tf.reduce_mean(tf.contrib.keras.losses.binary_crossentropy(labels, y))

        train_step = tf.train.AdamOptimizer(0.004).minimize(loss)

        # Initialize all variables
        init_op = tf.global_variables_initializer()
        sess.run(init_op)

        sess.close()  # redundant: the with block already closes the session

    tf.reset_default_graph()

    return

I would have thought that using the with tf.Graph().as_default(), tf.Session() as sess: block, then closing the session and calling tf.reset_default_graph, would free all the memory used by TensorFlow. Apparently it does not.
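For completeness, one workaround that does reliably return the memory (a sketch, not a real fix; build_model_in_subprocess is a hypothetical helper wrapping the build_model above) is to run each construction in a short-lived child process, since the OS reclaims everything when the child exits, regardless of what TensorFlow holds on to:

    import multiprocessing

    def build_model_in_subprocess():
        # Hypothetical helper: the child process builds the model and
        # exits; the OS then reclaims all of its memory, including
        # anything TensorFlow failed to free.
        p = multiprocessing.Process(target=build_model)
        p.start()
        p.join()

With this, repeated calls should leave the parent process's memory use flat.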

The memory leak can be reproduced as follows:

memory()
build_model()
memory()
build_model()
memory()

The output of this is (on my computer):

memory use: 0.1794891357421875
memory use: 0.184417724609375
memory use: 0.18923568725585938

Clearly, not all the memory used by TensorFlow is freed afterwards. Why?

I hope I made myself clear.

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 4
  • Comments: 31 (10 by maintainers)


Most upvoted comments

Early tests seem to show that _GRAPH_LEARNING_PHASES is not cleared, so there are tons of tensors that get kept around.

Changing it to:

    def reset_uids():
        global _GRAPH_UID_DICTS
        global _GRAPH_LEARNING_PHASES
        _GRAPH_UID_DICTS = {}
        _GRAPH_LEARNING_PHASES = {}

Seems to resolve the problem.

memory use: 0.13166046142578125
memory use: 0.13190841674804688
memory use: 0.13220977783203125
memory use: 0.13220977783203125
memory use: 0.13220977783203125
memory use: 0.13220977783203125
memory use: 0.13220977783203125
memory use: 0.13220977783203125

Will do a PR on Keras; it will then get merged into TF, I guess?
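Until such a fix lands, a workaround sketch (assuming the build_model and memory helpers from the question, and that clear_session() in tf.contrib.keras.backend resets this module-level state, as it does in the backend sources I've seen):

    import tensorflow as tf

    for _ in range(5):
        build_model()
        # clear_session() resets the default graph, drops the cached
        # session, and reinitializes the backend's module-level state,
        # so the leaked per-graph entries should not accumulate.
        tf.contrib.keras.backend.clear_session()
        memory()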

I recently resolved my model's “memory leak” issue; it turned out to be an anti-pattern in how the model (i.e., its various tensor ops) was constructed and run. I mistakenly created the cosine_decay_restarts op in every iteration of the training loop, something like:

while ... :
    ....
    dlr = tf.train.cosine_decay_restarts(
            learning_rate=LEARNING_RATE,
            global_step=cur_step,
            first_decay_steps=LR_DECAY_STEPS,
            t_mul=1.0,
            m_mul=1.0,
            alpha=LEARNING_RATE_ALPHA
        )
    lr = sess.run([dlr], feed_dict={cur_step: bno-DECAYED_LR_START})[0]
    ...
    #train the model using the decayed learning rate

And the training Python process would get killed by the OOM killer at some point. My instincts told me I shouldn't re-construct tf.train.cosine_decay_restarts in every loop iteration, and after the simple remedy of hoisting it out of the loop, the “memory leak” was gone:

dlr = tf.train.cosine_decay_restarts(
    learning_rate=LEARNING_RATE,
    global_step=cur_step,
    first_decay_steps=LR_DECAY_STEPS,
    t_mul=1.0,
    m_mul=1.0,
    alpha=LEARNING_RATE_ALPHA
)
...
while ... :
    ...
    lr = sess.run([dlr], feed_dict={cur_step: bno-DECAYED_LR_START})[0]
    ...
    #train the model using the decayed learning rate

So maybe everybody could do a coarse check of whether they are re-constructing parts of the model in every loop iteration… (one way to make that check automatic is sketched just below).
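A sketch of that automatic check, assuming TF 1.x (the toy ops stand in for a real model): tf.Graph.finalize() makes the graph read-only, so any accidental op construction inside the training loop raises a RuntimeError instead of silently growing the graph.

    import tensorflow as tf

    x = tf.placeholder(tf.float32, shape=(None, 1))
    y = tf.square(x)  # build every op up front, outside the loop

    with tf.Session() as sess:
        sess.graph.finalize()  # graph is now read-only
        for step in range(10):
            # Only sess.run() calls belong here. Creating an op now,
            # e.g. tf.square(x), would raise a RuntimeError immediately.
            sess.run(y, feed_dict={x: [[float(step)]]})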

I am getting the same issue with multiple-model prediction, in both sequential and parallel execution. The memory doesn't seem to be freed after use.

@fchollet It seems everyone is running into this bug. Please don't forget it.

I have a very similar issue causing a memory leak, but I'm only using TensorFlow, without Keras. Here's the minimal code:

    import tensorflow as tf
    import numpy as np

    for i in range(30):
        tf.Session().__enter__()
        tf.constant(np.random.random((800, 500, 500, 1)))
        tf.get_default_session().close()
        tf.reset_default_graph()

When executing the loop, the memory used keeps going up. How can I actually delete the old large constants and free the memory? I'm using TensorFlow 1.2 with Python 3.4 on Ubuntu 14.04.
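One way to avoid the giant constants entirely (a sketch of the usual advice, offered as an aside rather than an answer from the thread) is to feed the array through a placeholder, so no large constant node is ever added to the graph:

    import numpy as np
    import tensorflow as tf

    data = tf.placeholder(tf.float32, shape=(800, 500, 500, 1))
    with tf.Session() as sess:
        for i in range(30):
            # The array lives only on the Python side and is fed in at
            # run time; the graph itself stays constant-free and small.
            batch = np.random.random((800, 500, 500, 1)).astype(np.float32)
            sess.run(data, feed_dict={data: batch})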

[attached image: plot of memory usage]

@jart Here you go. As you can see, the memory usage goes up linearly, which is exactly the problem.

About the number of APIs being called, are you referring to Keras? I only use tf.contrib.keras, which is part of TensorFlow, hence I am only using TensorFlow here.