tensorflow: Memory leak
I have a memory leak with TensorFlow. I referred to https://stackoverflow.com/questions/35695183/tensorflow-memory-leak-even-while-closing-session to address my issue, and I followed the advice in the answer, which seemed to have solved the problem. However, it does not work here.
In order to recreate the memory leak, I have created a simple example. First, I use this function (which I got from https://stackoverflow.com/questions/276052/how-to-get-current-cpu-and-ram-usage-in-python) to check the memory use of the Python process:
def memory():
    import os
    import psutil
    pid = os.getpid()
    py = psutil.Process(pid)
    memoryUse = py.memory_info()[0] / 2.**30  # memory use in GB...I think
    print('memory use:', memoryUse)
Then, every time I call the build_model function, the memory use increases.
Here is the build_model function that has a memory leak:
def build_model():
    '''Model'''
    tf.reset_default_graph()

    with tf.Graph().as_default(), tf.Session() as sess:
        tf.contrib.keras.backend.set_session(sess)

        labels = tf.placeholder(tf.float32, shape=(None, 1))
        input = tf.placeholder(tf.float32, shape=(None, 1))

        x = tf.contrib.keras.layers.Dense(30, activation='relu', name='dense1')(input)
        x1 = tf.contrib.keras.layers.Dropout(0.5)(x)
        x2 = tf.contrib.keras.layers.Dense(30, activation='relu', name='dense2')(x1)
        y = tf.contrib.keras.layers.Dense(1, activation='sigmoid', name='dense3')(x2)

        loss = tf.reduce_mean(tf.contrib.keras.losses.binary_crossentropy(labels, y))

        train_step = tf.train.AdamOptimizer(0.004).minimize(loss)

        # Initialize all variables
        init_op = tf.global_variables_initializer()
        sess.run(init_op)

        sess.close()

    tf.reset_default_graph()

    return
I would have thought that using the block with tf.Graph().as_default(), tf.Session() as sess:, then closing the session and calling tf.reset_default_graph, would clear all of the memory used by TensorFlow. Apparently it does not.
The memory leak can be recreated as follows:
memory()
build_model()
memory()
build_model()
memory()
The output of this is (on my computer):
memory use: 0.1794891357421875
memory use: 0.184417724609375
memory use: 0.18923568725585938
Clearly, not all of the memory used by TensorFlow is freed afterwards. Why?
I hope I made myself clear.
About this issue
- State: closed
- Created 7 years ago
- Reactions: 4
- Comments: 31 (10 by maintainers)
Early tests seem to show that _GRAPH_LEARNING_PHASES is not cleared, so there are tons of tensors that get kept around.
Changing it to

def reset_uids():
    global _GRAPH_UID_DICTS
    global _GRAPH_LEARNING_PHASES
    _GRAPH_UID_DICTS = {}
    _GRAPH_LEARNING_PHASES = {}

seems to resolve the problem.
Will do a PR on Keras; it will then get merged into TF, I guess?
I recently resolved my model’s “memory leak” issue; it turned out to be an anti-pattern in how I constructed and ran the model (i.e. various TensorOps)… I mistakenly added cosine_decay_restarts ops in every iteration of training, something like the sketch below:
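(The exact snippet is not preserved here, so this is only a hypothetical reconstruction of the pattern with made-up names; the point is that the decay op is created inside the loop.)

import tensorflow as tf

# Hypothetical sketch of the anti-pattern: a fresh cosine_decay_restarts op
# is added to the graph on every pass through the training loop, so the
# graph (and the process's memory) keeps growing.
global_step = tf.Variable(0, trainable=False, dtype=tf.int64)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(10000):
        # BUG: constructed inside the loop
        lr = tf.train.cosine_decay_restarts(0.01, global_step,
                                            first_decay_steps=100)
        sess.run(lr)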
And the training Python process would get killed by the OOM killer at some point. My instincts told me I shouldn’t re-construct tf.train.cosine_decay_restarts in every loop, so after this simple remedy (sketched below) the “memory leak” issue was gone… So maybe everybody could make a coarse check of whether you are re-constructing the model in every loop…
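A minimal sketch of that remedy, under the same hypothetical setup as above: build the decay op once before the loop and only evaluate it inside the loop.

import tensorflow as tf

# The decay op is created once, so the graph stays a fixed size and
# memory no longer grows per iteration.
global_step = tf.Variable(0, trainable=False, dtype=tf.int64)
lr = tf.train.cosine_decay_restarts(0.01, global_step, first_decay_steps=100)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(10000):
        sess.run(lr)  # evaluates the existing op; no new graph nodes are added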
I am getting the same issue with multiple-model prediction, in both sequential and parallel execution. The memory does not seem to be freed after use.
@fchollet Perhaps everyone is in trouble with this bug. Please do not forget it.
I have a very similar issue causing a memory leak, but I’m only using TensorFlow without Keras. Here’s the minimal code:
import tensorflow as tf
import numpy as np

for i in range(30):
    tf.Session().__enter__()
    tf.constant(np.random.random((800, 500, 500, 1)))
    tf.get_default_session().close()
    tf.reset_default_graph()
When executing the loop, the memory used keeps going up. How can I actually delete the old large constants and free the memory? I’m using TensorFlow 1.2 with Python 3.4 on Ubuntu 14.04.
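One pattern worth checking (just a sketch, and not necessarily a complete fix for the growth seen here): feed the large array through a placeholder instead of embedding it as a tf.constant, and let with-blocks close the session and release the graph each iteration.

import numpy as np
import tensorflow as tf

# Sketch: the big array is passed via feed_dict, so it never becomes part
# of the graph definition, and both the graph and the session are scoped
# by with-blocks so they are properly exited on every iteration.
data = np.random.random((800, 500, 500, 1))

for i in range(30):
    with tf.Graph().as_default():
        x = tf.placeholder(tf.float64, shape=data.shape)
        y = tf.reduce_mean(x)
        with tf.Session() as sess:
            sess.run(y, feed_dict={x: data})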
@Dref360 this is fixed in TF master: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/keras/python/keras/backend.py#L288
Does this resolve the problem?
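For reference, a minimal sketch of how that fix would be exercised from user code (assuming your installed TF build already includes it): clear the Keras backend state after each build so its per-graph dictionaries do not keep old tensors alive.

import tensorflow as tf

def build_model():
    with tf.Graph().as_default(), tf.Session() as sess:
        tf.contrib.keras.backend.set_session(sess)
        inp = tf.placeholder(tf.float32, shape=(None, 1))
        out = tf.contrib.keras.layers.Dense(1, activation='sigmoid')(inp)
        sess.run(tf.global_variables_initializer())
    # clear_session() resets the backend's global state (graph, session,
    # uid dictionaries), dropping the remaining Keras-side references.
    tf.contrib.keras.backend.clear_session()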
About the number of APIs being called: do you mean Keras by that? I only use tf.contrib.keras, which is part of TensorFlow, so I am only using TensorFlow here.