tensorflow: Memory leak in eager mode when creating keras model in loop

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Arch Linux
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: not tested
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): v1.12.1-5259-ge703239 1.15.0-dev20190629
  • Python version: 3.7.3
  • Bazel version (if compiling from source): not compiled from source
  • GCC/Compiler version (if compiling from source): not compiled from source
  • CUDA/cuDNN version: using CPU
  • GPU model and memory: using CPU

Describe the current behavior

In eager execution, when creating a tf.keras.Sequential model inside a loop and discarding it immediately, memory usage grows over time. The following code demonstrates this by printing the amount of memory in use at each iteration.

import psutil
import tensorflow as tf

tf.compat.v1.enable_eager_execution()

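# Build a throwaway model each iteration and print total memory used (GiB).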
for _ in range(100):
    tf.keras.Sequential([tf.keras.layers.Dense(3000, input_dim=3000)])
    print(psutil.virtual_memory().used / 2 ** 30)

Output:

1.0170440673828125
1.0506706237792969
1.0841865539550781
1.1179122924804688
[...]
4.285423278808594
4.318950653076172
4.35223388671875

The same leak occurs when using the Functional API or the Model subclassing API. Adding tf.keras.backend.clear_session() inside the loop fixes the leak in all cases, just as it does in graph mode. To see the effect more clearly, one should additionally call gc.collect() inside the loop (see the sketch below).
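
A minimal sketch of that workaround, assuming the same reproduction loop as above (illustrative only, not part of the original report):

import gc

import psutil
import tensorflow as tf

tf.compat.v1.enable_eager_execution()

for _ in range(100):
    tf.keras.Sequential([tf.keras.layers.Dense(3000, input_dim=3000)])
    # Workaround: reset Keras' global state and force garbage collection.
    tf.keras.backend.clear_session()
    gc.collect()
    print(psutil.virtual_memory().used / 2 ** 30)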

Describe the expected behavior

While adding tf.keras.backend.clear_session() to the loop helps, it should not be necessary: in eager execution there is no graph to clear, and according to the documentation that seems to be the only thing this function does:

Destroys the current TF graph and creates a new one.

Therefore it is also surprising that this function helps at all during eager execution. The expected behavior is that there is no memory leak, even without tf.keras.backend.clear_session().

Code to reproduce the issue

The code is in the description above.

Other info / logs

Nothing here.

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 4
  • Comments: 16 (5 by maintainers)

Most upvoted comments

A similar situation: I train the model in the main thread and load the model in another thread at the same time, with everything running in a loop. However, if I use the clear_session() method in one thread, the code in the other thread stops working. I tested PyTorch and MXNet, and neither shows any memory leak in a loop. Why? I think that clear_session shouldn't be necessary. @bionicles @tjume

It seems that clearing model A can influence model B

Oh, I see! Thanks for clarifying the issue. From my understanding (but I might be wrong - I am just a TensorFlow user, not an expert, let alone a developer), tf.keras.backend.clear_session clears all graphs that have not been called yet. So, if you declare models A and B, clear_session will remove all those whose predict (or evaluate, or fit, etc.) method has not yet been called. On the other hand, once you have called such a method, the model can no longer be removed with clear_session, but it can be with del <my_instance>. In either case, using gc.collect ensures the freed memory is effectively deallocated (otherwise it is only freed when the garbage collector's routine check gets to it).

Examples:

import numpy as np
import tensorflow as tf

# Enable Eager execution, if required (i.e. using TF 1.x).
if hasattr(tf, 'enable_eager_execution'):
    tf.enable_eager_execution()

# Make some dummy input data.
inputs = np.random.normal(size=(1, 3000))

# Declare two models.
model_a = tf.keras.Sequential([tf.keras.layers.Dense(3000, input_dim=3000)])
model_b = tf.keras.Sequential([tf.keras.layers.Dense(3000, input_dim=3000)])

# This will clear BOTH models (i.e. the underlying graph will be cleared out).
tf.keras.backend.clear_session()
# Both those lines are going to fail:
model_a.predict(inputs)
model_b.predict(inputs)


# Declare two models (again).
model_a = tf.keras.Sequential([tf.keras.layers.Dense(3000, input_dim=3000)])
model_b = tf.keras.Sequential([tf.keras.layers.Dense(3000, input_dim=3000)])

# Use model B.
model_b.predict(inputs)
# This will discard the graph underlying model A ONLY.
tf.keras.backend.clear_session()
# This will fail.
model_a.predict(inputs)
# This will work.
model_b.predict(inputs)

# Discard model B (now that it has been called).
del model_b

# Note that calling gc.collect() forces garbage collection at the desired points.
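
For instance, the deletion above could be followed by an explicit collection pass (a small addition for illustration, not in the original comment):

import gc

# Force an immediate garbage collection pass so the memory freed by
# `del model_b` is deallocated right away, rather than at the collector's
# next routine run.
gc.collect()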

Maybe I didn't make it clear. I mean: I train model A in the main thread and, at the same time, load model B and run predict in another thread. Everything is in a loop. However, if I use the clear_session() method in one thread (to clear model A, for example), then model B in the other thread stops working (it no longer predicts). As you can see, there is no relationship between model A and model B, yet clearing model A can influence model B. Why? That isn't logical. @pandrey-fr
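
For illustration, a rough sketch of that setup (the data, model sizes, and loop counts are invented here, and this is not code from the report):

import threading

import numpy as np
import tensorflow as tf

# Enable eager execution on TF 1.x, as in the earlier example.
if hasattr(tf, 'enable_eager_execution'):
    tf.enable_eager_execution()

# Dummy data; shapes are arbitrary.
data = np.random.normal(size=(8, 10)).astype('float32')
labels = np.random.normal(size=(8, 1)).astype('float32')

def train_loop():
    # Main thread: repeatedly build and train model A.
    for _ in range(5):
        model_a = tf.keras.Sequential([tf.keras.layers.Dense(1, input_dim=10)])
        model_a.compile(optimizer='sgd', loss='mse')
        model_a.fit(data, labels, epochs=1, verbose=0)
        # The reported problem: clearing the session here also breaks
        # model B, which is being used concurrently in the other thread.
        tf.keras.backend.clear_session()

def predict_loop():
    # Other thread: repeatedly build model B and call predict.
    for _ in range(5):
        model_b = tf.keras.Sequential([tf.keras.layers.Dense(1, input_dim=10)])
        model_b.predict(data)

thread = threading.Thread(target=predict_loop)
thread.start()
train_loop()
thread.join()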