tensorflow: Memory leak in eager mode when creating keras model in loop
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Arch Linux
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: not tested
- TensorFlow installed from (source or binary): binary
- TensorFlow version (use command below): v1.12.1-5259-ge703239 1.15.0-dev20190629
- Python version: 3.7.3
- Bazel version (if compiling from source): not compiled from source
- GCC/Compiler version (if compiling from source): not compiled from source
- CUDA/cuDNN version: using CPU
- GPU model and memory: using CPU
Describe the current behavior
In eager execution, when a tf.keras.Sequential model is created inside a loop and discarded immediately, memory usage grows over time. The following code demonstrates this by printing the used memory at each iteration.
import psutil
import tensorflow as tf

tf.compat.v1.enable_eager_execution()

for _ in range(100):
    # The model is created and immediately discarded, yet memory keeps growing.
    tf.keras.Sequential([tf.keras.layers.Dense(3000, input_dim=3000)])
    print(psutil.virtual_memory().used / 2 ** 30)
Output:
1.0170440673828125
1.0506706237792969
1.0841865539550781
1.1179122924804688
[...]
4.285423278808594
4.318950653076172
4.35223388671875
The same leak occurs when using the Functional API or the Model subclassing API. Adding tf.keras.backend.clear_session() inside the loop fixes it in all cases, just as it does in graph mode. To see the effect more clearly, additionally call gc.collect() in the loop, as in the snippet below.
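For reference, a version of the loop with the workaround applied might look like this (same reproduction as above, with clear_session() and gc.collect() added after each iteration):

import gc

import psutil
import tensorflow as tf

tf.compat.v1.enable_eager_execution()

for _ in range(100):
    tf.keras.Sequential([tf.keras.layers.Dense(3000, input_dim=3000)])
    tf.keras.backend.clear_session()  # workaround: clears the state that otherwise accumulates
    gc.collect()                      # makes the freed memory visible in the measurement
    print(psutil.virtual_memory().used / 2 ** 30)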
Describe the expected behavior
While adding tf.keras.backend.clear_session() to the loop helps, this should not be necessary because in eager execution there is no graph to clear, which according to the documentation seems to be the only thing this function does:
Destroys the current TF graph and creates a new one.
It is therefore also surprising that this function helps at all in eager execution. The expected behavior is that there is no memory leak even without tf.keras.backend.clear_session().
Code to reproduce the issue: the code is in the description above.
Other info / logs: nothing here.
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 4
- Comments: 16 (5 by maintainers)
A similar situation: I train the model in the main thread and load the model in another thread AT THE SAME TIME, with everything running in a loop. However, if I call clear_session() in one thread, the code in the other thread stops working!!! I tested PyTorch and MXNet, and neither leaks memory in a loop. Why??? Amazing TensorFlow!!! I think clear_session shouldn't be necessary. @bionicles @tjume
Oh, I see! Thanks for clarifying the issue. From my understanding (but I might be wrong - I am just a TensorFlow user, not an expert, let alone a developer),
tf.keras.backend.clear_session clears all graphs that have not been called yet. So, if you declare models A and B, then clear_session will remove all those whose predict (or evaluate, or fit, etc.) method has not yet been called. On the other hand, once you have called such a method, the model cannot be deleted using clear_session, but it can be with del <my_instance>. In either case, calling gc.collect ensures the freed memory is effectively deallocated (otherwise this happens at the garbage collector's next routine pass). Examples:
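A rough sketch of the pattern being described (illustrative only, not the original examples; the layer sizes are arbitrary):

import gc

import numpy as np
import tensorflow as tf

tf.compat.v1.enable_eager_execution()

x = np.random.rand(32, 100).astype("float32")
y = np.random.rand(32, 1).astype("float32")

# Model A is declared but never called, so clear_session() is enough to drop it.
model_a = tf.keras.Sequential([tf.keras.layers.Dense(1, input_dim=100)])
tf.keras.backend.clear_session()

# Model B has had fit() called, so (per the description above) delete the
# instance explicitly and force a collection pass to release its memory.
model_b = tf.keras.Sequential([tf.keras.layers.Dense(1, input_dim=100)])
model_b.compile(optimizer="sgd", loss="mse")
model_b.fit(x, y, epochs=1, verbose=0)
del model_b
gc.collect()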
Maybe I didn't make it clear. I mean that I train model A in the main thread and load model B and run predict in another thread AT THE SAME TIME, with everything in a loop. However, if I call clear_session() in one thread (to clear model A, e.g.), then model B in the other thread stops working (it DOESN'T predict). As you can see, there is no relationship between model A and model B, yet it seems that clearing model A can influence model B. Why??? That isn't logical. @pandrey-fr
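For context, a minimal sketch of the two-thread setup being described (an illustrative reconstruction with arbitrary model shapes and loop counts; whether the worker thread actually fails will depend on the TensorFlow version):

import gc
import threading

import numpy as np
import tensorflow as tf

tf.compat.v1.enable_eager_execution()

def predict_loop():
    # Worker thread: build model B and call predict() repeatedly.
    model_b = tf.keras.Sequential([tf.keras.layers.Dense(10, input_dim=20)])
    x = np.random.rand(1, 20).astype("float32")
    for _ in range(50):
        model_b.predict(x)

worker = threading.Thread(target=predict_loop)
worker.start()

# Main thread: train model A in a loop and reset the global Keras state.
# clear_session() is not scoped to a single thread, which is presumably why it
# can interfere with model B running in the worker thread.
x_a = np.random.rand(8, 30).astype("float32")
y_a = np.random.rand(8, 1).astype("float32")
for _ in range(10):
    model_a = tf.keras.Sequential([tf.keras.layers.Dense(1, input_dim=30)])
    model_a.compile(optimizer="sgd", loss="mse")
    model_a.fit(x_a, y_a, epochs=1, verbose=0)
    tf.keras.backend.clear_session()  # global reset, not per-thread
    gc.collect()

worker.join()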