tensorflow: Memory leak when repeatedly loading and deleting keras models
If a Keras model is saved with tf.saved_model.save and then repeatedly loaded with tf.saved_model.load and deleted with del, memory usage grows slowly over time, indicating a memory leak. keras.backend.clear_session does not resolve the issue. See the attached gist for an example that reproduces the problem in TensorFlow 2.2 on Google Colab.
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): I have attached a custom repro case, but this appears to happen for various types of typical keras models.
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Can reproduce in Google Colab and Docker RedHat images
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: not tested
- TensorFlow installed from (source or binary): binary (from pip)
- TensorFlow version (use command below): ('2.2.0', 'v2.2.0-0-g2b96f3662b')
- Python version: 3.6.9 (google colab)
- Bazel version (if compiling from source): N/A
- GCC/Compiler version (if compiling from source): N/A
- CUDA/cuDNN version: default in google colab
- GPU model and memory: default in google colab
Describe the current behavior
When Keras models are saved and loaded repeatedly, memory usage grows gradually over time. For dynamic model servers that load and unload models over time, this may eventually lead to a crash due to memory exhaustion.
Describe the expected behavior
All memory should be recovered after a keras model instance is deleted with del and the garbage collector is run with gc.collect().
Standalone code to reproduce the issue
The following GitHub gist demonstrates the issue (can also be run in Colab): https://gist.github.com/idfah/dff83de8d2a6406c9b92221e6282a8d6
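A minimal sketch of the kind of load/delete loop the gist exercises, assuming a small Keras model has already been exported to a local directory with tf.saved_model.save; psutil is used here only as one convenient way to watch resident memory grow and is not necessarily what the gist uses:

```python
# Hypothetical repro sketch (not the gist itself).
import gc
import os

import psutil
import tensorflow as tf

MODEL_DIR = "/tmp/leak_model"  # hypothetical path; export any small Keras model here first
process = psutil.Process(os.getpid())

for i in range(50):
    model = tf.saved_model.load(MODEL_DIR)
    del model
    tf.keras.backend.clear_session()  # does not recover the lost memory
    gc.collect()
    # Resident set size keeps climbing even though the loaded model was deleted.
    print(f"iteration {i}: RSS = {process.memory_info().rss / 1e6:.1f} MB")
```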
About this issue
- State: closed
- Created 4 years ago
- Comments: 21 (17 by maintainers)
My main piece of advice in terms of tools is that the tools lie. TensorFlow is a very complex program with multiple heaps (the Python heap, the C heap, TensorFlow's own memory manager, and CUDA) and global data structures that can look like leaks at first glance. You need to use several leak checkers and cross-reference their outputs. For this particular issue, I used tcmalloc, pprof, Python's gc package, the PyCharm debugger, and some additional Python code to walk the heap from inside the program under test.
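As a rough illustration of that last technique, here is a generic sketch (not the exact code used for this investigation) that walks the Python heap with the standard gc module and reports which object types are accumulating between iterations; it only sees Python objects, so C/C++-side allocations still require the other tools:

```python
# Generic sketch: diff Python heap object counts by type across iterations.
import collections
import gc


def heap_census():
    gc.collect()
    return collections.Counter(type(o).__name__ for o in gc.get_objects())


before = heap_census()
# ... run one load/unload cycle here ...
after = heap_census()

# Counter subtraction keeps only the types whose counts grew.
for type_name, delta in (after - before).most_common(20):
    print(f"{type_name}: +{delta}")
```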
@frreiss the commit you referred to has been cherry-picked into 2.2.2 as well. I am closing this bug for now; if any other issues linger, can you please open a new one? Thanks!
@jvishnuvardhan, I think this issue needs to be kept open a while longer.
So far, we have verified that there is a serious memory leak in tf.saved_model.load() in TensorFlow 2.0.x, 2.1.x, and 2.2.x.

I would categorize this leak as a blocker for any application that needs to cycle models in and out of memory – for example, to process a corpus of documents that spans multiple languages, or to serve multiple versions of the same model. Our simple test program “only” leaks a few megabytes each time the model is loaded, but larger models with weights embedded in their graphs can leak hundreds of megabytes per load/unload cycle.
The leak is actually three leaks, all of which were patched in master back in March (in commit 3421416220f5dd65340f03332ff1d474de69c052). However, the fix was not included in the May release of TensorFlow 2.2.0. As of today, five months later, the fix has not been ported to the 2.2.x, 2.1.x, or 2.0.x branches of TensorFlow.
TensorFlow 2.3.0 includes the fix for these three memory leaks. However, fixing this bug in 2.3.0 does not resolve this issue for us. My colleagues are currently using TensorFlow 2.2.x.
In addition to the three large leaks, there is a fourth leak that is not currently patched in the master branch. You can see the presence of this fourth leak in the output of the notebook linked in your previous comment:
With the simple 2-layer model in your example notebook, each call to saved_model.load() leaks about 25 KB of memory. Larger models leak more, probably a megabyte or two for a medium-sized deep learning model. This level of memory leakage is something that one could plausibly work around with periodic reboots, but I would submit that tf.saved_model.load() ought not to leak any memory at all. Authors of long-running applications should be able to load and unload TensorFlow models without worrying about running out of memory.

I have tracked the root cause of the fourth leak to a problem in TensorFlow's mechanism for caching kernels.
In addition to creating graphs, tf.saved_model.load() executes operations in those graphs, primarily for the purpose of restoring variable values. The code that executes these operations is EagerExecute(), which calls EagerLocalExecute(), which calls GetOrCreateKernelAndDevice(), which asks the current EagerContext to look for the kernel for each operation in its kernel cache.

The EagerContext class maintains a cache of kernel instances (in tensorflow/core/common_runtime/eager/context.h). The kernel cache does not have a size limit. There is an API call to clear the cache, but the Python side of TensorFlow only uses that call when resetting the global random seed.
Each entry in the cache is parameterized by the “fingerprint” of the associated operation. This “fingerprint” is a hash value computed from multiple parameters of the operation, including all of the operation’s attributes.
saved_model.load() restores variables through a process that involves invoking multiple instances of the VarHandleOp operation. Due to the graph rewriting that happens during model loading, each of these operations has a unique value in the shared_name field of its attributes. These unique values cause the operations to have unique fingerprints, even across multiple load operations on the same model. Each unique fingerprint causes the creation of a new entry in the cache. The cache has no size limit and is never cleared, so memory usage for these cached kernels grows without bound.

The best workaround I've found for this problem is to have your Python application periodically clear the cache via the internal API. Here's some Python code to do so:
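A minimal sketch of that kind of call, assuming the private pywrap_tfe.TFE_ContextClearCaches binding and the Context._context_handle attribute that the seed-reset path uses internally; these are private APIs and may change between TensorFlow releases, so verify against your version before relying on it:

```python
# Sketch only: pywrap_tfe and Context._context_handle are private TensorFlow
# internals and may differ across releases.
import tensorflow as tf
from tensorflow.python import pywrap_tfe
from tensorflow.python.eager import context


def clear_kernel_cache():
    """Flush the EagerContext kernel cache.

    Uses the same C API call (TFE_ContextClearCaches) that the Python runtime
    invokes when the global random seed is reset.
    """
    ctx = context.context()
    if ctx._context_handle is not None:  # cache exists only once the context is initialized
        pywrap_tfe.TFE_ContextClearCaches(ctx._context_handle)


# Example: clear the cache every few load/unload cycles.
for i in range(100):
    model = tf.saved_model.load("/tmp/leak_model")  # hypothetical path
    del model
    if i % 10 == 9:
        clear_kernel_cache()
```

Because the runtime clears the same cache when the global random seed is reset, periodically calling the public tf.random.set_seed() should have a similar effect without touching private modules.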
A more permanent fix would be to evict stale entries from the cache following a least recently used policy. I’m working on a PR to apply such a fix.
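Purely for illustration of what LRU eviction over a fingerprint-keyed cache would mean (the real cache is a C++ map inside EagerContext, not this Python class):

```python
# Illustrative sketch only; names and structure are hypothetical.
from collections import OrderedDict


class LRUKernelCache:
    def __init__(self, max_entries=10000):
        self._max_entries = max_entries
        self._entries = OrderedDict()  # fingerprint -> kernel

    def get_or_create(self, fingerprint, create_kernel):
        if fingerprint in self._entries:
            # Cache hit: mark the entry as most recently used.
            self._entries.move_to_end(fingerprint)
            return self._entries[fingerprint]
        # Cache miss: build the kernel, then evict the oldest entry if over capacity.
        kernel = create_kernel()
        self._entries[fingerprint] = kernel
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # drop the least recently used entry
        return kernel
```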
The above workaround reduces but does not eliminate the memory leakage of my test program. Before the workaround, each call to saved_model.load() leaked about 125 KB on TensorFlow 2.3.0 and 115 KB on tf-nightly. After the workaround, each call leaks about 30 KB and 20 KB on TensorFlow 2.3.0 and nightly, respectively. I haven't checked whether this remaining leakage is constant or whether it scales with model size. However, since the leakage appears to be coming from a graph rewrite, I would expect the amount of memory leaked to scale with model size.