keras: Repeatedly calling model.predict(...) results in memory leak
System information
- Have I written custom code (as opposed to using example directory): Yes, minimal example attached
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS 10.14.5
- TensorFlow backend (yes / no): yes
- TensorFlow version: 1.12.0
- Keras version: 2.2.4
- Python version: 3.6.8
- CUDA/cuDNN version: N/A
- GPU model and memory: N/A
Describe the current behavior
Loading a model once and then repeatedly calling model.predict(...) results in continually increasing memory usage.
Describe the expected behavior
Calling model.predict(...) should not result in any permanent increase in memory usage.
Code to reproduce the issue
import keras
import numpy as np

model = keras.applications.mobilenet_v2.MobileNetV2()
X = np.random.rand(1, 224, 224, 3)

while True:
    # Leaks:
    y = model.predict(X)[0]
    # Does not leak:
    # y = [0]
Other info / logs
I am running on a Mac Mini and using CPU only. When I run the code above, memory usage climbs steadily and will eventually consume 20+ GB of memory!
The issue does not appear to exist on my Ubuntu 18.04 laptop using a GTX 1050.
I’m open to the idea that I may be doing something wrong, but with such a small example it seems hard to believe.
Let me know what other info I can give to help.
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 37
- Comments: 92 (6 by maintainers)
Commits related to this issue
- submission_df() | BUGFIX: Repeatedly calling model.predict(...) results in memory leak - https://github.com/keras-team/keras/issues/13118 — committed to JamesMcGuigan/kaggle-bengali-ai by JamesMcGuigan 4 years ago
- Reduce memory leaked by using model() instead of model.predict() In a long-running process invoking `nsfw_detector.predict.classify_nd()` often, we observed memory leakage consistent with what [keras... — committed to colindean/nsfw_model_macos by colindean a year ago
One way to avoid this: try calling the model directly, e.g. model(feature_list), instead of using model.predict(feature_list). And do not be happy too quickly, there is also a slow leakage in the model.fit() function. Geez, did the Keras team really do any testing before releasing it?
Same issue. My memory usage balloons while calling model.predict() in a loop, even with tf.keras.backend.clear_session() and gc.collect() at the end of the loop.
I really would like a fix to this.
@JivanRoquet Hello! You are right, the neural network in TensorFlow feeds on (batched) tensors rather than ndarrays. I use PyTorch now; it worked perfectly for my reinforcement learning project, no memory leakage at all. I suggest anyone who can select their platform go for PyTorch. It is stable, fast and easy to use.
I have the same issue. model.predict() causes a big memory leak. I am not sure how posting here helps; there is absolutely no response from the TF team, going by the numerous threads on this topic.
Isn’t model.predict() the core of this whole model-building business? Not sure why this issue has persisted since TF 2.0 and still goes on today without any real response or post from the TF team.
Sucks that so many people are wasting their time on this.
model.predict_on_batch fixed it for me.
I tested predict_on_batch and it made a big difference compared to the predict method, but it still has a memory leak.
In a related TensorFlow issue (https://github.com/tensorflow/tensorflow/issues/33009), someone noted that predict_on_batch doesn’t display the same issue. That workaround was effective for me, but not for another person in that thread.
This workaround does not work for me. I have to train a rolling window in an online fashion, so clear_session is not available to me and I cannot separate the memory for different iterations. I have been increasing memory on my compute node just to get some results, but this leak is just too big for large datasets. I’m using TensorFlow v2.5 on a CPU-only machine. @fchollet this is a bug that is not solved and has no catch-all workaround, please consider reopening.
I’m facing the same issue on Windows 10, and even though K.clear_session() does solve, or at least alleviate, the issue, it would be very nice to have a real fix instead of having to rely on workarounds.
@hotplot I mean:
For me, the following worked (for TensorFlow 2.3.1):

input_tensor = tf.convert_to_tensor(input_ndarray)
output_tensor = model(input_tensor)
output_array = output_tensor.numpy()

After I used the above lines instead of model.predict(), the memory leakage was resolved.
In tf 2.2.0 (conda-based) the problem still persists. K.clear_session(), gc.collect(), and even using model.predict_on_batch together did not solve the problem. However, I was able to use model(tf.convert_to_tensor(np_input)), which decreased memory leakage drastically. Maybe this will help.
For me, changing to a GPU with a bigger memory size just solved the issue (GTX 1650 4 GB -> P100 16 GB).
This still happens in tf 2.9.1. It stops leaking for me if I call tf.keras.backend.clear_session() followed by gc.collect().
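A minimal sketch of that pattern inside a prediction loop (the model, input, and loop here are placeholders, not from the original comment; note that other comments in this thread report that calling clear_session while reusing a model can be an error on older TF versions):

import gc
import numpy as np
import tensorflow as tf

model = tf.keras.applications.MobileNetV2()
X = np.random.rand(1, 224, 224, 3)

for _ in range(1000):
    y = model.predict(X)
    # Clear Keras' global backend state and force Python garbage collection
    # so memory does not keep accumulating across iterations.
    tf.keras.backend.clear_session()
    gc.collect()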
This is the second bug within a couple weeks I have found in Keras that has been around for years and was closed by @fchollet with no explanation. Not sure what that’s all about.
Though tf.convert_to_tensor can avoid the continuous memory growth, an extremely large ndarray can still fail to convert_to_tensor. My solution is wrapping the array in a generator and then converting it to a Dataset object using tf.data.Dataset.from_generator. For example, see the sketch below.
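A minimal sketch of that generator-based approach (the array size, dtype, batch size, and model are illustrative assumptions; the original comment's example code was not preserved):

import numpy as np
import tensorflow as tf

model = tf.keras.applications.MobileNetV2()
# Stand-in for an ndarray too large to pass to tf.convert_to_tensor in one piece.
big_array = np.random.rand(1000, 224, 224, 3).astype(np.float32)

def sample_generator():
    # Yield one sample at a time; the full array is never turned into a single tensor.
    # The array is captured by closure rather than passed via `args` (see the next comment).
    for row in big_array:
        yield row

dataset = tf.data.Dataset.from_generator(
    sample_generator,
    output_types=tf.float32,
    output_shapes=(224, 224, 3),
).batch(32)

predictions = model.predict(dataset)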
Using the ‘args’ argument of Dataset.from_generator should be avoided; it would also convert the array to a tensor and would cause GPU memory leaking when handling an extremely large array.
@moha23 exactly, the leak is still there, just smaller
I’m also still facing the same issue with TensorFlow 2.1.0.
@wangyexiang Yes, I have run the following:
And get the following output:
The exception is unsurprising. I am making repeated predictions using the existing model instance, so it is an error to call clear_session.
@Huii @Shane-Neeley @AverinLV
Hey all, so I had upgraded to 2.1.0, and also switched from the Anaconda version to the Pip version. This fixed my issue. My best guess then is that Anaconda’s version is where the issue is. Now, I have no idea why these versions are different…or where we would go to report this.
Thanks @cclaan , your solution works for me. Specifically, my environment ties to TF 2.0.0 temporarily so I can’t just upgrade to 2.1.0 for the fix. I hope there’s a patch for TF 2.0 to include this bugfix.
I have implemented a workaround based on predict_on_batch (thanks @novog for pointing this out). Note that predict_on_batch expects batches of the same size as those used for training the model. Here’s an example of how to loop through your batches of test data to create predictions; see the sketch below. (Note that if merely pre-allocating the results array already results in a MemoryError, then the array simply does not fit in your available memory, regardless of the memory leak issue.)
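Since the commenter's example code did not survive the copy, here is a minimal sketch of such a loop with a pre-allocated results array (the model, test data, batch size, and output width are stand-ins for illustration):

import numpy as np
import tensorflow as tf

model = tf.keras.applications.MobileNetV2()                      # stand-in model
X_test = np.random.rand(100, 224, 224, 3).astype(np.float32)    # stand-in test data

batch_size = 32      # assumed to match the batch size used when training the model
num_classes = 1000   # MobileNetV2's default output width
num_samples = X_test.shape[0]

# Pre-allocate the results array up front; if this line alone raises MemoryError,
# the predictions simply do not fit in memory, regardless of the leak.
results = np.zeros((num_samples, num_classes), dtype=np.float32)

for start in range(0, num_samples, batch_size):
    end = min(start + batch_size, num_samples)
    results[start:end] = model.predict_on_batch(X_test[start:end])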
If you have a GPU: just use predict_on_batch() and batch your predictions (i.e. put all the images you need predictions for into one set). NO LOOP NEEDED.
On other systems: using predict_on_batch() could be dangerous in terms of available memory, so do the same procedure as above with predict while specifying the batch_size.
N.B.: predict() is not intended to be used in a loop; the only exception to this rule could be that the image you will predict on depends on earlier results.
For me, calling gc.collect() after each model.predict() worked.
Converting the NumPy object with tf.convert_to_tensor works for me.
More info: https://stackoverflow.com/questions/64199384/tf-keras-model-predict-results-in-memory-leak
I created a workaround that should work in all cases of the leak.
It’s a decorator that allows you to run any function as a separate script, seamlessly. When the script ends, the memory that was allocated in the function is freed entirely. It automatically generates the script and takes care of passing the arguments and returns, as long as they are pickleable or Keras models or lists of Keras models… (for documentation see: github)
You can install it with pip install scriptifier.
It should look like this:
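The scriptifier usage example itself did not survive the copy. As a rough stand-in for the same idea (run the prediction in a throwaway process so its memory is released when the process exits), here is a sketch using only the standard library's multiprocessing module; this is not the scriptifier API, and the model path and function names are made up:

import multiprocessing as mp
import numpy as np

def _predict_worker(model_path, X, queue):
    # Import TensorFlow inside the worker so all of its state lives and dies
    # with the child process.
    import tensorflow as tf
    model = tf.keras.models.load_model(model_path)
    queue.put(model.predict(X))

def predict_in_subprocess(model_path, X):
    queue = mp.Queue()
    proc = mp.Process(target=_predict_worker, args=(model_path, X, queue))
    proc.start()
    result = queue.get()   # fetch before join() to avoid blocking on a full queue
    proc.join()
    return result

if __name__ == "__main__":
    X = np.random.rand(1, 224, 224, 3).astype(np.float32)
    y = predict_in_subprocess("my_model.h5", X)   # "my_model.h5" is a made-up path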
I also have the same issue using LSTM on the CPU. It keeps leaking memory continuously…
I would like to use Keras_tuner to tune hyperparameters, but it’s impossible. I even have a minimal dataset, as it is about high-fidelity simulations.
I am going to try using the Docker image as suggested by @UntotaufUrlaub and will let him know. My last chance!
Uninstalling TF 2.3, installing TF-nightly 2.4 and reinstalling TF-2.3 seems to have fixed the issue 🤔.
None of the workarounds here seem to work for my network except K.clear_session() (this is how I used it). While using model.predict caused a sharp jump in RAM usage within a couple of minutes, model(inputs, training=False) has a much more gradual increase, but it increases nevertheless. TF-GPU 1.14. I think it could depend on the network architecture; I also sometimes get the topological sort error despite having no loops, which seems to happen with a larger number of filters or for some other unclear reason (issue #24816). So all these errors might be at play somehow.
@JivanRoquet it should work, I just tested. Each model itself is functional, so you can call it by just passing your input. But remember, if you want to do inference, use model(inputs, training=False) to disable all the dropout etc. I tried all the solutions above and this is the only way that doesn’t leak memory. I don’t know why, but just sharing my two cents of experience.
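A minimal sketch of the direct-call pattern described above (the model and input are placeholders, not from the original comment):

import numpy as np
import tensorflow as tf

model = tf.keras.applications.MobileNetV2()
x = tf.convert_to_tensor(np.random.rand(1, 224, 224, 3).astype(np.float32))

# Call the model directly for inference; training=False disables dropout,
# batch-norm updates, etc.
y = model(x, training=False)
y = y.numpy()   # convert back to a NumPy array if needed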
@xiahualiu What do you mean exactly? How are you supposed to do that?
I tried:
The second option triggers an error.
In my case, data is a NumPy array resembling this (truncated):
Should the NumPy array be directly converted to a tensor?
Leak is still here despite doing exactly that. And it’s as massive as before.
Same with 2.1.0, 2.0.1, 2.0
For me on TF 2.0.0:
- tf.keras.backend.clear_session() after predict helped the memory leak but didn’t fix it completely.
- predict_on_batch instead of predict fixed the memory leak, but really slowed down my predictions. I can’t use this.
- I haven’t tried include_optimizer=False in model.save. What does this accomplish?
Apparently, a similar issue has been solved in TensorFlow 2.1.0 (dev) according to this issue: https://github.com/tensorflow/tensorflow/issues/34579
I will wait for the final version of TensorFlow 2.1.0 and see whether the bug still persists in Keras.