tensorflow: memory leak in tf.keras.Model.predict
https://stackoverflow.com/questions/64199384/tf-keras-model-predict-results-in-memory-leak
Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub.
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
- TensorFlow installed from (source or binary):
- TensorFlow version (use command below):
- Python version:
- Bazel version (if compiling from source):
- GCC/Compiler version (if compiling from source):
- CUDA/cuDNN version:
- GPU model and memory:
You can collect some of this information using our environment capture script. You can also obtain the TensorFlow version with:
- TF 1.0: `python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"`
- TF 2.0: `python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"`
Describe the current behavior
Describe the expected behavior
Standalone code to reproduce the issue. Provide a reproducible test case that is the bare minimum necessary to generate the problem. If possible, please share a link to Colab/Jupyter/any notebook.
Other info / logs. Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 25 (13 by maintainers)
Commits related to this issue
- Avoid calling Model.predict() on small numpy arrays in loops This is innefficient and can lead to memory leaks. See https://keras.io/api/models/model_training_apis/#predict-method and https://github.... — committed to nhuet/decomon by nhuet 8 months ago
- Add a predict method to DecomonModel for single batch numpy arrays The idea is to avoid calling predict() which is known to be not designed for small arrays, and leads to memory leaks when used in lo... — committed to nhuet/decomon by nhuet 8 months ago
- Avoid calling Model.predict() on small numpy arrays in loops This is innefficient and can lead to memory leaks. See https://keras.io/api/models/model_training_apis/#predict-method and https://github.... — committed to airbus/decomon by nhuet 8 months ago
- Add a predict method to DecomonModel for single batch numpy arrays The idea is to avoid calling predict() which is known to be not designed for small arrays, and leads to memory leaks when used in lo... — committed to airbus/decomon by nhuet 8 months ago
- Avoid calling Model.predict() on small numpy arrays in loops This is innefficient and can lead to memory leaks. See https://keras.io/api/models/model_training_apis/#predict-method and https://github.... — committed to ducoffeM/decomon by nhuet 8 months ago
- Add a predict method to DecomonModel for single batch numpy arrays The idea is to avoid calling predict() which is known to be not designed for small arrays, and leads to memory leaks when used in lo... — committed to ducoffeM/decomon by nhuet 8 months ago
Lol, I have been fighting memory-leak problems in multiple TensorFlow services in PROD for years and have implemented various workarounds, such as watchers that check memory usage so we can gracefully restart our workers before they OOM-crash mid-job, and adding
`tf.config.threading.set_inter_op_parallelism_threads(1); tf.config.threading.set_intra_op_parallelism_threads(1)` to reduce the amount of leakage, etc. Just yesterday I finally discovered this issue. Maybe future users like me could be spared from wasting so much time/energy on this by adjusting "What's the difference between `Model` methods `predict()` and `__call__()`?" in the Keras FAQ, which currently recommends the memory-leaking way of doing predictions. 🙂
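As an illustration of the kind of watcher mentioned above, here is a minimal sketch. It assumes `psutil` is available; the memory budget and the `restart_worker()` hook are illustrative, not from the original comment:

```python
import gc
import psutil
import tensorflow as tf

# Mitigations applied before building any model: single-threaded op execution
# reduces (but does not eliminate) the observed leakage.
tf.config.threading.set_inter_op_parallelism_threads(1)
tf.config.threading.set_intra_op_parallelism_threads(1)

MEMORY_LIMIT_BYTES = 4 * 1024 ** 3  # illustrative 4 GB budget per worker

def memory_watchdog(restart_worker):
    """Gracefully restart the worker before it OOM-crashes mid-job."""
    rss = psutil.Process().memory_info().rss  # resident set size of this process
    if rss > MEMORY_LIMIT_BYTES:
        gc.collect()  # try to reclaim cyclic garbage first
        if psutil.Process().memory_info().rss > MEMORY_LIMIT_BYTES:
            restart_worker()  # hypothetical hook: drain the queue and respawn
```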
My attempts:
- 2.4.1: leak
- 2.7.1: leak

How does the problem occur for me?

How did I solve it?
@plooney
`model.predict` is a high-level API designed for batch prediction outside of any loops. It automatically wraps your model in a `tf.function` and maintains graph-based execution. This means that if there is any change in the input signature (shape and dtype) passed to that function (here `model.predict`), it traces multiple graphs instead of the single one you are expecting.

In your case, `inImm` is a numpy input, which is treated as a different signature each time you pass it in a for loop to a function wrapped by `tf.function`. However, providing `inImm` as a tensor results in the same input signature, so there is a single graph to which these inputs are fed and from which results are obtained. In the numpy case, there are 60 static graphs (which is not what you want). Because there are so many static graphs, memory grows with each loop iteration. When I added one line to your code, it stopped crashing. Please check the gist here. Thanks!

`inImm = tf.convert_to_tensor(inImm)`

Please read 1, 2, 3, and 4. These resources will help you more. Thanks!
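A minimal sketch of the pattern being described; the model, shapes, and loop count are illustrative, not the original reproducer:

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.Input(shape=(8,)), tf.keras.layers.Dense(1)])

for _ in range(60):
    inImm = np.random.rand(1, 8).astype(np.float32)
    # Converting to a tensor first keeps the input signature stable, so the
    # tf.function inside predict is traced once instead of on every iteration.
    inImm = tf.convert_to_tensor(inImm)
    out = model.predict(inImm)
```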
Please close the issue if this was resolved for you. Thanks!
In case this helps:

If the dataset can fit in memory, then a direct model call can replace the call to `model.predict`. Otherwise, if the dataset does not fit in memory, consider using `tf.data`.
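A minimal sketch of what such replacements can look like, assuming a built Keras `model` and an in-memory numpy array `x` (the names and shapes are illustrative):

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.Input(shape=(8,)), tf.keras.layers.Dense(1)])
x = np.random.rand(1024, 8).astype(np.float32)

# Dataset fits in memory: call the model directly instead of model.predict,
# optionally wrapped in a tf.function for graph-mode speed.
@tf.function
def predict_fn(batch):
    return model(batch, training=False)

preds = predict_fn(tf.convert_to_tensor(x))

# Dataset does not fit in memory: stream it with tf.data (in practice the
# dataset would be built from files or a generator rather than an array).
ds = tf.data.Dataset.from_tensor_slices(x).batch(128)
preds_from_ds = model.predict(ds)
```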
So, random thoughts (I re-opened just to leave these):
- As mentioned above, `predict` shouldn't be used in a loop. Sad things happen if you do. Ideally, use the model call with `training=False` directly (manually wrapping your model call in a `tf.function` if needed for performance reasons).
- Numpy inputs to predict/fit/evaluate get converted to tf.data datasets and then iterated over. The specific conversion implementation currently in place ends up copying the data and is poorly suited for large input sizes (it is prone to OOMs for large inputs). There are a number of other GitHub issues related to this floating around, but we have so far been unable to prioritize this in core TensorFlow. If you need performant numpy input to Keras fit/evaluate/predict, my current recommendation is TensorFlow I/O's numpy inputs: https://www.tensorflow.org/io/api_docs/python/tfio/experimental/IODataset#from_numpy, because they should be more performant and avoid excess memory copies (see the sketch after this list).
- The fact that `gc.collect` fixes this makes me think something about the numpy conversion is also creating cyclical references that the Python GC doesn't trigger for (the memory is consumed by the GPU, which isn't tracked by the Python GC, so the GC fails to trigger because it thinks there's still plenty of memory). We've seen cyclical issues like this cause problems elsewhere (e.g., when creating multiple models), but due to the two aforementioned points we don't have the bandwidth right now to prioritize tracking down and fixing this specific one.
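A rough sketch of the tensorflow-io recommendation above, assuming the `tensorflow-io` package is installed; the model, array, and batching are illustrative assumptions:

```python
import numpy as np
import tensorflow as tf
import tensorflow_io as tfio

x = np.random.rand(100_000, 8).astype(np.float32)  # large in-memory numpy array

# IODataset.from_numpy wraps the array without the extra copies made by the
# default numpy -> tf.data conversion inside fit/evaluate/predict.
ds = tfio.experimental.IODataset.from_numpy(x).batch(256)

model = tf.keras.Sequential([tf.keras.Input(shape=(8,)), tf.keras.layers.Dense(1)])
preds = model.predict(ds)
```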
I had the same issue with `.predict`: running on about 50,000 inputs over several hours, I saw a leak of around 0.35 GB and traced it back to the `.predict` method. Replacing it with the `call` method solved the memory leak but was about 50% slower.
Switching to the `__call__` function significantly reduced the amount of leakage, but it still leaked about 70 bytes per `__call__` in my environment. Finally, converting the Keras model to a bare TensorFlow graph seems to have eliminated the leakage in my environment.
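A sketch of one way such a conversion can be done, namely tracing the model into a single concrete function and reusing it; this is an illustration of the approach under that assumption, not necessarily what this commenter did:

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.Input(shape=(8,)), tf.keras.layers.Dense(1)])

# Trace the model once into a concrete function (a frozen graph with a fixed
# input signature); later calls reuse that single graph.
infer = tf.function(lambda t: model(t, training=False)).get_concrete_function(
    tf.TensorSpec(shape=[None, 8], dtype=tf.float32)
)

for _ in range(1000):
    batch = tf.convert_to_tensor(np.random.rand(1, 8).astype(np.float32))
    out = infer(batch)
```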
@jvishnuvardhan thanks for the clear explanation. If this were calling a `tf.function` in a loop, that would be 100% the correct answer. But `Model.predict` manages some of this to avoid that problem (in general, Keras fit/evaluate/predict never require that the user convert inputs to tensors). It looks like something more complicated is happening. The first two clues that suggest it are:
Investigating a little further, you can find that `model.predict_function` is the `@tf.function` that runs in here. Inspecting it, both its `._list_all_concrete_functions()` and `.pretty_printed_concrete_signatures()` show that there is only one graph, and predict is handling the conversion of the numpy array to a Tensor.

So I agree that this is leaking memory somewhere. But I've confirmed that it's not the `tf.function` cache causing it.

@tomerk, you're pretty familiar with this code, do you have any ideas?
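A quick sketch of that kind of inspection, assuming a TF 2.x Keras model whose `predict_function` has been populated by a prior `predict` call (`_list_all_concrete_functions` is a private API, so this is for debugging only):

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.Input(shape=(8,)), tf.keras.layers.Dense(1)])

for _ in range(10):
    model.predict(np.random.rand(1, 8).astype(np.float32), verbose=0)

fn = model.predict_function  # the tf.function that predict traced and reuses
print(fn.pretty_printed_concrete_signatures())  # human-readable traced signatures
print(len(fn._list_all_concrete_functions()))   # number of traced graphs
```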