tensorflow: Memory leak in forward pass (e.g., of ResNet50 model) with TensorFlow 2.12.0 and Python 3.11

The following minimal example reproduces the memory leak I ran into. (No GPU, just CPU.)

memleak.py:

import numpy as np
import psutil
import tensorflow as tf

model = tf.keras.applications.ResNet50()  # VGG19 seems to not leak.

# tf.config.threading.set_inter_op_parallelism_threads(0) and tf.config.threading.set_intra_op_parallelism_threads(0) do not help.

inp = (np.random.rand(1, 224, 224, 3) * 255).astype('uint8')

for run in range(1, 9999999):
    model(inp)
    memory_usage_in_MiB = psutil.Process().memory_info().rss / (1024 * 1024)
    print(f'Memory usage after {run} run(s) (in MiB): {memory_usage_in_MiB:.3f}', flush=True)
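
To check whether the growth comes from Python objects or from native allocations, one diagnostic I would try (my own sketch, not part of the original repro) is to compare psutil's RSS with the Python-heap usage reported by tracemalloc; if RSS keeps climbing while the traced Python heap stays flat, the leak is most likely on the C++ side:

import gc
import tracemalloc

import numpy as np
import psutil
import tensorflow as tf

model = tf.keras.applications.ResNet50()
inp = (np.random.rand(1, 224, 224, 3) * 255).astype('uint8')

tracemalloc.start()
for run in range(1, 2001):
    model(inp)
    if run % 500 == 0:
        gc.collect()  # rule out memory held by not-yet-collected Python garbage
        rss_mib = psutil.Process().memory_info().rss / (1024 * 1024)
        python_heap_mib = tracemalloc.get_traced_memory()[0] / (1024 * 1024)
        print(f'run {run}: RSS {rss_mib:.1f} MiB, traced Python heap {python_heap_mib:.1f} MiB', flush=True)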

Dockerfile:

FROM python:3.11.2

RUN pip install --no-cache-dir tensorflow==2.12.0 psutil==5.9.4

# Disable the Docker cache from this stage on, see https://stackoverflow.com/a/58801213/1866775
ADD "https://www.random.org/cgi-bin/randbyte?nbytes=10&format=h" skipcache

ADD ./memleak.py /
RUN python /memleak.py

Output (docker build --rm .):

Memory usage after 1 run(s) (in MiB): 604.324
Memory usage after 2 run(s) (in MiB): 606.906
Memory usage after 3 run(s) (in MiB): 606.906
Memory usage after 4 run(s) (in MiB): 606.906
Memory usage after 5 run(s) (in MiB): 606.906
Memory usage after 6 run(s) (in MiB): 607.164
Memory usage after 7 run(s) (in MiB): 607.164
Memory usage after 8 run(s) (in MiB): 607.164
Memory usage after 9 run(s) (in MiB): 607.164
Memory usage after 10 run(s) (in MiB): 607.164
Memory usage after 11 run(s) (in MiB): 607.422
Memory usage after 12 run(s) (in MiB): 607.422
[...]
Memory usage after 498 run(s) (in MiB): 626.242
Memory usage after 499 run(s) (in MiB): 626.242
Memory usage after 500 run(s) (in MiB): 626.242
Memory usage after 501 run(s) (in MiB): 626.500
Memory usage after 502 run(s) (in MiB): 626.500
[...]
Memory usage after 1996 run(s) (in MiB): 683.477
Memory usage after 1997 run(s) (in MiB): 683.734
Memory usage after 1998 run(s) (in MiB): 683.734
Memory usage after 1999 run(s) (in MiB): 683.734
Memory usage after 2000 run(s) (in MiB): 683.734
Memory usage after 2001 run(s) (in MiB): 683.734
[...]
Memory usage after 9996 run(s) (in MiB): 960.258
Memory usage after 9997 run(s) (in MiB): 960.508
Memory usage after 9998 run(s) (in MiB): 960.508
Memory usage after 9999 run(s) (in MiB): 960.508
Memory usage after 10000 run(s) (in MiB): 960.508
Memory usage after 10001 run(s) (in MiB): 960.508
[...]
Memory usage after 24997 run(s) (in MiB): 1547.840
Memory usage after 24998 run(s) (in MiB): 1547.840
Memory usage after 24999 run(s) (in MiB): 1534.230
Memory usage after 25000 run(s) (in MiB): 1532.348
Memory usage after 25001 run(s) (in MiB): 1533.441
Memory usage after 25002 run(s) (in MiB): 1544.711
[...]

When using the same TensorFlow version (2.12.0) but with Python 3.10.10 (instead of 3.11.2), the memory usage does not grow.
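
For comparison, a variation worth testing (just a sketch of my own; I have not confirmed whether it changes the behaviour) is to run the forward pass through a compiled tf.function instead of plain eager calls, to see whether the growth is tied to the eager execution path:

import numpy as np
import psutil
import tensorflow as tf

model = tf.keras.applications.ResNet50()
# Trace the forward pass once; later calls with the same input signature
# reuse the compiled graph instead of executing op by op in eager mode.
model_fn = tf.function(model)

inp = tf.constant((np.random.rand(1, 224, 224, 3) * 255).astype('float32'))

for run in range(1, 9999999):
    model_fn(inp)
    memory_usage_in_MiB = psutil.Process().memory_info().rss / (1024 * 1024)
    print(f'Memory usage after {run} run(s) (in MiB): {memory_usage_in_MiB:.3f}', flush=True)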

About this issue

  • Original URL
  • State: open
  • Created a year ago
  • Reactions: 4
  • Comments: 17 (2 by maintainers)

Most upvoted comments

Any news on this? The issue is still present in tf-nightly and is not specific to Keras.

Dockerfile

FROM python:3.10.11
#FROM python:3.11.2

RUN pip install --no-cache-dir tf-nightly-cpu psutil==5.9.4

# Disable the Docker cache from this stage on, see https://stackoverflow.com/a/58801213/1866775
ADD "https://www.random.org/cgi-bin/randbyte?nbytes=10&format=h" skipcache

ADD ./memleak.py /
RUN python /memleak.py

memleak.py:

import psutil

process = psutil.Process()
print(process.memory_info().rss / 1000000)  # RAM usage in MB

import tensorflow as tf
print(tf.__version__)

while True:
    print(process.memory_info().rss / 1000000, flush=True)
    x = tf.random.normal(shape=(1,))  # this call causes RAM usage to keep growing
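
To make the growth easier to compare across Python and TensorFlow versions, a small helper along these lines (my own sketch; leak_rate_kib_per_iter is a hypothetical name) could report the average RSS increase per iteration:

import psutil
import tensorflow as tf

def leak_rate_kib_per_iter(n_iters=10000):
    # Hypothetical helper: average RSS growth per iteration, in KiB.
    proc = psutil.Process()
    start_rss = proc.memory_info().rss
    for _ in range(n_iters):
        tf.random.normal(shape=(1,))  # the op that appears to leak
    end_rss = proc.memory_info().rss
    return (end_rss - start_rss) / n_iters / 1024

print(f'Approx. leak rate: {leak_rate_kib_per_iter():.3f} KiB per iteration', flush=True)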

I have also replicated the issue on Colab with the tf-nightly version and with Python 3.9.16.

Please refer to the attached gist for details.