tensorflow: High memory consumption with model.fit in TF 2.x

System information

  • Have I written custom code: Yes
  • OS Platform and Distribution: CentOS Linux 7
  • Mobile device: Not verified on mobile devices
  • TensorFlow installed from: binary, via pip install tf-nightly
  • TensorFlow version: 2.5.0-dev20200626
  • Python version: 3.6.8
  • CUDA/cuDNN version: 10.1 / 7
  • GPU model and memory: Tesla V100 32 GB

Describe the current behavior

Model training with the Keras API consumes a large amount of system memory. The memory used by model.fit appears to be proportional to the size of the training data provided as numpy arrays, with a proportionality constant of approximately 1. In other words, if the numpy arrays x and y are, say, 8 GB in total, then model.fit(x, y, ...) will use another 8 GB (plus some overhead), so model.fit brings the total memory usage to twice the data size (plus some overhead).

The same applies to the validation data. If validation data are passed to model.fit as numpy arrays via the validation_data argument, model.fit appears to use an additional amount of memory equal to the size of the validation arrays.

The described effect is also present if I wrap the numpy arrays containing the data in TF Datasets.

In the code attached below, one can change the variable K to vary the size of the data and test the behavior described above. It is straightforward to estimate the data size (e.g. with K=5000 the data arrays in the code below should total ca. 7.32 GB). The whole Python process running this code uses approximately twice that much RAM plus some overhead independent of the data size. One can comment out the line containing model.fit to check that this is the point at which the high memory consumption starts.
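
For reference, the expected data size can be computed directly from the array shapes and dtypes used in the reproduction script below (a quick sketch, assuming the same K, N and dtypes):

import numpy as np

K, N = 5000, 512
x_bytes = K * N * N * 1 * np.dtype(np.uint16).itemsize  # one image array, 2 bytes per element
y_bytes = K * N * N * 1 * np.dtype(np.bool_).itemsize   # one label array, 1 byte per element
total = 2 * (x_bytes + y_bytes)                         # training + validation arrays
print('Expected data size: {:.2f} GB'.format(total / 1024**3))  # ca. 7.32 GB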

Describe the expected behavior

It would be reasonable to expect the memory usage of the test code to be approximately the data size plus some overhead independent of the data size (not twice the data size plus overhead).

A bit of history

This is a continuation of issue #35030, which concerned TF 2.0 and 2.1. I opened that issue in December 2019; @karmel has now noted that it has become very long and asked me to test whether the problem persists in tf-nightly and, if so, to open a new issue. The problem does persist, so here is the new issue.

The problem first appeared in the release 2.0.0-rc0. In earlier releases, up to and including 2.0.0-b1, the memory usage of the test code below was approximately the size of the data arrays plus an overhead independent of the data size. Starting from 2.0.0-rc0 it became twice the data size plus overhead, and this remained true at least until 2.1.0.

Next, in 2.2.0, the situation changed a bit:

  • When passing the data to model.fit as numpy arrays, there was a memory leak of about 0.5 × the data size per epoch. In other words, if the data arrays totaled ca. 8 GB, the memory usage grew by ca. 4 GB every epoch.
  • When wrapping the data arrays in TF datasets before passing them to model.fit, the behavior in TF 2.2 was the same as in 2.1 and 2.0, namely the memory usage was twice the data size plus overhead.

Now, in the nightly release 2.5.0-dev20200626 we are back to the previous situation, namely the memory usage is twice the data size plus overhead, regardless of whether numpy arrays or datasets are used to pass the data to model.fit.

An important note on reproducibility

The issue has turned out not to be reproducible in Colab! In #35030, I reported the issue for my local machine, and several other participants managed to reproduce it locally as well, but those who tried to reproduce it in Colab did not succeed. Likewise, the results I report here are not from Colab.

Also, for some reason the issue cannot be captured when using libmemusage.so to measure the memory usage. To capture it, I use ps au in a Linux terminal or the Python module psutil.
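
As a minimal sketch, this is the psutil-based measurement I mean (reading the resident set size of the process, the same quantity that ps reports in its RSS column):

import os
import psutil

# Resident set size (RSS) of the current Python process, in bytes.
rss = psutil.Process(os.getpid()).memory_info().rss
print('RSS: {:.2f} GB'.format(rss / 1024**3))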

Standalone code to reproduce the issue

Since this issue is in fact a continuation of #35030, I use the same test code here.

import tensorflow as tf
import numpy as np

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Lambda, Conv2D

print("Tensorflow version: {}".format(tf.__version__),flush=True)

K = 5000 # Number of images
N = 512  # Image size

MAX_SIGNAL = 5000 # The values of the training data range from 0 to this

def build_model():
  '''Create a simple test model.'''
  
  inputs = Input((N,N,1))
  s = Lambda(lambda x: x / MAX_SIGNAL) (inputs)
  s = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(s)
  outputs = s

  return Model(inputs=[inputs], outputs=[outputs])

# Generate some random data
x_train = np.random.randint(MAX_SIGNAL+1,size=(K,N,N,1),dtype=np.uint16) # Should be 2 560 000 kB
y_train = np.random.randint(1+1         ,size=(K,N,N,1),dtype=np.bool_)  # Should be 1 280 000 kB
x_val   = np.random.randint(MAX_SIGNAL+1,size=(K,N,N,1),dtype=np.uint16) # Should be 2 560 000 kB
y_val   = np.random.randint(1+1         ,size=(K,N,N,1),dtype=np.bool_)  # Should be 1 280 000 kB
# In total, the above arrays should be 7 680 000 kB

model = build_model()

optimizer = tf.keras.optimizers.Adam()
loss = tf.keras.losses.BinaryCrossentropy()

model.compile(optimizer=optimizer, loss=loss)
model.fit(x=x_train, y=y_train, validation_data=(x_val,y_val), batch_size=8, epochs=10)

The above is meant to reproduce the issue with data passed to model.fit as numpy arrays. To test the behavior with TF datasets, replace the last line with the following:

ds_train = tf.data.Dataset.from_tensor_slices((x_train,y_train)).batch(8)
ds_val = tf.data.Dataset.from_tensor_slices((x_val,y_val)).batch(8)
model.fit(ds_train, validation_data=ds_val, epochs=10)


Most upvoted comments

Ok, it took a bit longer than a moment, because both TF 2.5.0-rc0 and tf-nightly (2.6.0-dev20210412 at the moment) depend on libcusolver.so.11, which does not seem to be available via APT (or at least I have not succeeded in installing it). So in the end I just ran the test code on my Kubuntu 18.04 machine in CPU-only mode.

And I don’t have good news to share. The issue is there, both in 2.5.0-rc0 and in 2.6.0-dev20210412, and both for data passed to model.fit as numpy arrays and as a tf.data.Dataset. The output is very similar to the one already posted in this issue, but for reference I attach it as text files.

TF-2.5.0-rc0_data_as_datasets.txt TF-2.5.0-rc0_data_as_numpy.txt TF-2.6.0-dev20210412_data_as_datasets.txt TF-2.6.0-dev20210412_data_as_numpy.txt

I have just run the test with TF 2.4.1. Unfortunately, the issue is still there (again on Kubuntu 18.04, GeForce GTX 1050 Ti 4 GB). The results are very similar to those reported above for 2.4.0-dev20200702. I will check with tf-nightly in a moment…

I guess I should carry out some memory measurements in my test code. So, here is the original test code enriched with RSS memory measurements:

import tensorflow as tf
import numpy as np
import psutil
import os

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Lambda, Conv2D
from tensorflow.keras.callbacks import Callback

print("Tensorflow version: {}".format(tf.__version__),flush=True)

K = 5000 # Number of images
N = 512  # Image size

MAX_SIGNAL = 5000 # The values of the training data range from 0 to this

class MemoryUsageCallback(Callback):
  '''Monitor memory usage on epoch begin and end.'''

  def on_epoch_begin(self,epoch,logs=None):
    print('**Epoch {}**'.format(epoch))
    print('Memory usage on epoch begin: {}'.format(psutil.Process(os.getpid()).memory_info().rss))

  def on_epoch_end(self,epoch,logs=None):
    print('Memory usage on epoch end:   {}'.format(psutil.Process(os.getpid()).memory_info().rss))
    
def build_model():
  '''Create a simple test model.'''
  
  inputs = Input((N,N,1))
  s = Lambda(lambda x: x / MAX_SIGNAL) (inputs)
  s = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(s)
  outputs = s

  return Model(inputs=[inputs], outputs=[outputs])

# Generate some random data
x_train = np.random.randint(MAX_SIGNAL+1,size=(K,N,N,1),dtype=np.uint16) # Should be 2 560 000 kB
y_train = np.random.randint(1+1         ,size=(K,N,N,1),dtype=np.bool_)  # Should be 1 280 000 kB
x_val   = np.random.randint(MAX_SIGNAL+1,size=(K,N,N,1),dtype=np.uint16) # Should be 2 560 000 kB
y_val   = np.random.randint(1+1         ,size=(K,N,N,1),dtype=np.bool_)  # Should be 1 280 000 kB
# In total, the above arrays should be 7 680 000 kB

model = build_model()

callbacks = [MemoryUsageCallback()]
optimizer = tf.keras.optimizers.Adam()
loss = tf.keras.losses.BinaryCrossentropy()

model.compile(optimizer=optimizer, loss=loss)
model.fit(x=x_train, y=y_train, validation_data=(x_val,y_val), batch_size=8, epochs=10, callbacks=callbacks, verbose=0)

To test the memory usage with TF datasets, replace the last line with the following:

ds_train = tf.data.Dataset.from_tensor_slices((x_train,y_train)).batch(8)
ds_val = tf.data.Dataset.from_tensor_slices((x_val,y_val)).batch(8)
model.fit(ds_train, validation_data=ds_val, epochs=10, callbacks=callbacks, verbose=0)  # batch_size must not be passed when the input is an already-batched dataset