tensorflow: High memory consumption with model.fit in TF 2.0.0 and 2.1.0-rc0

System information

  • Have I written custom code: Yes
  • OS Platform and Distribution: Linux Kubuntu 18.04, kernel 5.0
  • Mobile device: Not verified on mobile devices
  • TensorFlow installed from: binary via pip install tensorflow-gpu
  • TensorFlow version: 2.1.0-rc0; however, 2.0.0, 2.0.0-rc0, 2.0.0-rc1 and 2.0.0-rc2 are also affected
  • Python version: 3.6.9
  • CUDA version: 10.1 for TF 2.1.0-rc0; 10.0 for the earlier versions of TF
  • cuDNN version: 7
  • GPU model and memory: Nvidia GeForce GTX 1050 Ti (4GB)
  • CPU model: AMD Ryzen 7 1700

Describe the current behavior

Model training with the Keras API consumes a high amount of system memory with TF 2.0.0 and 2.1.0-rc0, as well as with 2.0.0-rc0, 2.0.0-rc1 and 2.0.0-rc2. The memory used by model.fit appears to be proportional to the size of the training data provided as numpy arrays, with a proportionality constant of approximately 1. In other words, if the numpy arrays x and y are, say, 8 GB in total, then model.fit(x,y,...) will use another 8 GB (plus some overhead). This suggests that model.fit creates unnecessary copies of the data arrays. This is in contrast to TF 1.14.0, 2.0.0-a0, 2.0.0-b0 and 2.0.0-b1, where model.fit seems to use an amount of RAM that is independent of the data size (and much less than 8 GB, at least in the test code attached below).

The same applies to the validation data. If validation data are passed to model.fit as numpy arrays via the validation_data argument, then model.fit appears to additionally allocate memory equal to the size of the validation data arrays with TF versions from 2.0.0-rc0 to 2.1.0-rc0.

In the code attached below, one may change the variable K to vary the size of the data and test the behaviour described above. It is straightforward to estimate the data size: e.g. with K=5000 the data arrays in the code below should total ca. 7.32 GB. The whole Python process associated with this code uses approximately this much RAM plus some overhead when running with TF 1.14.0, 2.0.0-a0, 2.0.0-b0 or 2.0.0-b1, but with TF versions from 2.0.0-rc0 to 2.1.0-rc0 the Python process consumes twice as much RAM. One may comment out the line containing model.fit to verify that this is the point at which the high memory consumption starts.
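
For reference, a minimal way to check both numbers on Linux (a sketch using only the standard library; resource.getrusage reports the peak RSS in kilobytes on Linux) is to add the following to the reproduction script below and print it once before and once after the call to model.fit:

import resource

# Total size of the numpy data arrays defined in the reproduction code
data_bytes = sum(a.nbytes for a in (x_train, y_train, x_val, y_val))
# Peak resident set size of this process; ru_maxrss is in kB on Linux
peak_rss_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print("Data arrays: {:.2f} GB".format(data_bytes / 1024**3))
print("Peak RSS:    {:.2f} GB".format(peak_rss_kb / 1024**2))

With the affected versions, the printout after model.fit shows roughly one extra data size on top of the value printed before it.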

Describe the expected behavior

The memory used by model.fit should not duplicate the size of the training and validation data passed as numpy arrays. It should be more or less independent of the size of the data arrays, as it is in TF 1.14.0 and in the pre-releases 2.0.0-a0, 2.0.0-b0 and 2.0.0-b1.

Code to reproduce the issue

import tensorflow as tf
import numpy as np

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Lambda, Conv2D

print("Tensorflow version: {}".format(tf.__version__),flush=True)

K = 5000 # Number of images
N = 512  # Image size

MAX_SIGNAL = 5000 # The values of the training data range from 0 to this

def build_model():
  '''Create a simple test model.'''
  
  inputs = Input((N,N,1))
  s = Lambda(lambda x: x / MAX_SIGNAL) (inputs)
  s = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(s)
  outputs = s

  return Model(inputs=[inputs], outputs=[outputs])

# Generate some random data
x_train = np.random.randint(MAX_SIGNAL+1,size=(K,N,N,1),dtype=np.uint16) # Should be 2 560 000 kB
y_train = np.random.randint(1+1         ,size=(K,N,N,1),dtype=np.bool)   # Should be 1 280 000 kB
x_val   = np.random.randint(MAX_SIGNAL+1,size=(K,N,N,1),dtype=np.uint16) # Should be 2 560 000 kB
y_val   = np.random.randint(1+1         ,size=(K,N,N,1),dtype=np.bool)   # Should be 1 280 000 kB
# In total, the above arrays should be 7 680 000 kB

model = build_model()

optimizer = tf.keras.optimizers.Adam()
loss = tf.keras.losses.BinaryCrossentropy()

model.compile(optimizer=optimizer, loss=loss)
model.fit(x=x_train, y=y_train, validation_data=(x_val,y_val), batch_size=8, epochs=10)

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 17
  • Comments: 44 (8 by maintainers)

Most upvoted comments

An update on the problem: the Lambda layer has no memory usage issue, and it looks like memory_profiler reports a wrong mapping between line numbers and memory usage/increment. When inserting pdb.set_trace() before model.fit(), htop (or top) shows no memory increase after model = build_model(), but memory increases dramatically in model.fit:

  model = build_model()

  optimizer = tf.keras.optimizers.Adam()
  loss = tf.keras.losses.BinaryCrossentropy()

  model.compile(optimizer=optimizer, loss=loss)
  import pdb
  pdb.set_trace()
  model.fit(x=x_train, y=y_train, validation_data=(x_val,y_val), batch_size=64, epochs=1, steps_per_epoch=10)

The root cause lies in the way TF converts numpy arrays to tensors. If you run the following code:

x_train = np.random.randint(MAX_SIGNAL+1,size=(K,N,N,1),dtype=np.uint16)  # Should be 2 560 000 kB
y = tf.data.Dataset.from_tensor_slices(x_train)

You will see an increase of about 2.5 GB in memory usage (which is almost the size of the array). The same increase occurs with yy = tf.convert_to_tensor(x_train).
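
A possible way to avoid materializing a second full-size tensor up front (a sketch, not the official fix; whether it fully avoids the extra copy on the affected versions is an assumption) is to feed the slices lazily through tf.data.Dataset.from_generator, so that only one batch at a time is converted:

def gen():
    # Yield one (image, mask) pair at a time; nothing forces a whole-array copy here
    for xi, yi in zip(x_train, y_train):
        yield xi, yi

ds_train = tf.data.Dataset.from_generator(
    gen,
    output_types=(tf.uint16, tf.bool),
    output_shapes=((N, N, 1), (N, N, 1))
).batch(8)

model.fit(ds_train, epochs=10)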

We are working on a fix. Please stay tuned.

I have the same issue here and found out that it happens when we provide validation data. If you remove the validation data from fit(), memory usage remains constant without any increase. I also noticed that if we provide validation data, memory usage goes up after each epoch, but after 10 epochs a portion of it gets cleared and then it goes up again until the next 10th epoch. These steps go on until RAM is full and training crashes. I should also mention that I am using TF 2.2.
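
Based on that observation, one possible stopgap (a sketch only; whether model.evaluate avoids the same growth is an assumption) is to drop validation_data from fit() and run validation manually at the end of each epoch with a callback:

class ManualValidation(tf.keras.callbacks.Callback):
    '''Evaluates the held-out arrays at the end of every epoch.'''
    def __init__(self, x_val, y_val, batch_size=8):
        super().__init__()
        self.x_val, self.y_val, self.batch_size = x_val, y_val, batch_size
    def on_epoch_end(self, epoch, logs=None):
        val_loss = self.model.evaluate(self.x_val, self.y_val,
                                       batch_size=self.batch_size, verbose=0)
        print("epoch {}: val_loss = {:.4f}".format(epoch + 1, val_loss))

model.fit(x_train, y_train, batch_size=8, epochs=10,
          callbacks=[ManualValidation(x_val, y_val)])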

Not exactly. The issue is with the system RAM, not with GPU RAM.

I can see that in both of your gists you get similar usage of system RAM: 9.8 GB for 2.0.0-beta1 and 10.2 GB for 2.1.0-rc1. Yes, this is the expected behavior. That said, I cannot reproduce this expected behavior on my workstation. I obtain 9.6 GB for 2.0.0-beta1 and 18.7 GB for 2.1.0-rc1. The latter is almost twice as much as the former.

I have just discovered that TF 2.1.0-rc1 was released today. Unfortunately, the issue persists in 2.1.0-rc1.

I am also experiencing continually increasing memory usage with TensorFlow 2.1.0 and 2.0.0. I’m not using Lambda layers - just Dense and Dropout layers. Adding calls to del model, tf.keras.backend.clear_session() and gc.collect() slows the rate of growth, but it still grows endlessly.

I’m using Python 3.6.9 on Linux Mint 19.1. Like others, I’ve been unable to reproduce the issue in Colab.

I’ve been able to reproduce the issue locally with the following packages: tensorflow-2.1.0, tensorflow-cpu-2.1.0, tensorflow-gpu-2.1.0, tensorflow-2.0.0, tensorflow-1.15.0.

[memory usage plot: tensorflow-2.1.0]

This package gives stable memory usage: tensorflow-cpu-1.15.0

[memory usage plot: tensorflow-cpu-1.15.0]

Here’s the code I used to test:

import gc
import numpy as np
import tensorflow as tf
from memory_profiler import profile  # provides the @profile decorator used below

@profile
def create_neural_network_model(input_size, output_size):

    print("Creating model: input_size={}, output_size={}".format(input_size, output_size))

    model = tf.keras.models.Sequential([
        tf.keras.layers.Input(shape=(input_size,)),
        tf.keras.layers.Dense(1024, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(output_size),
    ])

    model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

    return model


@profile
def train_model():

    input_size = 80 * 24
    output_size = 60

    print("Generating data.....")
    # simulate 100 moves
    inputs = np.random.uniform(size=(100, input_size))
    outputs = np.random.randint(low=0, high=output_size, size=(100,))  # class labels in [0, output_size)

    print("Training model.....")

    tf.keras.backend.clear_session()
    gc.collect()

    model = create_neural_network_model(input_size, output_size)
    
    model.fit(x=inputs, y=outputs, epochs=3)
    return model


@profile
def main():
    model = None
    for i in range(50):
        print("Iteration {}...".format(i+1))
        del model
        model = train_model()
        
main()
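
As a side note, an alternative to the @profile decorator is memory_profiler's memory_usage helper (a sketch; it samples the RSS of a single call at a fixed interval), which makes it easy to log just the peak:

from memory_profiler import memory_usage

# Sample the process RSS (in MiB) every second while train_model() runs
usage = memory_usage((train_model, (), {}), interval=1.0)
print("peak memory: {:.0f} MiB".format(max(usage)))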

Excuse me for the late answer. I’m back to the topic now.

First, let me acknowledge your effort in digging into the issue, which I appreciate. Unfortunately, however, I cannot confirm that the issue is fixed.

To reiterate briefly, the original problem was as follows. It would be reasonable to expect that the memory usage by the test code for this issue is ca. the size of the data arrays (plus some overhead). But in TF 2.0 and 2.1 the memory usage was twice the size of the data arrays (plus overhead). Hence this issue.

The current state is as follows:

  • In TF 2.2 things have indeed changed, although it’s hard to call it fixed. Now, with the test code posted in this issue, I observe a memory leak of about 0.5 x the data size per epoch. In other words, if the size of the data arrays is ca. 8 GB, then the memory usage increases by ca. 4 GB each epoch.

  • If I wrap the numpy data arrays in datasets:

    ds_train = tf.data.Dataset.from_tensor_slices((x_train,y_train)).batch(8)
    ds_val = tf.data.Dataset.from_tensor_slices((x_val,y_val)).batch(8)
    ...
    model.fit(ds_train, validation_data=ds_val, epochs=10)
    

    then the behavior is the same in TF 2.2 as in 2.1 and 2.0, namely the memory usage is twice the data size plus overhead.

Can you confirm these results, @yhliang2018, @mihaimaruseac?
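
For completeness, a third feeding strategy that could be compared against the two above (a sketch; whether it avoids the duplication is an assumption, since it slices and converts only one batch per step) is a tf.keras.utils.Sequence:

class ArraySequence(tf.keras.utils.Sequence):
    '''Serves mini-batches by slicing the numpy arrays on demand.'''
    def __init__(self, x, y, batch_size=8):
        self.x, self.y, self.batch_size = x, y, batch_size
    def __len__(self):
        return int(np.ceil(len(self.x) / self.batch_size))
    def __getitem__(self, idx):
        sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        return self.x[sl], self.y[sl]

model.fit(ArraySequence(x_train, y_train),
          validation_data=ArraySequence(x_val, y_val),
          epochs=10)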

Coming late to the issue.

Concerning 2.1.0-rc1, it turned out that the allegedly non-GPU pip package (tensorflow==2.1.0-rc1) is actually distributed with GPU support (similarly to tensorflow-gpu==2.1.0-rc1).

That is expected. 1.15, 2.1 and later have a single pip package for both CPU and GPU builds. If you want a CPU-only package, you should install tensorflow-cpu. This is documented in the release notes.

I’m going to try some bisection on nightly, provided I can reproduce the issue.

No problem. I have just reinstalled with pip install --no-cache-dir tensorflow-gpu==2.1.0-rc1. The issue is still present. I have also tried tf-nightly (version 2.1.0-dev20191219) and the results are the same as in 2.1.0-rc1.

The issue apparently is system-dependent. So, to exclude a few possibilities, I have also run tests with some non-GPU versions of TensorFlow (1.14.0, 2.0.0-b1, 2.0.0-rc0 and 2.0.0). My earlier results concerning RAM usage have been reproduced: for 1.14.0 and 2.0.0-b1 I observed the expected behavior, while for 2.0.0-rc0 and 2.0.0 the RAM usage was two times higher.

Concerning 2.1.0-rc1, it turned out that the allegedly non-GPU pip package (tensorflow==2.1.0-rc1) is actually distributed with GPU support (similarly to tensorflow-gpu==2.1.0-rc1). So for this version, I have manually switched off the GPU support by adding the following lines:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

Result: the memory usage for 2.1.0-rc1 with the GPU disabled by these lines is the same as with the GPU enabled.
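
As an aside, the same effect can presumably be achieved programmatically in 2.1 with tf.config.set_visible_devices (an assumption about the exact API location; in 2.0 the call lives under tf.config.experimental):

import tensorflow as tf

# Hide all GPUs from TensorFlow; must run before any GPU has been initialized
tf.config.set_visible_devices([], "GPU")
print(tf.config.get_visible_devices("GPU"))  # expected: []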

Please let me know what further diagnostic info could be helpful for you. I will try to provide all necessary details.