tensorflow: Memory leak in Conv2D/Activation on GPU

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary): Binary, the standard docker distribution
  • TensorFlow version (use command below): v2.4.0-rc4-71-g582c8d236cb 2.4.0
  • Python version: 3.6.9
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: 11.0
  • GPU model and memory: GeForce RTX 2070, 8GB

Describe the current behavior I upgraded from TF 2.1.2 to TF 2.4.0, and training a very simple convolutional network, which worked fine in 2.1.2, started running out of memory. I distilled a simple reproducible example that demonstrates the issue. Each training epoch consumes about 50 MB of additional memory and, given enough epochs, memory usage grows without bound (or until it hits 32 GB, in my case). It only occurs on GPU; the same code runs fine on CPU.

Describe the expected behavior Memory not growing, or growing only very little

Standalone code to reproduce the issue

import gc
import os
import psutil
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Conv2D, Flatten, BatchNormalization, Activation

# Allocate GPU memory on demand instead of grabbing it all up front
physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)


input_tensor = tf.keras.layers.Input(shape=(512,64,1))

x = Conv2D(filters=32, kernel_size=(5,5), strides=(2,2), padding='same')(input_tensor)
# Commented out on purpose - see Note 1 below
# x = BatchNormalization()(x)
x = Activation('relu')(x)

x = Conv2D(filters=64, kernel_size=(4,4), strides=(2,2), padding='same')(x)
# Commented out on purpose - see Note 1 below
# x = BatchNormalization()(x)
x = Activation('relu')(x)

x = Conv2D(filters=128, kernel_size=(4,4), strides=(2,1), padding='same')(x)
# Commented out on purpose - see Note 1 below
# x = BatchNormalization()(x)
x = Activation('relu')(x)

x = Conv2D(filters=128, kernel_size=(4,4), strides=(2,1), padding='same')(x)
# Commented out on purpose - see Note 1 below
# x = BatchNormalization()(x)
x = Activation('relu')(x)

x = Flatten()(x)

x = Dense(5, activation='sigmoid')(x)

model = tf.keras.Model(inputs=input_tensor, outputs=x)


train_x = np.random.random((2048, 512, 64, 1))
train_y = np.random.random((2048, 5))

model.compile(loss='binary_crossentropy', optimizer=tf.keras.optimizers.Adam())

process = psutil.Process(os.getpid())

# Train one epoch at a time so garbage collection can be forced and the
# process RSS (in MB) printed after each epoch
for i in range(50):
    model.fit(train_x, train_y, epochs=1, batch_size=32, verbose=0)
    gc.collect()
    print(i, process.memory_info().rss // 1000000)

Note 1 Now, if you uncomment the BatchNormalization() layers, the memory problem disappears. So it is somehow caused by the Activation layer immediately following the Conv2D. A sketch of the non-leaking variant follows below.
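For clarity, this is what the first block looks like with the BatchNormalization() uncommented, i.e. the variant that does not leak for me:

x = Conv2D(filters=32, kernel_size=(5,5), strides=(2,2), padding='same')(input_tensor)
x = BatchNormalization()(x)  # with this layer in between, memory stays flat
x = Activation('relu')(x)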

Note 2 The memory problem also occurs if I train multiple epochs in a single fit() call, such as

model.fit(train_x, train_y, epochs=50, batch_size=32)

I used the for loop only to be able to call garbage collection and print the memory.

Note 3 A Conv2D layer with activation embedded in it, such as

Conv2D(filters=128, kernel_size=(4,4), strides=(2,1), padding='same', activation='relu')

also causes the memory issue.

About this issue

  • Original URL
  • State: open
  • Created 3 years ago
  • Reactions: 2
  • Comments: 26 (3 by maintainers)

Most upvoted comments

I temporarily fixed the issue by replacing ReLU() with lambda x: tf.math.maximum(x, 0.0).
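In terms of the model from the issue, that looks roughly like this (wrapping the lambda in a Lambda layer is just one way to plug it in):

# In place of ReLU() / Activation('relu'):
x = tf.keras.layers.Lambda(lambda t: tf.math.maximum(t, 0.0))(x)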

I have the same issue on Windows with CUDA 11, cuDNN 8, and libtensorflow 2.3.1 compiled by me. I replaced ReLU with LeakyReLU(0.001); after that, no more leaks. The same issue exists in TF 2.3.1, so it could be a bug in CUDA 11 or cuDNN 8.
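In Keras terms the swap looks roughly like this (a sketch only; my actual code goes through libtensorflow):

# LeakyReLU layer with a small slope in place of the ReLU
x = tf.keras.layers.LeakyReLU(0.001)(x)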

Sorry for the off-topic but

memory seems to increase every epoch, with a ~500 MB increment, using Leaky ReLU

Leaky ReLU - now 20% more leaky!

@Saduf2019

As @zlee406 says, your Colab notebook is running on CUDA V10.1.243. @zlee406 and I are running on CUDA 11 (V11.0.221 in my case), as required by the install guide: https://www.tensorflow.org/install/gpu

This is how my memory grows when I run the sample code (epoch: RSS in MB):

 0: 4194   1: 4280   2: 4376   3: 4439   4: 4530
 5: 4583   6: 4614   7: 4670   8: 4731   9: 4773
10: 4802  11: 4871  12: 4920  13: 4960  14: 5016
15: 5100  16: 5159  17: 5228  18: 5289  19: 5367
20: 5422  21: 5469  22: 5526  23: 5561  24: 5598
25: 5659  26: 5726  27: 5786  28: 5846  29: 5873
30: 5959  31: 6008  32: 6065  33: 6155  34: 6198
35: 6252  36: 6287  37: 6338  38: 6412  39: 6456
40: 6546  41: 6598  42: 6653  43: 6730  44: 6783
45: 6800  46: 6874  47: 6926  48: 6967  49: 7064

I had this exact issue yesterday with TF 2.8.2 running on Google Colab. My model was defined as:

from tensorflow.keras import Model
from tensorflow.keras.layers import (
    Conv2D,
    Dense,
    Dropout,
    Flatten,
    Input,
    MaxPool2D,
    ReLU
)

inputs = Input(shape=(280, 200, 1))
x = Conv2D(
    32,
    (5, 5),
    padding='same',
    dtype="mixed_float16",
)(inputs)
x = ReLU()(x) # or LeakyReLU
# --------------------------------------------------------------------------------------
x = MaxPool2D((2, 2), strides=(2, 2), dtype="mixed_float16")(x) # <- this saved my day!
# --------------------------------------------------------------------------------------
x = Conv2D(
    64,
    (5, 5),
    padding='same',
    dtype="mixed_float16",
)(x)
x = ReLU()(x) # or LeakyReLU
x = MaxPool2D((2, 2), strides=(2, 2), dtype="mixed_float16")(x)
x = Dropout(0.5, dtype="mixed_float16")(x)
x = Flatten(dtype="mixed_float16")(x)
x = Dense(512, dtype="mixed_float16")(x)
x = ReLU()(x) # or LeakyReLU
x = Dropout(0.5, dtype="mixed_float16")(x)

outputs = Dense(1, dtype="float32")(x)

model = Model(inputs=inputs, outputs=outputs)

What happened to me is that I was missing that intermediate MaxPool2D layer between the two Conv2D layers. Without it, memory usage exploded in the second training epoch, causing the training to fail. My runtime has 51 GB of CPU RAM and my GPU has 16 GB (that's Google Colab Pro+). I wasn't able to determine whether the CPU or the GPU memory was exhausted, but what puzzled me the most is that the first training epoch always finished successfully without any issues; the problem only appeared at the beginning of the second epoch.

I tried everything you recommended in this thread: separating the activation layer, replacing ReLU with LeakyReLU, and even adding a CleanUpCallback that called K.clear_session() and gc.collect() at the end of each epoch (see e.g. the Medium post “Dealing with memory leak issue in Keras model training”).
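Roughly, the callback was something like this (a minimal sketch; it still did not stop the leak for me):

import gc
import tensorflow as tf
from tensorflow.keras import backend as K

class CleanUpCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        K.clear_session()   # drop Keras' cached global state
        gc.collect()        # force Python garbage collection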

My datasets are generated with tf.data.Dataset.from_generator, with a batch size of 1024, and prefetched with AUTOTUNE. The dataset consists of (image, y) pairs where image has shape (280, 200, 1) and y is a float scalar. The full dataset contains ~40,000 pairs, but I had this issue even when I selected e.g. just 1% of the data. Disabling prefetching or reducing the batch size to e.g. 2 didn't help either. I also tried removing all callbacks, metrics, and validation during training, but the memory leak persisted.
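A stripped-down sketch of that pipeline (the generator below just yields dummy data with the real shapes; the real generator reads my images):

import numpy as np
import tensorflow as tf

def pair_generator():
    for _ in range(400):  # the real dataset has ~40,000 pairs
        yield np.random.random((280, 200, 1)).astype('float32'), np.float32(0.0)

dataset = (
    tf.data.Dataset.from_generator(
        pair_generator,
        output_signature=(
            tf.TensorSpec(shape=(280, 200, 1), dtype=tf.float32),
            tf.TensorSpec(shape=(), dtype=tf.float32),
        ),
    )
    .batch(1024)
    .prefetch(tf.data.AUTOTUNE)
)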

The only thing that worked for me was adding the MaxPool2D layer between the two Conv2D layers. Now the GPU memory usage stays constant at ~90% and CPU memory usage stays at ~8GB. I am not sure why this works (I would love it if any of you could clarify this), but I hope that this may be of help to someone in the future.

Well, for me there is no leak with either ReLU or LeakyReLU; the problem lies with model.fit() when it evaluates the validation dataset. There are no leaks if the validation dataset is not passed to fit(), so I had to run fit() and evaluate() independently. No leaks for now, and it runs fine.
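A sketch of what I mean by running them independently, reusing the repro model and data from the issue above (the validation arrays here are just dummy data):

val_x = np.random.random((256, 512, 64, 1))
val_y = np.random.random((256, 5))

for epoch in range(50):
    # no validation_data passed to fit(), which avoids the leak on my setup
    model.fit(train_x, train_y, epochs=1, batch_size=32, verbose=0)
    # evaluate the validation set separately
    val_loss = model.evaluate(val_x, val_y, batch_size=32, verbose=0)
    print(epoch, val_loss)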

Using LeakyReLU instead of relu saved my day; the continuous host memory increase is gone.

EDIT: It also worked for me to give my convolutional layers a None activation and use a tf.keras.layers.ReLU() layer after each of them.
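A quick sketch of that variant:

# convolution with no built-in activation, followed by a standalone ReLU layer
x = tf.keras.layers.Conv2D(64, (3, 3), padding='same', activation=None)(x)
x = tf.keras.layers.ReLU()(x)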

I experience the same on 2.4.1 with an RTX 3090. I replaced relu with elu as a temporary fix.

@jan-x-marek, Thanks for saving my time.

I have the same issue with CUDA 11 and tensorflow >= 2.4.0. Replacing relu with elu made the memory leak go away. What also fixed it for me: swapping my Google Cloud VM from an Nvidia Tesla V100 to an Nvidia P100 WITHOUT replacing the ReLU. So ReLU + Nvidia P100 works fine, while ReLU + Nvidia Tesla V100 results in a memory leak.

I have encountered the same problem and was able to create a slightly smaller example to reproduce it. If left running long enough, the RAM usage increases until swapping causes the GPU utilization to drop.

import tensorflow as tf

gpu = tf.config.experimental.list_physical_devices('GPU')[0]
tf.config.experimental.set_memory_growth(gpu, True)

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(128, 3, 1, 'same', activation='relu')
])

# Random data is used as both input and target
dataset = tf.random.normal((1024, 32, 32, 128))

model.compile(loss='mse')

model.fit(dataset, dataset, batch_size=8, epochs=1000)