tensorflow: [TF 2.0] Cannot compile a model more than once without running out of memory

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 2.0.0-dev20190702
  • Python version: 3.6.7
  • Bazel version (if compiling from source): None
  • GCC/Compiler version (if compiling from source): None
  • CUDA/cuDNN version: V10.0.130, 7.3.1
  • GPU model and memory: Surface Book 1 Nvidia GPU

Describe the current behavior

Build a Keras model, compile it, and run it. Then rebuild the model with new parameters and run it again. Result: OOM on the GPU. The memory held by the first model has not been freed or reused.

Describe the expected behavior

Compile a model more than once without the GPU running out of memory; more specifically, be able to do hyper-parameter tuning without restarting the Jupyter kernel.

I’ve tried:

  • set_memory_growth on the GPU
  • del model followed by gc.collect()
  • keras.backend.clear_session()

None of them helps.
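Since none of the in-process cleanup calls appear to release the GPU allocation, the only reliable workaround I know of is to run each hyper-parameter trial in a separate process, so the CUDA context (and all memory it holds) is torn down when the child process exits. Below is a minimal sketch of that pattern; `run_trial`, `evaluate`, and the stub loss value are placeholders I made up for illustration, not part of the repro script, and the real version would do the `import tensorflow`, `make_model(...)`, `compile()`, and `fit()` inside `run_trial`:

```python
import multiprocessing as mp

def run_trial(n_hidden1, queue):
    """Build, compile, and fit the model entirely inside this process.

    In the real version, tensorflow would be imported here (not at module
    level) so the parent process never creates a CUDA context of its own.
    A stub result stands in for history.history["loss"][-1] so the
    pattern itself is runnable without TensorFlow.
    """
    final_loss = 1.0 / n_hidden1  # stub for the real training loss
    queue.put(final_loss)

def evaluate(n_hidden1, start_method="spawn"):
    # "spawn" gives each trial a fresh interpreter; when the child
    # exits, the driver reclaims all GPU memory the trial held
    ctx = mp.get_context(start_method)
    queue = ctx.Queue()
    proc = ctx.Process(target=run_trial, args=(n_hidden1, queue))
    proc.start()
    result = queue.get()
    proc.join()
    return result

if __name__ == "__main__":
    for n_hidden1 in (2049, 2150):
        print(n_hidden1, evaluate(n_hidden1))
```

This is heavier than clear_session should be, but it sidesteps the leak entirely because the OS, not TensorFlow, frees the memory.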

Code to reproduce the issue

Provide a reproducible test case that is the bare minimum necessary to generate the problem.

#Import basics and check everything works
import tensorflow as tf
from tensorflow import keras

AUTOTUNE = tf.data.experimental.AUTOTUNE

print("Versions:", tf.version.VERSION, tf.version.GIT_VERSION)
print("GPU availability:", tf.test.is_gpu_available())
print("Eager execution:", tf.executing_eagerly())

#Quick test
x = [[2.]]
m = tf.matmul(x, x)
print("hello, {}".format(m))

def make_model(input_shape, n_hidden1=2049, n_hidden2=500, n_hidden3=180, batch_n_mom=0.99, dropout_rate=0.1):

    from tensorflow.keras.initializers import he_normal
    
    stacked_ae = keras.models.Sequential([
        keras.layers.Flatten(input_shape=input_shape),
        keras.layers.BatchNormalization(axis=1, momentum=batch_n_mom),
        
        keras.layers.Dense(n_hidden1, activation="selu", name="he1", kernel_initializer=he_normal(seed=27)),
        keras.layers.BatchNormalization(axis=1, momentum=batch_n_mom),
        keras.layers.Dropout(dropout_rate),
        
        keras.layers.Dense(n_hidden2, activation="selu", name="he2", kernel_initializer=he_normal(seed=42)),
        keras.layers.BatchNormalization(axis=1, momentum=batch_n_mom),
        
        keras.layers.Dense(n_hidden3, activation="selu", name="he3", kernel_initializer=he_normal(seed=65)),
        keras.layers.BatchNormalization(axis=1, momentum=batch_n_mom),
        
        keras.layers.Dense(n_hidden2, activation="selu", name="hd2", kernel_initializer=he_normal(seed=42)),
        keras.layers.BatchNormalization(axis=1, momentum=batch_n_mom),
        
        keras.layers.Dense(n_hidden1, activation="selu", name="hd1", kernel_initializer=he_normal(seed=27)),
        keras.layers.BatchNormalization(axis=1, momentum=batch_n_mom),
        keras.layers.Dropout(dropout_rate),
        
        keras.layers.Dense(input_shape[0] * input_shape[1], name="output", kernel_initializer=he_normal(seed=62)),
        keras.layers.Reshape(input_shape)
    ])
    
    return stacked_ae

import numpy as np

#Data doesn't matter
x_train = np.ones((32,60,80))
y_train = np.ones((32,60,80))

#Once runs ok
input_shape = [60,80]
ae_model = make_model(input_shape)
ae_model.compile(loss="mse",
                 optimizer=keras.optimizers.Adam(learning_rate=0.001, decay=1e-6),
                 metrics=['accuracy'])
print(ae_model.summary())

#Do something with the model
history = ae_model.fit(x=x_train, y=y_train,  epochs=1, steps_per_epoch=1)

#Second run, new model
ae_model = make_model(input_shape, n_hidden1=2150)
ae_model.compile(loss="mse",
                 optimizer=keras.optimizers.Adam(learning_rate=0.001, decay=1e-6),
                 metrics=['accuracy'])
print(ae_model.summary())

#Run again. GPU OOM.
history = ae_model.fit(x=x_train, y=y_train,  epochs=1, steps_per_epoch=1)

Other info / logs

Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

2019-07-16 16:24:06.019147: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-07-16 16:24:12.775543: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-16 16:24:12.789530: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2019-07-16 16:24:12.799822: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2019-07-16 16:24:12.813163: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:0 with 517 MB memory) -> physical GPU (device: 0, name: GeForce GPU, pci bus id: 0000:01:00.0, compute capability: 5.0)
2019-07-16 16:24:12.847183: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: name: GeForce GPU major: 5 minor: 0 memoryClockRate(GHz): 0.993 pciBusID: 0000:01:00.0
2019-07-16 16:24:12.868383: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-07-16 16:24:12.887076: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-07-16 16:24:12.902106: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: name: GeForce GPU major: 5 minor: 0 memoryClockRate(GHz): 0.993 pciBusID: 0000:01:00.0
2019-07-16 16:24:12.925257: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-07-16 16:24:12.946163: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-07-16 16:24:12.958309: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-16 16:24:12.977399: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2019-07-16 16:24:12.988725: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2019-07-16 16:24:13.001442: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 517 MB memory) -> physical GPU (device: 0, name: GeForce GPU, pci bus id: 0000:01:00.0, compute capability: 5.0)

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 18 (9 by maintainers)

Most upvoted comments

This is primarily a keras question. Assigning it to @karmel