tensorflow: [TF 2.0] Cannot compile a model more than once without running out of memory
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
- TensorFlow installed from (source or binary): binary
- TensorFlow version (use command below): 2.0.0-dev20190702
- Python version: 3.6.7
- Bazel version (if compiling from source): None
- GCC/Compiler version (if compiling from source): None
- CUDA/cuDNN version: V10.0.130, 7.3.1
- GPU model and memory: Surface Book 1 Nvidia GPU
Describe the current behavior
Build a Keras model, compile it, and run it. Then rebuild the model with new parameters and run it again. Result: OOM on the GPU. The memory held by the first model is apparently neither freed nor re-used.
Describe the expected behavior
Compile a model more than once without the GPU running out of memory. More specifically, be able to do hyper-parameter tuning without restarting the Jupyter kernel.
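For context, the intended usage pattern is a simple tuning loop like the sketch below (the hyper-parameter values are illustrative only; make_model, input_shape, x_train and y_train are as defined in the reproduction code further down):

# Hypothetical tuning loop: every trial builds and compiles a fresh model.
# In practice the second iteration already hits the OOM described above.
for n_hidden1 in (2049, 2150, 2300):
    model = make_model(input_shape, n_hidden1=n_hidden1)
    model.compile(loss="mse",
                  optimizer=keras.optimizers.Adam(learning_rate=0.001, decay=1e-6),
                  metrics=['accuracy'])
    model.fit(x=x_train, y=y_train, epochs=1, steps_per_epoch=1)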
I’ve tried the following; none of them help (placement is sketched below and after the reproduction code):
- set_memory_growth on the GPU
- del model + gc.collect
- clear_session
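For reference, set_memory_growth was applied at the very top of the notebook, roughly as in the sketch below (a sketch assuming a single visible GPU; it has to run before anything else touches the GPU). The del/gc.collect/clear_session attempt is sketched after the reproduction code.

import tensorflow as tf

# Ask the allocator to grow on demand instead of pre-allocating the whole GPU.
# This must run before any op has used the GPU.
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)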
Code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate the problem.
#Import basics and check everything works
import tensorflow as tf
from tensorflow import keras
AUTOTUNE = tf.data.experimental.AUTOTUNE
print("Versions:", tf.version.VERSION, tf.version.GIT_VERSION)
print("GPU availablilty:", tf.test.is_gpu_available())
print("Eager execution:", tf.executing_eagerly())
#Quick test
x = [[2.]]
m = tf.matmul(x, x)
print("hello, {}".format(m))
#Stacked autoencoder: symmetric encoder (he1-he3) / decoder (hd2-hd1), output reshaped to the input shape
def make_model(input_shape, n_hidden1=2049, n_hidden2=500, n_hidden3=180, batch_n_mom=0.99, dropout_rate=0.1):
    from tensorflow.keras.initializers import he_normal
    stacked_ae = keras.models.Sequential([
        keras.layers.Flatten(input_shape=input_shape),
        keras.layers.BatchNormalization(axis=1, momentum=batch_n_mom),
        keras.layers.Dense(n_hidden1, activation="selu", name="he1", kernel_initializer=he_normal(seed=27)),
        keras.layers.BatchNormalization(axis=1, momentum=batch_n_mom),
        keras.layers.Dropout(dropout_rate),
        keras.layers.Dense(n_hidden2, activation="selu", name="he2", kernel_initializer=he_normal(seed=42)),
        keras.layers.BatchNormalization(axis=1, momentum=batch_n_mom),
        keras.layers.Dense(n_hidden3, activation="selu", name="he3", kernel_initializer=he_normal(seed=65)),
        keras.layers.BatchNormalization(axis=1, momentum=batch_n_mom),
        keras.layers.Dense(n_hidden2, activation="selu", name="hd2", kernel_initializer=he_normal(seed=42)),
        keras.layers.BatchNormalization(axis=1, momentum=batch_n_mom),
        keras.layers.Dense(n_hidden1, activation="selu", name="hd1", kernel_initializer=he_normal(seed=27)),
        keras.layers.BatchNormalization(axis=1, momentum=batch_n_mom),
        keras.layers.Dropout(dropout_rate),
        keras.layers.Dense(input_shape[0] * input_shape[1], name="output", kernel_initializer=he_normal(seed=62)),
        keras.layers.Reshape(input_shape)
    ])
    return stacked_ae
import numpy as np
#Data doesn't matter
x_train = np.ones((32,60,80))
y_train = np.ones((32,60,80))
#First run: the model builds, compiles and fits fine
input_shape = [60,80]
ae_model = make_model(input_shape)
ae_model.compile(loss="mse",
                 optimizer=keras.optimizers.Adam(learning_rate=0.001, decay=1e-6),
                 metrics=['accuracy'])
print(ae_model.summary())
#Do something with the model
history = ae_model.fit(x=x_train, y=y_train, epochs=1, steps_per_epoch=1)
#Second run: rebuild the model with different hyper-parameters
ae_model = make_model(input_shape, n_hidden1=2150)
ae_model.compile(loss="mse",
                 optimizer=keras.optimizers.Adam(learning_rate=0.001, decay=1e-6),
                 metrics=['accuracy'])
print(ae_model.summary())
#Fit again: GPU OOM
history = ae_model.fit(x=x_train, y=y_train, epochs=1, steps_per_epoch=1)
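For completeness, the cleanup attempts listed earlier were inserted between the first fit() and the second make_model() call, roughly as sketched below; the second run still runs out of memory.

import gc
from tensorflow.keras import backend as K

# Placed between the first fit() and the second make_model() call:
del ae_model       # drop the Python reference to the first model
gc.collect()       # force garbage collection
K.clear_session()  # reset Keras' global graph/session state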
Other info / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.
2019-07-16 16:24:06.019147: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-07-16 16:24:12.775543: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-16 16:24:12.789530: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2019-07-16 16:24:12.799822: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2019-07-16 16:24:12.813163: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:0 with 517 MB memory) -> physical GPU (device: 0, name: GeForce GPU, pci bus id: 0000:01:00.0, compute capability: 5.0)
2019-07-16 16:24:12.847183: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: name: GeForce GPU major: 5 minor: 0 memoryClockRate(GHz): 0.993 pciBusID: 0000:01:00.0
2019-07-16 16:24:12.868383: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-07-16 16:24:12.887076: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-07-16 16:24:12.902106: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: name: GeForce GPU major: 5 minor: 0 memoryClockRate(GHz): 0.993 pciBusID: 0000:01:00.0
2019-07-16 16:24:12.925257: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-07-16 16:24:12.946163: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-07-16 16:24:12.958309: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-16 16:24:12.977399: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2019-07-16 16:24:12.988725: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2019-07-16 16:24:13.001442: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 517 MB memory) -> physical GPU (device: 0, name: GeForce GPU, pci bus id: 0000:01:00.0, compute capability: 5.0)
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 18 (9 by maintainers)
This is primarily a Keras question. Assigning it to @karmel.