tensorflow: Memory leak in model.fit
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): minimal working example
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows Server 2016
- TensorFlow installed from (source or binary): conda
- TensorFlow version (use command below): tf 2.1.0
- Python version: 3.7
- CUDA/cuDNN version: CUDA 10.1
- GPU model and memory: K80 - 24 GB
Describe the current behavior
Memory use increases with consecutive training runs; probably related to #35524, #33030, #35124 and #35835. Side note: I do not understand the warning, but this seems to be handled in #37500.
Describe the expected behavior
Memory usage should stay constant across training runs.
Standalone code to reproduce the issue
from tensorflow.keras.datasets import cifar10
import tensorflow.keras.callbacks as callbacks
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import LearningRateScheduler
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.layers import Input, Conv2D, GlobalAveragePooling2D, Activation, Dense
from tensorflow.keras.models import Model
import tensorflow.keras.utils as kutils
import tensorflow as tf
import numpy as np
import psutil
import gc

batch_size = 128
epochs = 5
num_classes = 10

def buildmodel():
    # Small convolutional classifier for CIFAR-10
    img_input = Input(shape=(32, 32, 3))
    x = Conv2D(16, (3, 3), padding='same')(img_input)
    x = Activation("relu")(x)
    x = Conv2D(16, (3, 3), padding='same')(x)
    x = Activation("relu")(x)
    x = GlobalAveragePooling2D()(x)
    prediction = Dense(num_classes, activation='softmax', name='classifier')(x)
    model = Model(inputs=img_input, outputs=prediction)
    return model

# Load and normalize the data
(trainX, trainY), (testX, testY) = cifar10.load_data()
mean = np.mean(trainX, axis=0)
std = np.std(trainX)
trainX = trainX.astype('float32')
trainX = (trainX - mean) / std
testX = testX.astype('float32')
testX = (testX - mean) / std
trainY = kutils.to_categorical(trainY)
testY = kutils.to_categorical(testY)

generator = ImageDataGenerator()
generator.fit(trainX)
val_generator = ImageDataGenerator()

# Train a fresh model ten times in a row and report memory after each run
for i in range(10):
    tf.keras.backend.clear_session()
    model = buildmodel()
    sgd = SGD(lr=0.1, momentum=0.9, nesterov=True)
    model.compile(loss="categorical_crossentropy", optimizer=sgd, metrics=["acc"])
    model.fit(generator.flow(trainX, trainY, batch_size=batch_size),
              epochs=epochs,
              validation_data=val_generator.flow(testX, testY, batch_size=batch_size),
              verbose=0, workers=20)
    print('memory used: ' + str(psutil.virtual_memory().used // 1e6))
    gc.collect()
output (each training run prints the following warning twice, followed by the memory reading):

WARNING:tensorflow:sample_weight modes were coerced from
  ...
    to
  ['...']

memory used: 38012.0
memory used: 38563.0
memory used: 39288.0
memory used: 40005.0
memory used: 40730.0
memory used: 41490.0
memory used: 42216.0
memory used: 42937.0
memory used: 43659.0
memory used: 44403.0
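Note that psutil.virtual_memory().used measures system-wide memory, so other processes on the machine can add noise to the readings. Below is a minimal sketch, assuming the same training loop as above, of a process-local measurement that isolates the Python process's own resident set size (the helper name is illustrative):

import os
import psutil

process = psutil.Process(os.getpid())

def log_process_memory(tag):
    # Resident set size of this Python process only, in MB
    rss_mb = process.memory_info().rss // 10**6
    print(tag + ': process RSS = ' + str(rss_mb) + ' MB')

Calling log_process_memory('iteration ' + str(i)) in place of the virtual_memory() print shows whether the growth is confined to the training process itself.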
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 8
- Comments: 42 (9 by maintainers)
Hey, I was having the same issue. My current workaround is to save the model before deleting it and clearing the backend after each training iteration. Then I reload the model before calling fit again. I’m no longer experiencing the memory leak with this workaround.
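A minimal sketch of that workaround, reusing the model builder, generators, and hyperparameters from the reproduction script above; the checkpoint file name is illustrative, not from the original comment:

import gc
import tensorflow as tf

model = buildmodel()
sgd = SGD(lr=0.1, momentum=0.9, nesterov=True)
model.compile(loss="categorical_crossentropy", optimizer=sgd, metrics=["acc"])

for i in range(10):
    model.fit(generator.flow(trainX, trainY, batch_size=batch_size),
              epochs=epochs, verbose=0)
    # Save the model, then tear down the Keras graph state before the next run
    model.save('model_checkpoint.h5')  # illustrative file name
    del model
    tf.keras.backend.clear_session()
    gc.collect()
    # Reload so the next fit() continues from the saved weights and optimizer state
    model = tf.keras.models.load_model('model_checkpoint.h5')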
@krenerd Unfortunately, I tried your code but the memory leak still exists.
@jvishnuvardhan Thanks for your help and the great work; memory is constant with tf-nightly! As a side note: when I remove either gc.collect() or tf.keras.backend.clear_session(), memory leaks again, so both calls are needed for constant memory usage when performing consecutive training runs.
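A condensed sketch of that pattern, with both cleanup calls at the end of each iteration (same names as in the reproduction script above):

import gc
import tensorflow as tf

for i in range(10):
    model = buildmodel()
    model.compile(loss="categorical_crossentropy", optimizer="sgd", metrics=["acc"])
    model.fit(generator.flow(trainX, trainY, batch_size=batch_size),
              epochs=epochs, verbose=0)
    del model
    # Per the comment above, removing either of the next two calls
    # makes memory grow again, even on tf-nightly; together they keep usage constant
    tf.keras.backend.clear_session()  # drop the global Keras/TF graph state
    gc.collect()                      # collect lingering Python-side references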
I found my solution: stop using this library altogether. PyTorch forever.
https://fantashit.com/linearly-increasing-memory-with-use-multiprocessing-and-keras-sequence/#comment-254237
Can anyone advise how to check since which stable release this bug has been present? I can't find any info on the releases page: https://github.com/tensorflow/tensorflow/releases?page=1