tensorflow: Memory leak with tf.shuffle, doesn't release buffer memory

System information

  • OS Platform and Distribution Linux Ubuntu 16.04
  • Python: 2.7.17 |Anaconda, Inc.| (default, Oct 21 2019, 19:04:46) [GCC 7.3.0]
  • Tensorflow: 1.12.0
  • Numpy: 1.16.5
  • GPU: GeForce RTX 2080 Ti
  • CUDA: 9.2

Describe the current behavior CPU memory grows gradually after each epoch until the program has to be restarted; I suspect that dataset.shuffle doesn't release its buffer memory. Tested with TF 1.15 as well, same situation.

Code to reproduce the issue

import numpy as np
import tensorflow as tf

class ASRDataGenerator(object):
    """Dummy generator: yields (string, string, 100x120 feature) tuples."""

    def __init__(self, num):
        self.num = num

    def __call__(self):
        # One epoch yields num * 106 elements.
        for i in range(self.num):
            for j in range(106):
                yield 'a', 'b', np.random.randn(100, 120)

class TFASRDataSet(object):
    def __init__(self, num, batch_size):
        self.num = num
        self.batch_size = batch_size
        self.asrDataGenerator = ASRDataGenerator(num)

    def setDataSetIterator(self):
        dataset = tf.data.Dataset.from_generator(
            self.asrDataGenerator, (tf.string, tf.string, tf.float32))
        # With num=248 below, one epoch has 248 * 106 = 26288 elements, so the
        # 30000-element shuffle buffer ends up holding the entire epoch
        # (roughly 1.2 GB of float32 features) in host memory.
        dataset = dataset.shuffle(30000)
        dataset = dataset.map(lambda s1, s2, feat: [s1, s2, feat])
        dataset = dataset.batch(self.batch_size, drop_remainder=True)
        self.iterator = dataset.make_initializable_iterator()

test_tfASRDataSet = TFASRDataSet(248, 192)
test_tfASRDataSet.setDataSetIterator()
test_iter = test_tfASRDataSet.iterator
test_next = test_iter.get_next()

run_config = tf.ConfigProto()
run_config.gpu_options.allow_growth = True
run_config.allow_soft_placement = True

with tf.Session(config=run_config) as sess:

    for i in range(100):

        sess.run(test_iter.initializer)
        
        while True:
            try:
                # No model here; just drain the iterator for one epoch.
                batch = sess.run([test_next])
                print(len(batch[0]))
            except tf.errors.OutOfRangeError:
                print("train epoch %d finish" % (i + 1))
                break
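
To put a number on the growth, the driver loop above can be instrumented to log the resident set size (RSS) of the process after every epoch. The snippet below is a sketch, not part of the original report: it assumes psutil is installed and reuses run_config, test_iter and test_next from the repro above. If the reported leak is present, the printed RSS should keep climbing from epoch to epoch instead of flattening out once the shuffle buffer is full.

import os
import psutil  # assumption: installed separately, e.g. pip install psutil

process = psutil.Process(os.getpid())

with tf.Session(config=run_config) as sess:
    for i in range(100):
        sess.run(test_iter.initializer)
        while True:
            try:
                sess.run(test_next)  # drain one epoch, discarding the batches
            except tf.errors.OutOfRangeError:
                break
        # Resident set size of this Python process after the epoch.
        rss_mb = process.memory_info().rss / 1024.0 / 1024.0
        print("epoch %d finished, RSS: %.1f MB" % (i + 1, rss_mb))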

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 3
  • Comments: 18 (9 by maintainers)

Most upvoted comments

I have recently investigated the memory growth observed for the OSS version of TensorFlow when shuffle is used. The conclusion of my investigation is that the memory growth is caused by poor performance of the memory allocator (TensorFlow OSS uses the system malloc by default). In my experiments, switching to TCMalloc (details below) resulted in constant memory usage (and a program speedup).

For the evaluation, I used the following simple input pipeline:

import tensorflow as tf
import psutil

dataset = tf.data.Dataset.range(int(1e7))
dataset = dataset.shuffle(int(1e7)).batch(int(1e6))

for _ in dataset:
  used_mem = psutil.virtual_memory().used
  print("used memory: {} MB".format(used_mem / 1024 / 1024))

When executed on my workstation, it produces the following output:

$ python example.py

used memory: 19853.52734375 MB
used memory: 19905.6484375 MB
used memory: 19958.109375 MB
used memory: 20014.796875 MB
used memory: 20064.8359375 MB
used memory: 20061.375 MB
used memory: 20117.23828125 MB
used memory: 20172.8515625 MB
used memory: 20228.18359375 MB
used memory: 20278.62890625 MB

I then installed TCMalloc using sudo apt-get install libtcmalloc-minimal4 and used it with the same program, as follows:

$ LD_PRELOAD=/path/to/libtcmalloc_minimal.so.4 python example.py

used memory: 19291.0859375 MB
used memory: 19307.90234375 MB
used memory: 19315.859375 MB
used memory: 19315.859375 MB
used memory: 19315.875 MB
used memory: 19317.8671875 MB
used memory: 19311.14453125 MB
used memory: 19317.3515625 MB
used memory: 19317.34765625 MB
used memory: 19316.96484375 MB

Not only did the gradual memory growth disappear, but the program also ran 2x faster.
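
If you want to confirm that TCMalloc was actually picked up (rather than the preload being skipped or pointing at the wrong path), a quick check can be run inside the preloaded Python process. This is a sketch, Linux-only, and not from the original comment; it just scans /proc/self/maps for a mapped tcmalloc shared object.

def tcmalloc_loaded():
    # Any libtcmalloc*.so mapped into this process shows up in /proc/self/maps.
    with open("/proc/self/maps") as f:
        return any("tcmalloc" in line for line in f)

print("tcmalloc preloaded:", tcmalloc_loaded())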

@azzeddineCH I am on 2.7 and still have the problem.