tensorflow: Memory leaks in repeated model training despite garbage collection and session clearing

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes, see below
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows and Linux
  • TensorFlow installed from (source or binary): pip
  • TensorFlow version (use command below): v2.3.0-54-gfcc4b966f1 2.3.1
  • Python version: 3.8.5
  • CUDA/cuDNN version: none
  • GPU model and memory: none

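Describe the current behavior

The code below leaks memory: the process's resident set size (RSS) keeps growing across iterations, even though every iteration calls gc.collect() and tf.keras.backend.clear_session().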

From my local, CPU-only system:

iteration 0: rss 368 MB
iteration 1: rss 400 MB  # new maximum
iteration 2: rss 402 MB  # new maximum
iteration 3: rss 439 MB  # new maximum
iteration 4: rss 431 MB
iteration 5: rss 444 MB  # new maximum
iteration 6: rss 442 MB
iteration 7: rss 447 MB  # new maximum
iteration 8: rss 445 MB
iteration 9: rss 461 MB  # new maximum
iteration 10: rss 449 MB
iteration 11: rss 462 MB  # new maximum
iteration 12: rss 460 MB
iteration 13: rss 466 MB  # new maximum
iteration 14: rss 463 MB
iteration 15: rss 483 MB  # new maximum
iteration 16: rss 468 MB
iteration 17: rss 490 MB  # new maximum
iteration 18: rss 476 MB
iteration 19: rss 489 MB
iteration 20: rss 478 MB
iteration 21: rss 491 MB  # new maximum
iteration 22: rss 480 MB
iteration 23: rss 492 MB  # new maximum
iteration 24: rss 480 MB
iteration 25: rss 497 MB  # new maximum
iteration 26: rss 481 MB
iteration 27: rss 493 MB
iteration 28: rss 483 MB
iteration 29: rss 493 MB
iteration 30: rss 486 MB
iteration 31: rss 488 MB
iteration 32: rss 487 MB
iteration 33: rss 489 MB
iteration 34: rss 498 MB  # new maximum
iteration 35: rss 487 MB
iteration 36: rss 494 MB
iteration 37: rss 493 MB
iteration 38: rss 489 MB
iteration 39: rss 500 MB  # new maximum
iteration 40: rss 487 MB
iteration 41: rss 501 MB  # new maximum
iteration 42: rss 494 MB
iteration 43: rss 493 MB
iteration 44: rss 502 MB  # new maximum
iteration 45: rss 510 MB  # new maximum
iteration 46: rss 503 MB
iteration 47: rss 497 MB
iteration 48: rss 494 MB
iteration 49: rss 507 MB
iteration 50: rss 497 MB
iteration 51: rss 507 MB
iteration 52: rss 499 MB
iteration 53: rss 499 MB
iteration 54: rss 507 MB
iteration 55: rss 497 MB
iteration 56: rss 507 MB
iteration 57: rss 500 MB
iteration 58: rss 501 MB
iteration 59: rss 508 MB
iteration 60: rss 498 MB
iteration 61: rss 507 MB
iteration 62: rss 501 MB
iteration 63: rss 502 MB
iteration 64: rss 510 MB
iteration 65: rss 501 MB
iteration 66: rss 511 MB  # new maximum
iteration 67: rss 503 MB
iteration 68: rss 502 MB
iteration 69: rss 512 MB  # new maximum
iteration 70: rss 500 MB
iteration 71: rss 512 MB
iteration 72: rss 505 MB
iteration 73: rss 506 MB
iteration 74: rss 512 MB
iteration 75: rss 522 MB  # new maximum
iteration 76: rss 509 MB
iteration 77: rss 507 MB
iteration 78: rss 507 MB
iteration 79: rss 514 MB
iteration 80: rss 525 MB  # new maximum
iteration 81: rss 514 MB
iteration 82: rss 508 MB
iteration 83: rss 507 MB
iteration 84: rss 516 MB
iteration 85: rss 527 MB  # new maximum
iteration 86: rss 513 MB
iteration 87: rss 511 MB
iteration 88: rss 509 MB
iteration 89: rss 516 MB
iteration 90: rss 508 MB
iteration 91: rss 511 MB
iteration 92: rss 514 MB
iteration 93: rss 509 MB
iteration 94: rss 518 MB
iteration 95: rss 508 MB
iteration 96: rss 513 MB
iteration 97: rss 514 MB
iteration 98: rss 509 MB
iteration 99: rss 519 MB

From Colab:

iteration 0: rss 547 MB
iteration 1: rss 652 MB  # new maximum
iteration 2: rss 671 MB  # new maximum
iteration 3: rss 674 MB  # new maximum
iteration 4: rss 674 MB
iteration 5: rss 674 MB
iteration 6: rss 674 MB
iteration 7: rss 675 MB  # new maximum
iteration 8: rss 680 MB  # new maximum
iteration 9: rss 680 MB
iteration 10: rss 689 MB  # new maximum
iteration 11: rss 689 MB
iteration 12: rss 689 MB
iteration 13: rss 689 MB
iteration 14: rss 689 MB
iteration 15: rss 689 MB
iteration 16: rss 689 MB
iteration 17: rss 689 MB
iteration 18: rss 689 MB
iteration 19: rss 689 MB
iteration 20: rss 689 MB
iteration 21: rss 689 MB
iteration 22: rss 689 MB
iteration 23: rss 689 MB
iteration 24: rss 689 MB
iteration 25: rss 689 MB
iteration 26: rss 689 MB
iteration 27: rss 689 MB
iteration 28: rss 689 MB
iteration 29: rss 689 MB
iteration 30: rss 689 MB
iteration 31: rss 689 MB
iteration 32: rss 689 MB
iteration 33: rss 689 MB
iteration 34: rss 689 MB
iteration 35: rss 691 MB  # new maximum
iteration 36: rss 691 MB
iteration 37: rss 697 MB  # new maximum
iteration 38: rss 702 MB  # new maximum
iteration 39: rss 704 MB  # new maximum
iteration 40: rss 704 MB
iteration 41: rss 704 MB
iteration 42: rss 704 MB
iteration 43: rss 704 MB
iteration 44: rss 704 MB
iteration 45: rss 704 MB
iteration 46: rss 704 MB
iteration 47: rss 704 MB
iteration 48: rss 704 MB
iteration 49: rss 704 MB
iteration 50: rss 704 MB
iteration 51: rss 704 MB
iteration 52: rss 704 MB
iteration 53: rss 704 MB
iteration 54: rss 704 MB
iteration 55: rss 704 MB
iteration 56: rss 704 MB
iteration 57: rss 704 MB
iteration 58: rss 704 MB
iteration 59: rss 704 MB
iteration 60: rss 704 MB
iteration 61: rss 704 MB
iteration 62: rss 704 MB
iteration 63: rss 704 MB
iteration 64: rss 704 MB
iteration 65: rss 704 MB
iteration 66: rss 704 MB
iteration 67: rss 704 MB
iteration 68: rss 704 MB
iteration 69: rss 704 MB
iteration 70: rss 704 MB
iteration 71: rss 704 MB
iteration 72: rss 704 MB
iteration 73: rss 704 MB
iteration 74: rss 704 MB
iteration 75: rss 704 MB
iteration 76: rss 705 MB  # new maximum
iteration 77: rss 705 MB
iteration 78: rss 705 MB
iteration 79: rss 705 MB
iteration 80: rss 705 MB
iteration 81: rss 705 MB
iteration 82: rss 706 MB  # new maximum
iteration 83: rss 706 MB
iteration 84: rss 713 MB  # new maximum
iteration 85: rss 713 MB
iteration 86: rss 713 MB
iteration 87: rss 713 MB
iteration 88: rss 719 MB  # new maximum
iteration 89: rss 719 MB
iteration 90: rss 719 MB
iteration 91: rss 719 MB
iteration 92: rss 719 MB
iteration 93: rss 719 MB
iteration 94: rss 719 MB
iteration 95: rss 719 MB
iteration 96: rss 719 MB
iteration 97: rss 720 MB  # new maximum
iteration 98: rss 720 MB
iteration 99: rss 720 MB
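
A marker such as "# new maximum" can be derived from the raw readings by tracking a running maximum. The sketch below is illustrative only and is not part of the original report. The helper name annotate_new_maxima and the regular expression are assumptions, and the input is expected to be unannotated lines in the "iteration N: rss M MB" format printed by the reproduction script.

```python
import re

# Matches the raw line format printed by the reproduction script.
LINE_RE = re.compile(r"iteration (\d+): rss (\d+) MB")


def annotate_new_maxima(lines):
    """Append '  # new maximum' to every line whose RSS reading exceeds
    all earlier readings. The first reading is the baseline and is not flagged."""
    annotated = []
    running_max = None
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            annotated.append(line)  # pass through anything that is not a reading
            continue
        rss_mb = int(match.group(2))
        if running_max is None:
            running_max = rss_mb
        elif rss_mb > running_max:
            running_max = rss_mb
            line = f"{line}  # new maximum"
        annotated.append(line)
    return annotated


# First five Colab readings from the log above, without annotations.
sample = [
    "iteration 0: rss 547 MB",
    "iteration 1: rss 652 MB",
    "iteration 2: rss 671 MB",
    "iteration 3: rss 674 MB",
    "iteration 4: rss 674 MB",
]
for line in annotate_new_maxima(sample):
    print(line)
```

A reading equal to the current maximum is not flagged, so iteration 4 in the sample stays unmarked while iterations 1 to 3 are flagged.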

Describe the expected behavior

Memory consumption stays constant across iterations.

Standalone code to reproduce the issue

import gc
import os

import numpy as np
import psutil
import tensorflow as tf

tf.get_logger().setLevel("ERROR")  # Suppress "tf.function retracing" warnings
process = psutil.Process(os.getpid())
for i in range(100):
    # do some work
    model = tf.keras.applications.mobilenet.MobileNet()
    model.compile(loss="mse")
    x = tf.zeros((1, *model.input.shape[1:]))
    y = tf.zeros((1, *model.output.shape[1:]))
    history = model.fit(x=x, y=y, verbose=0)
    
    # clean up
    _ = gc.collect()
    tf.keras.backend.clear_session()
    
    # show memory usage
    print(f"iteration {i}: rss {process.memory_info().rss >> 20} MB")

Other info / logs

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 23 (8 by maintainers)

Most upvoted comments

Are you calling gc.collect() after clear_session()? For example:

 tf.keras.backend.clear_session()
 gc.collect()