catboost: Potential memory leak in catboost.Pool

Problem: Memory leak in long running applications using catboost
catboost version: 0.26.1
Operating System: MacOS Catalina
CPU: 2.6 GHz 6-Core Intel Core i7

The issue of Pool objects not releasing memory was already discussed in https://github.com/catboost/catboost/issues/892 and explained in https://github.com/catboost/catboost/issues/892#issuecomment-583037773.

We are using a slightly modified example from the comment above:

import gc
import os
import sys

import catboost as cb
import numpy as np
import psutil


def memory_footprint():
    """Returns memory (in MB) being used by Python process"""
    mem = psutil.Process(os.getpid()).memory_info().rss
    return mem / 1024 ** 2


def main(batch_size=15, n_iterations=100, print_every=10, cleanup_every=None):
    print("python version=", sys.version)
    print("numpy version=", np.__version__)
    print("catboost version=", cb.__version__)

    features = [[1, 1, 0, 0.5, 0.33]] * batch_size
    cat_indices = [0, 1, 2]

    for i in range(n_iterations):

        if i % print_every == 0:
            print("Memory usage (iter {}): {:.2f} MB".format(i, memory_footprint()))

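        # Build a fresh Pool on every iteration; rebinding the name drops
        # the only reference to the previous Pool.
        features_pool = cb.Pool(features, cat_features=cat_indices)

        # Optionally delete the Pool explicitly and force a garbage collection.
        if cleanup_every and (i % cleanup_every == 0):
            del features_pool
            gc.collect()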
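
Since `memory_footprint()` reports the RSS of the whole process, it cannot tell Python-level objects apart from native allocations. One way to narrow this down is to run the same loop under `tracemalloc`, which only tracks memory allocated through Python's own allocators. If RSS grows while the traced size stays flat, the growth is probably happening in native code. Below is a minimal sketch using only the standard library. The helper name `python_heap_growth` and its `make_object` parameter are made up for illustration, and a plain list copy stands in for `cb.Pool(...)`:

```python
import gc
import tracemalloc


def python_heap_growth(make_object, n_iterations=1000):
    """Return the change (in bytes) of Python-tracked memory across the loop.

    `make_object` is a hypothetical zero-argument callable, e.g.
    `lambda: cb.Pool(features, cat_features=cat_indices)`.
    """
    gc.collect()
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    for _ in range(n_iterations):
        obj = make_object()
        del obj
    gc.collect()
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


# Stand-in for the Pool construction, so the sketch runs without catboost.
features = [[1, 1, 0, 0.5, 0.33]] * 15
growth = python_heap_growth(lambda: [row[:] for row in features])
print("Python-tracked growth: {} bytes".format(growth))
```

To apply this to the example above, pass the Pool construction as `make_object` and compare the result with the RSS difference over the same number of iterations.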

When running for a small number of iterations with a large batch size, there seems to be no issue (memory reaches a plateau after a few iterations):

main(batch_size=1_500_000, n_iterations=15, print_every=1, cleanup_every=1)
python version= 3.8.7 (default, Mar  4 2021, 17:04:03) 
[Clang 12.0.0 (clang-1200.0.32.29)]
numpy version= 1.21.2
catboost version= 0.26.1
Memory usage (iter 0): 87.73 MB
Memory usage (iter 1): 120.77 MB
Memory usage (iter 2): 120.80 MB
Memory usage (iter 3): 120.72 MB
Memory usage (iter 4): 120.75 MB
Memory usage (iter 5): 120.79 MB
Memory usage (iter 6): 120.82 MB
Memory usage (iter 7): 120.82 MB
Memory usage (iter 8): 120.85 MB
Memory usage (iter 9): 120.85 MB
Memory usage (iter 10): 120.85 MB
Memory usage (iter 11): 120.85 MB
Memory usage (iter 12): 120.85 MB
Memory usage (iter 13): 120.85 MB
Memory usage (iter 14): 120.85 MB

However, if we reduce the batch size and let it run for some time, memory usage keeps increasing and is still growing after 1.4 million iterations:

main(batch_size=15, n_iterations=1_500_000, print_every=50000, cleanup_every=1000)
python version= 3.8.7 (default, Mar  4 2021, 17:04:03) 
[Clang 12.0.0 (clang-1200.0.32.29)]
numpy version= 1.21.2
catboost version= 0.26.1
Memory usage (iter 0): 76.31 MB
Memory usage (iter 50000): 86.60 MB
Memory usage (iter 100000): 95.76 MB
Memory usage (iter 150000): 106.44 MB
Memory usage (iter 200000): 114.22 MB
Memory usage (iter 250000): 127.86 MB
Memory usage (iter 300000): 135.63 MB
Memory usage (iter 350000): 143.37 MB
Memory usage (iter 400000): 163.04 MB
Memory usage (iter 450000): 170.80 MB
Memory usage (iter 500000): 178.55 MB
Memory usage (iter 550000): 186.30 MB
Memory usage (iter 600000): 194.05 MB
Memory usage (iter 650000): 201.80 MB
Memory usage (iter 700000): 209.55 MB
Memory usage (iter 750000): 217.31 MB
Memory usage (iter 800000): 249.01 MB
Memory usage (iter 850000): 256.75 MB
Memory usage (iter 900000): 201.80 MB
Memory usage (iter 950000): 209.22 MB
Memory usage (iter 1000000): 217.59 MB
Memory usage (iter 1050000): 224.99 MB
Memory usage (iter 1100000): 233.52 MB
Memory usage (iter 1150000): 242.02 MB
Memory usage (iter 1200000): 250.55 MB
Memory usage (iter 1250000): 259.06 MB
Memory usage (iter 1300000): 267.56 MB
Memory usage (iter 1350000): 276.07 MB
Memory usage (iter 1400000): 284.59 MB
Memory usage (iter 1450000): 293.11 MB
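
The second run can be summarised as a growth rate. Below is a minimal sketch that uses only the last twelve readings printed above (from the drop at iteration 900000 onwards) and a plain least-squares fit. The helper names `parse_log` and `growth_bytes_per_iteration` are made up for illustration:

```python
import re

LOG_LINE = re.compile(r"Memory usage \(iter (\d+)\): ([\d.]+) MB")


def parse_log(text):
    """Extract (iteration, rss_mb) pairs from the printed output."""
    return [(int(m.group(1)), float(m.group(2))) for m in LOG_LINE.finditer(text)]


def growth_bytes_per_iteration(points):
    """Least-squares slope of RSS against iteration, converted from MB to bytes."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    return cov / var * 1024 ** 2


# Readings copied from the output above (iterations 900000 to 1450000).
sample = """\
Memory usage (iter 900000): 201.80 MB
Memory usage (iter 950000): 209.22 MB
Memory usage (iter 1000000): 217.59 MB
Memory usage (iter 1050000): 224.99 MB
Memory usage (iter 1100000): 233.52 MB
Memory usage (iter 1150000): 242.02 MB
Memory usage (iter 1200000): 250.55 MB
Memory usage (iter 1250000): 259.06 MB
Memory usage (iter 1300000): 267.56 MB
Memory usage (iter 1350000): 276.07 MB
Memory usage (iter 1400000): 284.59 MB
Memory usage (iter 1450000): 293.11 MB
"""

points = parse_log(sample)
rate = growth_bytes_per_iteration(points)
print("Estimated growth: {:.0f} bytes per iteration".format(rate))
```

For these readings the fit comes out at roughly 175 bytes per iteration, that is, per Pool of 15 rows. This looks more like a small per-object leak than a one-off allocation.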

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 2
  • Comments: 16 (6 by maintainers)

Most upvoted comments

Yes, we will release CatBoost with the first fix soon. However, we will keep tcmalloc because it is slightly faster than the default allocator.

@zquintana it was fixed in the CatBoost 1.0 release (specifically here); we haven't had any issues since then.