cuml: [BUG] RMM-only context destroyed error with Random Forest in loop

It seems we may have an RMM-only memory leak with RandomForestRegressor. This could come up in a wide range of workloads, such as using RandomForestRegressor with RMM during hyper-parameter optimization.

In the following example:

  • Without an RMM pool, repeatedly fitting the model, predicting, and deleting the model/predictions results in a peak memory usage of about 1.2 GB
  • With an RMM pool, the same loop causes memory usage to grow without bound. This can be triggered by uncommenting the RMM-related lines in the script below (a direct-RMM variant of that setup is sketched after the script). After 15-17 iterations, the entire 5 GB pool is exhausted.

Is it possible there is a place where RMM isn’t getting visibility of a call to free memory?

import cudf
import cuml
import rmm
import cupy as cp
from dask.utils import parse_bytes
from sklearn.datasets import make_regression

# cudf.set_allocator(pool=True, initial_pool_size=parse_bytes("5GB"))
# cp.cuda.set_allocator(rmm.rmm_cupy_allocator)

NFEATURES = 20

X, y = make_regression(
    n_samples=10000,
    n_features=NFEATURES,
    random_state=12,
)

X = X.astype("float32")
X = cp.asarray(X)
y = cp.asarray(y)

for i in range(30):
    print(i)
    clf = cuml.ensemble.RandomForestRegressor(n_estimators=50)
    clf.fit(X, y)
    preds = clf.predict(X)
    del clf, preds
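
For reference, the same pool setup can also be expressed directly through RMM rather than through cudf.set_allocator; a minimal sketch, assuming the 0.15-era rmm.reinitialize and rmm_cupy_allocator APIs that appear elsewhere in this thread:

import rmm
import cupy as cp
from dask.utils import parse_bytes

# Carve out a 5 GB RMM pool and route CuPy allocations through it,
# mirroring the two commented-out lines in the repro above.
rmm.reinitialize(pool_allocator=True, initial_pool_size=parse_bytes("5GB"))
cp.cuda.set_allocator(rmm.rmm_cupy_allocator)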

Environment: 2020-07-31 nightly at ~ 9AM EDT

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 40 (40 by maintainers)

Most upvoted comments

Fixed via PR 510 to the RMM repo.

Thanks for the repro @JohnZed. I was able to simplify it even further. This repro will actually segfault.

// From RMM's pool_memory_resource tests; Pool is presumably an alias for
// rmm::mr::pool_memory_resource<rmm::mr::device_memory_resource>.
TEST(PoolTest, TwoStreams)
{
  Pool mr{rmm::mr::get_current_device_resource(), 0};
  cudaStream_t stream;
  const int size = 10000;
  cudaStreamCreate(&stream);
  // Allocate a block from the pool on `stream`, then destroy the stream.
  EXPECT_NO_THROW(rmm::device_buffer buff(size, stream, &mr));
  cudaStreamDestroy(stream);
  // The next allocation tries to reclaim that block, which synchronizes the
  // already-destroyed stream and can segfault.
  mr.allocate(size);
}

As you identified, when we try to reclaim a block from another stream, we attempt to synchronize a stream that was already destroyed. Unfortunately, this isn’t guaranteed to return cudaErrorInvalidResourceHandle and can actually segfault.
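
A rough Python-level probe of the same hazard, assuming CuPy's thin CUDA runtime wrappers (streamCreate / streamDestroy / streamSynchronize); since synchronizing a destroyed stream is not well defined, this may raise, appear to succeed, or crash outright:

import cupy as cp

# Create a raw cudaStream_t handle, then destroy it so the handle dangles.
stream = cp.cuda.runtime.streamCreate()
cp.cuda.runtime.streamDestroy(stream)

# Synchronizing the dangling handle is what the pool ends up doing when it
# reclaims a block recorded against a destroyed stream.
try:
    cp.cuda.runtime.streamSynchronize(stream)
    print("synchronize on destroyed stream reported success")
except cp.cuda.runtime.CUDARuntimeError as err:
    print("synchronize on destroyed stream raised:", err)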

Passing == works perfectly fine; no memory leak that I can see.

With two streams, I’m not seeing failures consistently at the same place. Sometimes I get to the 2nd iteration, sometimes the 3rd iteration before the failure.

EDIT: You’re faster 😃

Thanks @harrism. I do see allocs without frees, but only if I include the model.predict call. Regardless of whether I use just fit or both fit and predict, the context appears to be destroyed. It also appears to occur only inside a Python loop, as Saloni noted above.

So far, I’ve tested KNN (Reg/Clf), Random Forest (Reg/Clf) and Logistic Regression with the following script. Only Random Forest appears to have this issue.

# to run: python rmm-model-logger.py rfr-logs.txt
import sys

import cudf
import cuml
import rmm
import numpy as np


logfilename = sys.argv[1]

# swap estimator class here
clf = cuml.ensemble.RandomForestClassifier

rmm.reinitialize(
    pool_allocator=True,
    managed_memory=False,
    initial_pool_size=2e9,
    logging=True,
    devices=0,
    log_file_name=logfilename,
)

X = cudf.DataFrame({"a": range(10), "b": range(10,20)}).astype("float32")
y = cudf.Series(np.random.choice([0, 1], 10))

for i in range(30):
    print(i)
    model = clf()
    model.fit(X, y)
    preds = model.predict(X)
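
If you swap in the regressor, float targets are a more natural fit than the 0/1 labels above; an illustrative tweak (values are placeholders):

# Hypothetical regressor variant of the same script: swap the estimator
# and use float32 targets instead of class labels.
clf = cuml.ensemble.RandomForestRegressor
y = cudf.Series(np.random.random(10), dtype="float32")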

Logs:

import pandas as pd

df = pd.read_csv("rfc-logs.dev0.txt")
print(df.Action.value_counts())

allocate    211
free        201
Name: Action, dtype: int64

df = pd.read_csv("rfr-logs.dev0.txt")
print(df.Action.value_counts())

allocate    204
free        189
Name: Action, dtype: int64

Attachments: rfr-logs.dev0.txt, rfc-logs.dev0.txt

Interestingly, if I run the script but comment out the preds = model.predict(X) line, I still get the destroyed context but the allocs match the frees.

import pandas as pd

df = pd.read_csv("rfc-fit-only-logs.dev0.txt")
print(df.Action.value_counts())

df = pd.read_csv("rfr-fit-only-logs.dev0.txt")
print(df.Action.value_counts())

free        185
allocate    185
Name: Action, dtype: int64

free        206
allocate    206
Name: Action, dtype: int64

Attachments: rfc-fit-only-logs.dev0.txt, rfr-fit-only-logs.dev0.txt

Full traceback:

python rmm-model-logger.py rfr-fit-only-logs.txt
0
1
Traceback (most recent call last):
  File "rmm-model-logger.py", line 30, in <module>
    model.fit(X, y)
  File "cuml/ensemble/randomforestregressor.pyx", line 393, in cuml.ensemble.randomforestregressor.RandomForestRegressor.fit
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-20200817/lib/python3.7/site-packages/cuml/common/memory_utils.py", line 56, in cupy_rmm_wrapper
    return func(*args, **kwargs)
  File "cuml/ensemble/randomforest_common.pyx", line 251, in cuml.ensemble.randomforest_common.BaseRandomForestModel._dataset_setup_for_fit
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-20200817/lib/python3.7/site-packages/cuml/common/memory_utils.py", line 56, in cupy_rmm_wrapper
    return func(*args, **kwargs)
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-20200817/lib/python3.7/site-packages/cuml/common/input_utils.py", line 188, in input_to_cuml_array
    X = convert_dtype(X, to_dtype=convert_to_dtype)
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-20200817/lib/python3.7/site-packages/cuml/common/memory_utils.py", line 56, in cupy_rmm_wrapper
    return func(*args, **kwargs)
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-20200817/lib/python3.7/site-packages/cuml/common/input_utils.py", line 459, in convert_dtype
    would_lose_info = _typecast_will_lose_information(X, to_dtype)
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-20200817/lib/python3.7/site-packages/cuml/common/input_utils.py", line 504, in _typecast_will_lose_information
    (X < target_dtype_range.min) |
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-20200817/lib/python3.7/site-packages/cudf/core/series.py", line 1537, in __lt__
    return self._binaryop(other, "lt")
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-20200817/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-20200817/lib/python3.7/site-packages/cudf/core/series.py", line 1083, in _binaryop
    outcol = lhs._column.binary_operator(fn, rhs, reflect=reflect)
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-20200817/lib/python3.7/site-packages/cudf/core/column/numerical.py", line 100, in binary_operator
    lhs=self, rhs=rhs, op=binop, out_dtype=out_dtype, reflect=reflect
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-20200817/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-20200817/lib/python3.7/site-packages/cudf/core/column/numerical.py", line 472, in _numeric_column_binop
    out = libcudf.binaryop.binaryop(lhs, rhs, op, out_dtype)
  File "cudf/_lib/binaryop.pyx", line 200, in cudf._lib.binaryop.binaryop
  File "cudf/_lib/scalar.pyx", line 361, in cudf._lib.scalar.as_scalar
  File "cudf/_lib/scalar.pyx", line 81, in cudf._lib.scalar.Scalar.__init__
  File "cudf/_lib/scalar.pyx", line 174, in cudf._lib.scalar._set_numeric_from_np_scalar
RuntimeError: CUDA error at: ../include/rmm/mr/device/detail/stream_ordered_memory_resource.hpp365: cudaErrorContextIsDestroyed context is destroyed

Environment:

conda list | grep "rmm\|cudf\|cuml\|numba\|cupy\|rapids"
# packages in environment at /raid/nicholasb/miniconda3/envs/rapids-tpcxbb-20200817:
cudf                      0.15.0a200817   py37_g1778921b0_4666    rapidsai-nightly
cuml                      0.15.0a200817   cuda10.2_py37_g1e5b7d348_1979    rapidsai-nightly
cupy                      7.7.0            py37h940342b_0    conda-forge
dask-cuda                 0.15.0a200817          py37_117    rapidsai-nightly
dask-cudf                 0.15.0a200817   py37_g1778921b0_4666    rapidsai-nightly
faiss-proc                1.0.0                      cuda    rapidsai-nightly
libcudf                   0.15.0a200817   cuda10.2_g1778921b0_4666    rapidsai-nightly
libcuml                   0.15.0a200817   cuda10.2_g1e5b7d348_1979    rapidsai-nightly
libcumlprims              0.15.0a200812       cuda10.2_61    rapidsai-nightly
librmm                    0.15.0a200817   cuda10.2_g17efc89_665    rapidsai-nightly
numba                     0.50.1           py37h0da4684_1    conda-forge
rmm                       0.15.0a200817   py37_g17efc89_665    rapidsai-nightly
ucx                       1.8.1+g6b29558       cuda10.2_0    rapidsai-nightly
ucx-proc                  1.0.0                       gpu    rapidsai-nightly
ucx-py                    0.15.0a200817+g6b29558        py37_203    rapidsai-nightly

cc @jakirkham @Salonijain27

Can you guys turn on logging and share the logs?

You’ll want to look for allocs without frees in the logs.
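
A sketch of one way to do that with pandas, assuming the log CSV has an Action column (as in the logs above) plus a Pointer column identifying each allocation; check the header of your log file, since column names can vary across RMM versions:

import pandas as pd

# Count allocate/free events per pointer; pointers with more allocates than
# frees are candidates for memory RMM never saw released.
df = pd.read_csv("rfr-logs.dev0.txt")

allocs = df[df.Action == "allocate"].groupby("Pointer").size()
frees = df[df.Action == "free"].groupby("Pointer").size()

leaked = allocs.subtract(frees, fill_value=0)
print(leaked[leaked > 0])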