xgboost: xgboost.dask.train (cpu) high memory usage when increasing max_depth
We are trying to understand the memory usage of xgboost.dask.train and what the expected behavior is as max_depth increases.
Running a LocalCluster on my laptop (Mac M1, 16 GB memory). dask version = 2023.6.0, xgboost version = 1.7.6.
Note: I tried to run with max_depth=20 and it killed my laptop.
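Side note: one way to keep a runaway configuration from taking down the whole machine is to cap worker memory when creating the client; the nanny then restarts a worker that approaches its limit instead of letting it exhaust the host. A minimal sketch (the worker count and limit are arbitrary):

from dask.distributed import Client

# per-worker memory cap; workers nearing the limit are restarted by the nanny
client = Client(n_workers=2, threads_per_worker=4, memory_limit="6GiB")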
Reproducer (fabricated example to showcase the problem):
import dask
import xgboost
from dask.distributed import Client
from distributed.diagnostics import MemorySampler

client = Client()
ms = MemorySampler()

dtypes = {}
for i in range(40):
    dtypes["f-%d" % i] = float
for i in range(40):
    dtypes["i-%d" % i] = int

df = dask.datasets.timeseries(
    start="2000-01-01",
    end="2001-01-01",
    dtypes=dtypes,
    partition_freq="4d",
    freq="10s",
)
df = df.persist()  # this is a 1.9 GiB dataset

X = df.drop("i-1", axis=1)
y = df["i-1"]

d_train = xgboost.dask.DaskDMatrix(
    None, X, y, enable_categorical=True  # client=None uses the current default client
)
# this samples the curve for max_depth=18; rerun with different values to reproduce the plot below
with ms.sample("md_18"):
    model = xgboost.dask.train(
        None,
        {
            "max_depth": 18,
        },
        d_train,
        evals=[(d_train, "train")],
    )

ms.plot(align=True, grid=True)
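One way to collect the curves for several depths in a single run is to loop over the depths and give each run its own MemorySampler label (a sketch; the list of depths is arbitrary):

for depth in (8, 12, 16, 18):              # depths to compare, pick whatever you need
    with ms.sample("md_%d" % depth):       # one labelled curve per depth
        xgboost.dask.train(
            None,
            {"max_depth": depth},
            d_train,
            evals=[(d_train, "train")],
        )

ms.plot(align=True, grid=True)             # overlays all sampled curves in one figure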
We noticed that things are different if we include num_parallel_tree, but it's not clear to us why. Note that the runs for max_depth 18 and 20 were taking too long to finish, so we cut them off (they were over 10 minutes into the 7th iteration).
model = xgboost.dask.train(
    None,
    {
        "max_depth": 18,
        "num_parallel_tree": 100,
    },
    d_train,
    evals=[(d_train, "train")],
)
[plot: memory usage with num_parallel_tree = 100]
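Each boosting round with num_parallel_tree = 100 builds 100 trees, so rounds with deep trees get very slow. If a run only needs to finish rather than fully converge, the number of rounds can be capped with num_boost_round (the default is 10). A sketch, with an arbitrary value of 5:

model = xgboost.dask.train(
    None,
    {
        "max_depth": 18,
        "num_parallel_tree": 100,
    },
    d_train,
    num_boost_round=5,                 # default is 10; fewer rounds bounds the run time
    evals=[(d_train, "train")],
)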
I’m working on it, among other things.
I don't have an ETA. We need to redesign the cache for the parallel histogram builder. It should be possible for 2.0.
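For intuition on why memory grows so fast with depth under the histogram-based tree methods: the histogram builder keeps a per-node histogram of n_features × max_bin bins, each bin holding a gradient/hessian pair. A rough worst-case sketch (it assumes the default max_bin of 256, 16 bytes per bin, and that every node at the deepest level gets its own cached histogram; real usage is lower because of histogram subtraction and because not every node is expanded):

n_features = 79          # 80 generated columns minus the "i-1" label
max_bin = 256            # XGBoost default for the hist method
bytes_per_bin = 16       # gradient + hessian stored as two doubles (assumed)

for max_depth in (8, 12, 18, 20):
    nodes = 2 ** max_depth                           # nodes at the deepest level of a full tree
    per_node = n_features * max_bin * bytes_per_bin  # bytes per node histogram
    print(f"max_depth={max_depth}: ~{nodes * per_node / 2**30:,.1f} GiB of histograms (worst case)")

Even if only a fraction of those histograms are alive at once, the exponential term dominates at depth 18 to 20, which is presumably what the cache redesign mentioned above is meant to address.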