cudf: [BUG] Loading `libcudf.so` increases device memory usage by ~300MB

When working with rmm and cudf, we observe lower device memory usage when only rmm is loaded than when the libcudf shared library is also loaded, even though only rmm is actually used in both cases. Apologies for the Python-only examples:

Example 1, not loading libcudf.so, device memory usage ~335 MiB:

import rmm

buf = rmm.DeviceBuffer(size=5)
del buf

print("Press enter to exit")
input()

Example 2, loading libcudf.so, device memory usage ~651 MiB:

import rmm
import ctypes

libcudf = ctypes.cdll.LoadLibrary("libcudf.so")

buf = rmm.DeviceBuffer(size=5)
del buf

print("Press enter to exit")
input()

Example 3, construct and destruct a buffer, load libcudf.so, then construct and destruct a buffer again: device memory usage ~335 MiB after the first construct/destruct, ~651 MiB after the second:

import rmm
import ctypes

buf = rmm.DeviceBuffer(size=5)
del buf

print("Press enter to continue")
input()

libcudf = ctypes.cdll.LoadLibrary("libcudf.so")

buf = rmm.DeviceBuffer(size=5)
del buf

print("Press enter to exit")
input()
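
The issue does not say how the figures above were read. One way to capture them while a script is paused at its `input()` prompt is to query `nvidia-smi` from a second terminal. The sketch below assumes `nvidia-smi` is on the `PATH` and that no other process is using the GPU (the reported value is device-wide, not per-process); the helper names are illustrative and not part of rmm or cudf.

```python
import subprocess


def parse_memory_used(csv_text):
    """Parse `nvidia-smi` CSV output: one integer (MiB) per GPU, one per line."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def query_memory_used_mib():
    """Return the used device memory in MiB for every visible GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_used(result.stdout)


def overhead_mib(without_libcudf, with_libcudf):
    """Extra device memory attributable to loading the library, in MiB."""
    return with_libcudf - without_libcudf
```

With the figures reported above, `overhead_mib(335, 651)` gives 316 MiB, in line with the ~300MB in the title.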

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 32 (31 by maintainers)

Most upvoted comments

I think we are prematurely optimizing here. Once we remove legacy APIs and NVStrings/NVCategory, I’d expect the library size to go down significantly.

For anyone wondering why reductions are so expensive, it’s because we have to instantiate N * N * K kernels where N is the number of types we support, and K is the number of reduction operators we support.
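
To make the N * N * K growth concrete, here is a minimal sketch that enumerates one kernel per (input type, output type, operator) combination. The type and operator lists are hypothetical placeholders, not libcudf's actual supported sets.

```python
from itertools import product


def count_reduction_kernels(types, operators):
    """Count kernels when each (input type, output type, operator) triple
    gets its own instantiation, i.e. N * N * K."""
    return sum(1 for _ in product(types, types, operators))


# Hypothetical lists, for illustration only.
example_types = ["int8", "int16", "int32", "int64", "float32", "float64"]
example_operators = ["sum", "min", "max", "product"]

kernel_count = count_reduction_kernels(example_types, example_operators)
print(kernel_count)
```

Adding one more type to a list of N types adds (2N + 1) * K kernels, which is why the compiled library grows so quickly as type support expands.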

Maybe we need to switch reductions over to using Jitify as well.

Workloads people are pushing these days are saturating available memory (and often OOMing). Every bit counts.