cupy: Cupy function doesn't utilize pinned memory inside stream
- Conditions
CuPy Version : 7.2.0
CUDA Root : /usr/common/software/cuda/10.1.243
CUDA Build Version : 10010
CUDA Driver Version : 10020
CUDA Runtime Version : 10010
cuBLAS Version : 10202
cuFFT Version : 10102
cuRAND Version : 10102
cuSOLVER Version : (10, 3, 0)
cuSPARSE Version : 10301
NVRTC Version : (10, 1)
cuDNN Build Version : 7605
cuDNN Version : 7605
NCCL Build Version : 2506
NCCL Runtime Version : 2506
- Code to reproduce
import numpy as np
import cupy as cp
import cupy.linalg
import cupyx.scipy.special
import cupyx.scipy.sparse  # needed so that cpx.scipy.sparse resolves below
import cupyx as cpx

stream_1 = cp.cuda.stream.Stream()
with stream_1:
    # Work issued inside this block is enqueued on stream_1
    cp.random.seed(1)
    A = cp.random.rand(10000, 10000)
    u, v = cp.linalg.eigh(cpx.scipy.sparse.csr_matrix(A).todense())
- Error messages, stack traces, or logs
By profiling the above code, I observe many small bursts of cudaMemcpy2DAsync calls happening in eigh, despite never explicitly asking CuPy to transfer data back. I am running the CuPy call inside a stream. How do I force CuPy to use pinned memory efficiently?
eigh_profile5.qdrep.zip
About this issue
- State: closed
- Created 4 years ago
- Comments: 16 (11 by maintainers)
FYI this was opened as a bug internally in NVIDIA.
Looks like those data transfers are made outside of CuPy (likely in cuSPARSE or cuSOLVER). IIUC almost all CuPy internal kernels are prefixed with cupy_ (or cupyx_), but I don’t see any in those transfers.
I am not entirely sure here, but the issue might be cuSOLVER doing data transfers. CuPy has a pinned memory pool used for its own data transfers, but we can’t guarantee what happens inside CUDA libraries.
cc. @pentschev @anaruse
Reference : https://docs-cupy.chainer.org/en/stable/reference/memory.html
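The prefix check described in the comment above can be sketched in plain Python. This is only an illustration of the heuristic, assuming the kernel names have already been exported from the profiler as a list of strings. The helper name `split_by_origin` and the sample kernel names are hypothetical and are not taken from the attached profile.

```python
# Prefixes come from the maintainer's comment; everything else is illustrative.
CUPY_PREFIXES = ("cupy_", "cupyx_")


def split_by_origin(kernel_names):
    """Partition kernel names into (cupy, other) based on their name prefix."""
    cupy_kernels, other_kernels = [], []
    for name in kernel_names:
        # str.startswith accepts a tuple and matches any of its entries
        if name.startswith(CUPY_PREFIXES):
            cupy_kernels.append(name)
        else:
            other_kernels.append(name)
    return cupy_kernels, other_kernels


# Hypothetical kernel names, NOT read from eigh_profile5.qdrep
sample = [
    "cupy_copy",
    "cupyx_example_kernel",
    "some_library_kernel",
    "cupy_random_fill",
]
cupy_kernels, other_kernels = split_by_origin(sample)
print("CuPy kernels:", cupy_kernels)
print("Other kernels:", other_kernels)
```

If the kernels surrounding the cudaMemcpy2DAsync bursts all land in the second list, that is consistent with the transfers being issued by a CUDA library rather than by CuPy itself, although a name prefix alone is not proof.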
Thank you, we appreciate it!
As this is related to CUDA libraries more than CuPy, we will close this issue.
@jakirkham V100 with 16GB of memory. A more detailed description of the configuration can be found here: https://docs-dev.nersc.gov/cgpu/hardware/
Ok good to see that performance improvement at least.
Not sure that’s needed yet.
Am talking to someone who knows this a bit better to get some more insight into what is going on here.