cupy: Cupy function doesn't utilize pinned memory inside stream

  • Conditions
      CuPy Version          : 7.2.0
      CUDA Root             : /usr/common/software/cuda/10.1.243
      CUDA Build Version    : 10010
      CUDA Driver Version   : 10020
      CUDA Runtime Version  : 10010
      cuBLAS Version        : 10202
      cuFFT Version         : 10102
      cuRAND Version        : 10102
      cuSOLVER Version      : (10, 3, 0)
      cuSPARSE Version      : 10301
      NVRTC Version         : (10, 1)
      cuDNN Build Version   : 7605
      cuDNN Version         : 7605
      NCCL Build Version    : 2506
      NCCL Runtime Version  : 2506

  • Code to reproduce

import cupy as cp
import cupyx as cpx
import cupyx.scipy.sparse  # needed for cpx.scipy.sparse.csr_matrix

stream_1 = cp.cuda.stream.Stream()
with stream_1:
    cp.random.seed(1)
    A = cp.random.rand(10000, 10000)
    # round-trip through CSR, then eigendecompose the dense matrix
    w, v = cp.linalg.eigh(cpx.scipy.sparse.csr_matrix(A).todense())
  • Error messages, stack traces, or logs
    Profiling the code above, I observe many small bursts of cudaMemcpy2DAsync inside eigh, even though I never explicitly ask CuPy to transfer data back and the call runs inside a stream. How do I force CuPy to use pinned memory efficiently?
    (Attachments: screenshot from 2020-03-04 18-41-52, eigh_profile5.qdrep.zip)

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 16 (11 by maintainers)

Most upvoted comments

FYI this was opened as a bug internally in NVIDIA.

Looks like those data transfers are made outside of CuPy (likely in cuSPARSE or cuSOLVER). IIUC almost all CuPy internal kernels are prefixed with cupy_ (or cupyx_), but I don’t see any in those transfers.

I’m not entirely sure here, but the issue might be cuSOLVER doing these data transfers. CuPy has a pinned memory pool used for its own data transfers, but we can’t guarantee what happens inside the CUDA libraries.

cc. @pentschev @anaruse

Reference : https://docs-cupy.chainer.org/en/stable/reference/memory.html

Thank you, we appreciate it!

As this is related to CUDA libraries more than CuPy, we will close this issue.

@jakirkham V100 with 16GB of memory. A more detailed description of the configuration can be found here: https://docs-dev.nersc.gov/cgpu/hardware/

OK, good to see that performance improvement at least.

Not sure that’s needed yet.

Am talking to someone who knows this a bit better to get some more insight into what is going on here.