cudf: [BUG] Long import times

Describe the bug

It takes a few seconds to import cudf today

Steps/Code to reproduce bug

In [1]: import numpy, pandas

In [2]: %time import cudf
CPU times: user 432 ms, sys: 132 ms, total: 564 ms
Wall time: 2.44 s
**git***
commit ab3f45857f641548f6d64d977908075d63c193bf (HEAD -> repr-html, mrocklin/repr-html)
Author: Matthew Rocklin <mrocklin@gmail.com>
Date:   Thu Jan 3 17:35:10 2019 -0800

    Test repr for both large and small dataframes

***OS Information***
DGX_NAME="DGX Server"
DGX_PRETTY_NAME="NVIDIA DGX Server"
DGX_SWBUILD_DATE="2018-03-20"
DGX_SWBUILD_VERSION="3.1.6"
DGX_COMMIT_ID="1b0f58ecbf989820ce745a9e4836e1de5eea6cfd"
DGX_SERIAL_NUMBER=QTFCOU8310024

DGX_OTA_VERSION="3.1.7"
DGX_OTA_DATE="Thu Sep 27 20:07:53 PDT 2018"
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.5 LTS"
NAME="Ubuntu"
VERSION="16.04.5 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.5 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
Linux dgx16 4.4.0-135-generic #161-Ubuntu SMP Mon Aug 27 10:45:01 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

***GPU Information***
Thu Jan  3 18:42:10 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.44                 Driver Version: 396.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:06:00.0 Off |                    0 |
| N/A   42C    P0    59W / 300W |   1085MiB / 32510MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000000:07:00.0 Off |                    0 |
| N/A   43C    P0    46W / 300W |     11MiB / 32510MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  On   | 00000000:0A:00.0 Off |                    0 |
| N/A   42C    P0    46W / 300W |     11MiB / 32510MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  On   | 00000000:0B:00.0 Off |                    0 |
| N/A   41C    P0    47W / 300W |     11MiB / 32510MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla V100-SXM2...  On   | 00000000:85:00.0 Off |                    0 |
| N/A   42C    P0    46W / 300W |     11MiB / 32510MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla V100-SXM2...  On   | 00000000:86:00.0 Off |                    0 |
| N/A   42C    P0    44W / 300W |     11MiB / 32510MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla V100-SXM2...  On   | 00000000:89:00.0 Off |                    0 |
| N/A   44C    P0    45W / 300W |     11MiB / 32510MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla V100-SXM2...  On   | 00000000:8A:00.0 Off |                    0 |
| N/A   41C    P0    44W / 300W |     11MiB / 32510MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     64394      C   ...klin/miniconda/envs/cudf_dev/bin/python   644MiB |
|    0     65545      C   ...klin/miniconda/envs/cudf_dev/bin/python   430MiB |
+-----------------------------------------------------------------------------+

***CPU***
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                80
On-line CPU(s) list:   0-79
Thread(s) per core:    2
Core(s) per socket:    20
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
Stepping:              1
CPU MHz:               2030.789
CPU max MHz:           3600.0000
CPU min MHz:           1200.0000
BogoMIPS:              4391.76
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              51200K
NUMA node0 CPU(s):     0-19,40-59
NUMA node1 CPU(s):     20-39,60-79
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb invpcid_single intel_pt ssbd ibrs ibpb stibp kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts flush_l1d

***CMake***
/home/nfs/mrocklin/miniconda/envs/cudf_dev/bin/cmake
cmake version 3.13.2

CMake suite maintained and supported by Kitware (kitware.com/cmake).

***g++***
/usr/bin/g++
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


***nvcc***

***Python***
/home/nfs/mrocklin/miniconda/envs/cudf_dev/bin/python
Python 3.5.5

***Environment Variables***
PATH                            : /home/nfs/mrocklin/miniconda/envs/cudf_dev/bin:/home/nfs/mrocklin/miniconda/bin:/home/nfs/mrocklin/miniconda/bin:/home/nfs/mrocklin/miniconda/bin:/home/nfs/mrocklin/bin:/home/nfs/mrocklin/.local/bin:/home/nfs/mrocklin/miniconda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
LD_LIBRARY_PATH                 :
NUMBAPRO_NVVM                   :
NUMBAPRO_LIBDEVICE              :
CONDA_PREFIX                    : /home/nfs/mrocklin/miniconda/envs/cudf_dev
PYTHON_PATH                     :

***conda packages***
/home/nfs/mrocklin/miniconda/bin/conda
# packages in environment at /home/nfs/mrocklin/miniconda/envs/cudf_dev:
#
# Name                    Version                   Build  Channel
alabaster                 0.7.12                     py_0    conda-forge
arrow-cpp                 0.10.0           py35h70250a7_0    conda-forge
asn1crypto                0.24.0                   py35_3    conda-forge
atomicwrites              1.2.1                      py_0    conda-forge
attrs                     18.2.0                     py_0    conda-forge
babel                     2.6.0                      py_1    conda-forge
backcall                  0.1.0                      py_0    conda-forge
blas                      1.0                         mkl
bleach                    3.0.2                      py_1    conda-forge
bokeh                     0.13.0                   py35_0
boost                     1.67.0           py35h3e44d54_0    conda-forge
boost-cpp                 1.67.0               h3a22d5f_0    conda-forge
bzip2                     1.0.6                h470a237_2    conda-forge
ca-certificates           2018.03.07                    0
certifi                   2018.8.24                py35_1
cffi                      1.11.5           py35h5e8e0c9_1    conda-forge
chardet                   3.0.4                    py35_3    conda-forge
click                     7.0                        py_0    conda-forge
cloudpickle               0.6.1                      py_0    conda-forge
cmake                     3.13.2               h011004d_0    conda-forge
commonmark                0.5.4                      py_2    conda-forge
cryptography              2.3.1            py35hdffb7b8_0    conda-forge
cryptography-vectors      2.3.1                    py35_0    conda-forge
cudatoolkit               9.2                           0
cudf                      0.4.0+385.g81a187d           <pip>
curl                      7.63.0               h74213dd_0    conda-forge
cython                    0.28.5           py35hfc679d8_0    conda-forge
cytoolz                   0.9.0.1          py35h470a237_0    conda-forge
dask                      1.0.0+51.g2ca205b           <pip>
dask-core                 1.0.0                      py_0    conda-forge
dask-cuda                 0.0.0                     <pip>
dask-cudf                 0.0.1+222.g10a9f90.dirty           <pip>
decorator                 4.3.0                      py_0    conda-forge
distributed               1.23.2                   py35_1    conda-forge
distributed               1.25.1+10.ga0d0ed2           <pip>
docutils                  0.14                     py35_1    conda-forge
entrypoints               0.2.3                    py35_2    conda-forge
expat                     2.2.5                hfc679d8_2    conda-forge
future                    0.16.0                   py35_2    conda-forge
gmp                       6.1.2                hfc679d8_0    conda-forge
heapdict                  1.0.0                 py35_1000    conda-forge
icu                       58.2                 hfc679d8_0    conda-forge
idna                      2.7                      py35_2    conda-forge
imagesize                 1.1.0                      py_0    conda-forge
intel-openmp              2019.1                      144
ipykernel                 5.1.0              pyh24bf2e0_0    conda-forge
ipython                   7.0.1            py35h24bf2e0_0    conda-forge
ipython_genutils          0.2.0                      py_1    conda-forge
jedi                      0.12.1                   py35_0    conda-forge
jinja2                    2.10                       py_1    conda-forge
jsonschema                2.6.0                    py35_2    conda-forge
jupyter_client            5.2.4                      py_0    conda-forge
jupyter_core              4.4.0                      py_0    conda-forge
krb5                      1.16.2               hbb41f41_0    conda-forge
libcurl                   7.63.0               hbdb9355_0    conda-forge
libedit                   3.1.20170329         haf1bffa_1    conda-forge
libffi                    3.2.1                hfc679d8_5    conda-forge
libgcc-ng                 8.2.0                hdf63c60_1
libgdf-cffi               0.4.0                     <pip>
libgfortran-ng            7.2.0                hdf63c60_3    conda-forge
librmm-cffi               0.4.0                     <pip>
libsodium                 1.0.16               h470a237_1    conda-forge
libssh2                   1.8.0                h5b517e9_3    conda-forge
libstdcxx-ng              8.2.0                hdf63c60_1
libuv                     1.24.1               h470a237_0    conda-forge
llvmlite                  0.27.0           py35hf484d3e_0    numba
Markdown                  2.6.11                    <pip>
markupsafe                1.0              py35h470a237_1    conda-forge
mistune                   0.8.3            py35h470a237_2    conda-forge
mkl                       2018.0.3                      1
mkl_fft                   1.0.9                    py35_0    conda-forge
mkl_random                1.0.1                    py35_0    conda-forge
more-itertools            4.3.0                    py35_0    conda-forge
msgpack-python            0.5.6            py35h2d50403_3    conda-forge
nbconvert                 5.3.1                      py_1    conda-forge
nbformat                  4.4.0                      py_1    conda-forge
ncurses                   6.1                  hfc679d8_2    conda-forge
notebook                  5.7.0                    py35_0    conda-forge
numba                     0.42.0          np115py35hf484d3e_0    numba
numpy                     1.15.2           py35h1d66e8a_0
numpy-base                1.15.2           py35h81de0dd_0
numpydoc                  0.8.0                      py_1    conda-forge
nvstrings                 0.1.0            cuda9.2_py35_0    nvidia
openssl                   1.0.2p               h14c3975_0
packaging                 18.0                       py_0    conda-forge
pandas                    0.20.3                   py35_1    conda-forge
pandoc                    2.5                           0    conda-forge
pandocfilters             1.4.2                      py_1    conda-forge
parquet-cpp               1.5.0.pre            h83d4a3d_0    conda-forge
parso                     0.3.1                      py_0    conda-forge
pathlib2                  2.3.2                    py35_0    conda-forge
pexpect                   4.6.0                    py35_0    conda-forge
pickleshare               0.7.5                    py35_0    conda-forge
pip                       18.0                  py35_1001    conda-forge
pluggy                    0.8.0                      py_0    conda-forge
prometheus_client         0.5.0                      py_0    conda-forge
prompt_toolkit            2.0.7                      py_0    conda-forge
psutil                    5.4.7            py35h470a237_1    conda-forge
ptyprocess                0.6.0                 py35_1000    conda-forge
py                        1.7.0                      py_0    conda-forge
pyarrow                   0.10.0           py35hfc679d8_0    conda-forge
pycparser                 2.19                       py_0    conda-forge
pygments                  2.3.1                      py_0    conda-forge
pyopenssl                 18.0.0                   py35_0    conda-forge
pyparsing                 2.3.0                      py_0    conda-forge
pysocks                   1.6.8                    py35_2    conda-forge
pytest                    4.0.2                     <pip>
pytest                    3.8.1                    py35_0
python                    3.5.5                h5001a0f_2    conda-forge
python-dateutil           2.7.5                      py_0    conda-forge
pytz                      2018.7                     py_0    conda-forge
pyyaml                    3.13             py35h470a237_1    conda-forge
pyzmq                     17.1.2           py35hae99301_0    conda-forge
readline                  7.0                  haf1bffa_1    conda-forge
recommonmark              0.4.0                      py_2    conda-forge
requests                  2.19.1                   py35_1    conda-forge
rhash                     1.3.6                h470a237_1    conda-forge
send2trash                1.5.0                      py_0    conda-forge
setuptools                40.4.3                   py35_0    conda-forge
simplegeneric             0.8.1                      py_1    conda-forge
six                       1.11.0                   py35_1    conda-forge
snowballstemmer           1.2.1                      py_1    conda-forge
sortedcontainers          2.1.0                      py_0    conda-forge
sphinx                    1.8.1                    py35_0    conda-forge
sphinx-markdown-tables    0.0.9                     <pip>
sphinx_rtd_theme          0.4.2                      py_0    conda-forge
sphinxcontrib-websupport  1.1.0                      py_1    conda-forge
sqlite                    3.26.0               hb1c47c0_0    conda-forge
tblib                     1.3.2                      py_1    conda-forge
terminado                 0.8.1                    py35_1    conda-forge
testpath                  0.3.1                    py35_1    conda-forge
tk                        8.6.9                ha92aebf_0    conda-forge
toolz                     0.9.0                      py_1    conda-forge
tornado                   5.1.1            py35h470a237_0    conda-forge
traitlets                 4.3.2                    py35_0    conda-forge
urllib3                   1.23                     py35_1    conda-forge
wcwidth                   0.1.7                      py_1    conda-forge
webencodings              0.5.1                      py_1    conda-forge
wheel                     0.32.0                py35_1000    conda-forge
xz                        5.2.4                h470a237_1    conda-forge
yaml                      0.1.7                h470a237_1    conda-forge
zeromq                    4.2.5                hfc679d8_6    conda-forge
zict                      0.1.3                      py_0    conda-forge
zlib                      1.2.11               h470a237_3    conda-forge

About this issue

  • Original URL
  • State: open
  • Created 5 years ago
  • Comments: 43 (28 by maintainers)

Most upvoted comments

Profiled this a bit today and on my end it seems like the issue is inside cudf._cuda.gpu.getDeviceCount().

I doubt that problem is with that function. My guess is that’s just the first function that invokes a CUDA runtime API which initializes the context and other first time setup stuff.

Hmm - nothing sticks out to me in the implementation: https://github.com/NVIDIA/cuda-python/blob/746b773c91e1ede708fe9a584b8cdb1c0f32b51d/cuda/_lib/ccudart/ccudart.pyx#L1458-L1466

I suspect the overhead we’re seeing here is from the initialization of the CUDA context.

I suspect the addition of CUDA Lazy Loading should dramatically improve this situation. This was added in 11.7 and has been improved in subsequent releases.

This is currently an optional feature and can be enabled by specifying the environment variable CUDA_MODULE_LOADING=LAZY at runtime.