cuml: FAISS memory error with UMAP (WSL2)

Describe the bug

I have a dataset of 800k items (768-dimensional vectors). UMAP works on the full 800k dataset and on smaller (randomly sampled) subsets of around 150k, but medium-sized subsets (~300k, 350k, etc.) crash with this error:

Traceback (most recent call last):
  File "/opt/project/callbacks.py", line 744, in on_click
    umap_data_3D = umap_for_clustering.fit_transform(embedding_matrix)
  File "/opt/conda/envs/rapids/lib/python3.8/site-packages/cuml/internals/api_decorators.py", line 549, in inner_set_get
    ret_val = func(*args, **kwargs)
  File "cuml/manifold/umap.pyx", line 659, in cuml.manifold.umap.UMAP.fit_transform
    
  File "/opt/conda/envs/rapids/lib/python3.8/site-packages/cuml/internals/api_decorators.py", line 409, in inner_with_setters
    return func(*args, **kwargs)
  File "cuml/manifold/umap.pyx", line 600, in cuml.manifold.umap.UMAP.fit
    
RuntimeError: Error in virtual void faiss::gpu::StandardGpuResourcesImpl::initializeForDevice(int) at /home/conda/feedstock_root/build_artifacts/faiss-split_1618468126454/work/faiss/gpu/StandardGpuResources.cpp:285: Error: 'err == cudaSuccess' failed: failed to cudaHostAlloc 268435456 bytes for CPU <-> GPU async copy buffer (error 2 out of memory)
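
Note that 268435456 bytes is exactly 256 MiB, and cudaHostAlloc allocates pinned host memory rather than device memory, so the allocation that fails is the CPU <-> GPU staging buffer named in the message. A rough way to check whether that allocation fails outside of cuML (assuming faiss is importable in this container; the dimension 768 just mirrors my data) would be:

import faiss

# Creating a GPU index forces StandardGpuResources to initialize the device,
# which includes the cudaHostAlloc of the pinned copy buffer that fails above.
res = faiss.StandardGpuResources()
index = faiss.GpuIndexFlatL2(res, 768)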

I’m using a Titan RTX GPU with 24 GB of memory, and nvidia-smi shows more than enough free memory for this operation immediately before calling fit_transform:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.10       Driver Version: 510.10       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA TITAN RTX   WDDM  | 00000000:01:00.0 Off |                  N/A |
| 41%   29C    P8    10W / 280W |   3678MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A         4      C   Insufficient Permissions        N/A      |
+-----------------------------------------------------------------------------+

The UMAP model is created with the parameters (n_components=3, n_neighbors=15, min_dist=0.0) and applied with fit_transform.
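
For reference, a minimal sketch of how the model is created and applied (random data used here as a stand-in for my actual embedding matrix):

import numpy as np
from cuml.manifold import UMAP

# Stand-in for the real data: the actual embedding_matrix is a randomly sampled
# ~300k x 768 float32 subset of the full 800k x 768 matrix.
embedding_matrix = np.random.rand(300_000, 768).astype(np.float32)

umap_for_clustering = UMAP(n_components=3, n_neighbors=15, min_dist=0.0)
umap_data_3D = umap_for_clustering.fit_transform(embedding_matrix)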

The environment is rapidsai/rapidsai:21.10-cuda11.2-base-ubuntu18.04-py3.8 with torch==1.9.1+cu111 installed on top.

Any idea why this works for the large dataset but not for intermediate-sized datasets, please?

About this issue

  • State: open
  • Created 3 years ago
  • Comments: 15 (6 by maintainers)

Most upvoted comments

Thank you @viclafargue! I’m glad we got to the bottom of this; please keep us updated.

@viclafargue the CUDA 11.6 version reported by nvidia-smi is just the maximum CUDA version supported by that driver; it doesn’t mean that CUDA 11.6 is actually being used in the local environment.
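
A quick way to confirm which CUDA versions are actually in play (assuming both torch and cupy are importable in that environment) would be something like:

import torch
import cupy

print(torch.version.cuda)                     # CUDA version this torch build was compiled against, e.g. '11.1'
print(cupy.cuda.runtime.runtimeGetVersion())  # CUDA runtime used by CuPy/RAPIDS, e.g. 11020 for 11.2
print(cupy.cuda.runtime.driverGetVersion())   # maximum CUDA version the driver supports, which is what nvidia-smi reports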

@alexwilson1 could you run https://github.com/rapidsai/cuml/blob/branch-21.12/print_env.sh and put the output in a reply here? That’ll also help with triage.