faiss: Faiss assertion 'err == CUBLAS_STATUS_SUCCESS'

Platform

OS: Debian 9

Faiss version: 1.6.3

Running on:

  • CPU
  • GPU

Interface:

  • C++
  • Python

Reproduction instructions

I’m trying to train a Faiss IVF index on NVIDIA A100 instances. A100 instances currently support CUDA 11.0, but I installed Faiss with CUDA toolkit 10.0.
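For context, the A100 has compute capability 8.0, and CUDA 11.0 was the first toolkit release to target it. A minimal sketch of that version check follows; the table and the `toolkit_supports` helper are illustrative names, not part of Faiss or CUDA, and only a few architectures are listed.

```python
# Minimum CUDA toolkit release able to target each compute capability.
# Hypothetical lookup table for illustration; only a few architectures listed.
MIN_CUDA_FOR_ARCH = {
    (7, 0): (9, 0),   # Volta, e.g. V100
    (7, 5): (10, 0),  # Turing, e.g. T4
    (8, 0): (11, 0),  # Ampere, e.g. A100
}


def toolkit_supports(arch, toolkit):
    """Return True if toolkit (major, minor) can target arch (major, minor)."""
    return toolkit >= MIN_CUDA_FOR_ARCH[arch]


print(toolkit_supports((8, 0), (10, 0)))  # A100 with CUDA 10.0 -> False
print(toolkit_supports((8, 0), (11, 0)))  # A100 with CUDA 11.0 -> True
```

Under these assumptions, a Faiss build against CUDA toolkit 10.0 would not contain kernels compiled for the A100, which is consistent with the mismatch suspected in this report.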

Code

>>> import numpy as np
>>> import faiss
>>> 
>>> d = 256
>>> quantizer = faiss.IndexBinaryFlat(d)
>>> index = faiss.IndexBinaryIVF(quantizer, d, 4096)
>>> xt = faiss.randint((100000, 256 // 8)).astype('uint8')
>>> 
>>> index.train(xt)
WARNING clustering 100000 points to 4096 centroids: please provide at least 159744 training points
>>> quantizer2 = faiss.IndexBinaryFlat(d)
>>> index2 = faiss.IndexBinaryIVF(quantizer2, d, 4096)
>>> clustering_index = faiss.index_cpu_to_all_gpus(faiss.IndexFlatL2(d))
>>> 
>>> index2.clustering_index = clustering_index
>>> index2.train(xt)

Exception log

Faiss assertion 'err == CUBLAS_STATUS_SUCCESS' failed in void faiss::gpu::runMatrixMult(faiss::gpu::Tensor<float, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<IndexType, 2, true>&, bool, float, float, cublasHandle_t, cudaStream_t) [with AT = float; BT = float; cublasHandle_t = cublasContext*; cudaStream_t = CUstream_st*] at ./faiss/gpu/utils/MatrixMult-inl.cuh:133; details: cublas failed (13): (512, 256) x (4096, 256)' = (512, 4096)

I guess this is due to a CUDA version mismatch. Is there a plan to support CUDA 11.0?
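The `(13)` in the exception log is the raw cuBLAS status code. The sketch below maps it to its `cublasStatus_t` name, assuming the numeric values from the cuBLAS header; `decode_cublas_failure` is a hypothetical helper written for this illustration, not a Faiss function.

```python
import re

# Numeric values of cublasStatus_t as defined in the cuBLAS header.
CUBLAS_STATUS = {
    0: "CUBLAS_STATUS_SUCCESS",
    1: "CUBLAS_STATUS_NOT_INITIALIZED",
    3: "CUBLAS_STATUS_ALLOC_FAILED",
    7: "CUBLAS_STATUS_INVALID_VALUE",
    8: "CUBLAS_STATUS_ARCH_MISMATCH",
    11: "CUBLAS_STATUS_MAPPING_ERROR",
    13: "CUBLAS_STATUS_EXECUTION_FAILED",
    14: "CUBLAS_STATUS_INTERNAL_ERROR",
    15: "CUBLAS_STATUS_NOT_SUPPORTED",
    16: "CUBLAS_STATUS_LICENSE_ERROR",
}


def decode_cublas_failure(log_line):
    """Extract the status code from a 'cublas failed (N)' message and name it."""
    match = re.search(r"cublas failed \((\d+)\)", log_line)
    if match is None:
        return None  # the line does not contain a cuBLAS failure code
    code = int(match.group(1))
    return CUBLAS_STATUS.get(code, f"unknown cuBLAS status {code}")


line = "details: cublas failed (13): (512, 256) x (4096, 256)' = (512, 4096)"
print(decode_cublas_failure(line))  # CUBLAS_STATUS_EXECUTION_FAILED
```

`CUBLAS_STATUS_EXECUTION_FAILED` only says that the GPU program failed to execute, so it does not by itself confirm a toolkit mismatch, but it is what one would expect when no kernel for the device's architecture can be launched.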

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 15 (4 by maintainers)

Most upvoted comments

It helped me to install a specific wheel with faiss-gpu==1.7.3:

pip install https://github.com/kyamagu/faiss-wheels/releases/download/v1.7.3/faiss_gpu-1.7.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

I'm having this issue too.

A100, faiss-gpu==1.7.1, cuda==11.1, Ubuntu 20.04

A failed 'err == CUBLAS_STATUS_SUCCESS' assertion is a relatively generic error. If you want help, open a new issue and give more context.

@naveenkumarmarri Were you able to solve your issue? I’m currently having the same problem with an A100 and CUDA 11.