ColBERT: Model stuck in Loading decompress_residuals_cpp extension

@okhat Hi, I am running ColBERT with the following configuration on a single GPU.

Below is the script I am using; I just wanted to see if I could run a quick indexing pass end to end.

import os
import sys
sys.path.insert(0, '../')

from colbert.infra import Run, RunConfig, ColBERTConfig
from colbert.data import Queries, Collection
from colbert import Indexer, Searcher


if __name__ == '__main__':
    dataroot = 'downloads/lotte'
    dataset = 'lifestyle'
    datasplit = 'dev'

    queries = os.path.join(dataroot, dataset, datasplit, 'questions.search.tsv')
    collection = os.path.join(dataroot, dataset, datasplit, 'collection.tsv')

    queries = Queries(path=queries)
    collection = Collection(path=collection)

    print(f'Loaded {len(queries)} queries and {len(collection):,} passages')

    print(queries[24])
    print()
    print(collection[89852])
    print()

    nbits = 2   # encode each dimension with 2 bits
    doc_maxlen = 300   # truncate passages at 300 tokens

    checkpoint = 'downloads/colbertv2.0'
    index_name = f'{dataset}.{datasplit}.{nbits}bits'

    with Run().context(RunConfig(nranks=1, experiment='msmarco')):  # nranks specifies the number of GPUs to use.
        config = ColBERTConfig(doc_maxlen=doc_maxlen, nbits=nbits)

        indexer = Indexer(checkpoint=checkpoint, config=config)
        indexer.index(name=index_name, collection=collection[:20], overwrite=True)

    print(indexer.get_index()) # You can get the absolute path of the index, if needed.

However, the indexing seems to get stuck at this point:

WARNING clustering 2687 points to 512 centroids: please provide at least 19968 training points
Clustering 2687 points in 128D to 512 clusters, redo 1 times, 20 iterations
  Preprocessing in 0.00 s
  Iteration 19 (1.10 s, search 0.23 s): objective=463.491 imbalance=1.616 nsplit=0
[Apr 22, 02:32:04] Loading decompress_residuals_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)...

I am running on a machine with a Quadro RTX 8000 (49GB) and 128GB of RAM.
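In case it helps with debugging, here is a minimal sketch of how I would enable the verbose flag mentioned in that last log line (assuming it only needs to be present in the environment before colbert loads the extension; the exact point at which it is read may depend on the ColBERT version):

import os

# Flag named in the log message above; setting it before any colbert import
# should make the torch-extension build print its progress instead of hanging silently.
os.environ['COLBERT_LOAD_TORCH_EXTENSION_VERBOSE'] = 'True'

from colbert import Indexer  # import only after the flag is set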

Most upvoted comments

I ran into the same issue yesterday (regardless of the number of passages), but removing py38_cu113 from .cache solved it for me.
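Concretely, the cleanup amounts to deleting the stale extension-build directory. A sketch (assuming PyTorch's default extension cache under ~/.cache/torch_extensions; the py38_cu113 folder name depends on your Python and CUDA versions):

import shutil
from pathlib import Path

# Stale/partial extension build left behind by an interrupted run.
# The path is an assumption based on PyTorch's default torch_extensions cache location.
ext_cache = Path.home() / '.cache' / 'torch_extensions' / 'py38_cu113'

if ext_cache.exists():
    shutil.rmtree(ext_cache)
    print(f'Removed {ext_cache}')
else:
    print(f'Nothing to remove at {ext_cache}')

After removing the directory, the extension is rebuilt from scratch on the next run.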

I tested with a small (10K) and a large (0.6M) number of passages and both worked fine. The issue appeared to originate from interrupting the process during an initial test run. This left some I/O hanging when loading the output files, which produced no errors even with debugging turned on. My setup is an RTX 2080 on Ubuntu 20.04 LTS, so it is quite comparable to previously reported setups. I didn't have to re-create the conda environment, but that might also help in some cases, as might trying CPU-only mode.