spikeinterface: Cannot get KS4 to utilize my GPU

Hi, I am trying to get KS4 to work in SpikeInterface to sort long NPX1.0 recording sessions. Whether I use the docker image or a local KS4 installation, my GPU is not engaged, and processing takes forever to complete. It shouldn’t be my CUDA settings, since I can run standalone KS4 just fine using the GPU (RTX 3060). Do you have any suggestions on what might be wrong with my SpikeInterface setup? I am running the following script in JupyterLab on a Windows machine.

Thanks! Ali


# Assumed import: all calls below use the spikeinterface.full namespace.
# base_folder (a pathlib.Path), sorting_algorithm, and sortingParams are defined earlier.
import spikeinterface.full as si

# Define recording object using the local file address
raw_rec = si.read_openephys(base_folder, stream_id='0')
# Preprocess: high-pass filter, drop bad channels, phase shift, common median reference
rec1 = si.highpass_filter(raw_rec, freq_min=400.)
bad_channel_ids, channel_labels = si.detect_bad_channels(rec1)
rec2 = rec1.remove_channels(bad_channel_ids)
print('bad_channel_ids', bad_channel_ids)
rec3 = si.phase_shift(rec2)
rec4 = si.common_reference(rec3, operator="median", reference="global")
processed_recording = si.concatenate_recordings([rec4])
processed_recording

# Perform Sorting (this will take forever)
sorting = si.run_sorter(sorting_algorithm, processed_recording,
                        output_folder=base_folder / sorting_algorithm / 'sorting_output',
                        docker_image=False, verbose=True, remove_existing_folder=True,
                        delete_container_files=False, **sortingParams)

About this issue

  • State: closed
  • Created 4 months ago
  • Comments: 17 (3 by maintainers)

Most upvoted comments

I don’t think it’s just the drift correction. I am now extracting templates, it’s been over two hours and only 30% completed

yep that’s what I was saying too.

@alejoe91 I’m wondering whether not writing a binary file is what is slowing things down. I don’t know why it would be fine on Linux but not on Windows, though. But Kilosort doesn’t have the same multiprocessing for some of the IO stuff, so maybe we are having an OS-specific issue with file IO.
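One way to test the file IO hypothesis before committing to a full sort is to time how fast the preprocessed recording can actually be read. The sketch below is only an illustration: `measure_read_throughput` is a hypothetical helper, not part of SpikeInterface, and it assumes a single-segment recording object exposing `get_sampling_frequency()`, `get_num_samples()`, and `get_traces(start_frame=..., end_frame=...)`.

```python
import time


def measure_read_throughput(recording, chunk_seconds=1.0, n_chunks=10):
    """Time sequential get_traces() calls on the first n_chunks chunks.

    Hypothetical helper for illustration; assumes a single-segment recording.
    """
    fs = recording.get_sampling_frequency()
    n_samples = recording.get_num_samples()
    chunk = max(1, int(chunk_seconds * fs))  # samples per chunk

    samples_read = 0
    t0 = time.perf_counter()
    for i in range(n_chunks):
        start = i * chunk
        if start >= n_samples:
            break  # recording is shorter than the requested span
        end = min(start + chunk, n_samples)
        # Reading traces forces the lazy preprocessing chain to be computed
        recording.get_traces(start_frame=start, end_frame=end)
        samples_read += end - start
    elapsed = time.perf_counter() - t0

    # realtime_factor > 1 means data is read faster than it was recorded
    realtime = (samples_read / fs) / elapsed if elapsed > 0 else float("inf")
    return {"samples_read": samples_read, "elapsed_s": elapsed, "realtime_factor": realtime}
```

Running this on both the lazy `processed_recording` and a copy saved to binary would show whether on-the-fly preprocessing and reading, rather than the sorter itself, dominates the run time. A very low `realtime_factor` on the lazy recording would point toward the IO side.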

For reference, I chose a small dataset for testing KS4 (64 channels, 15 GB total), and my GPU, SSD, and CPU are all older than @mohebial’s, so I’m getting even longer times for my steps. Like I said, if I write a binary file from SpikeInterface and then do run_kilosort, it fails very quickly because it doesn’t detect enough spikes.

It crashed my computer overnight… I’ll try again today and report back when I can. What I can say, though, is that with SI it was working. I’ll try native KS4 one more time; if that fails, I’ll play with the wrapper locally and add some print statements to see if the wrapper is accidentally doing something different on Windows. As we can see, the device is correct in the environment… [image]

LIVE EDIT: Got it running, but it failed for me saying it could not find any data worth sorting (this dataset previously worked with KS3 and KS2 and MS5). I’ll start at least poking around with the wrapper. But we can see that native KS4 finds the GPU:

[image]

One other strange note was that if I specify the dtype as np.int16 it fails completely, but specifying it as ‘int16’ at least gets to the following error: [image]

So I’ll poke around with our SI wrapper locally.

Hey both. I was actually going to open an issue later today. CUDA is available on my device. When I was running the SpikeInterface wrapper it said it was going to take between 48-120 hours to run KS4 based on the constantly changing tqdm bar. So I thought maybe it was two things:

  • the wrapper is slowing things down somehow
  • the lack of binary file is causing slow reading

So I also tried to run KS4 after writing a binary file separately and loading the binary file and the probe map into KS4. When I left at the end of the day yesterday it was running in my shell (no tqdm appeared for native KS4), so I was going to make a report after I got to lab to see if it had worked overnight or not. Based on the task manager, it seems like it was still just using the CPU instead. I tried to push it and use Python 3.11, so I might downgrade to 3.10/3.9 and see if that’s part of my issue. But their README says that CUDA issues might require more personal CUDA troubleshooting (e.g. my GPU is old and only has 4 GB of dedicated memory and they recommend 6 or 8 GB, so maybe that is part of my problem).
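Since KS4 runs on PyTorch, a quick check from inside the same Jupyter kernel or shell that launches the sorter can confirm whether that particular interpreter sees the GPU. This is a minimal sketch: `describe_gpu_environment` is a hypothetical helper name, and it only reports what PyTorch itself exposes, so it says nothing about how the SpikeInterface wrapper selects a device.

```python
import importlib.util
import sys


def describe_gpu_environment():
    """Report the interpreter in use and whether its PyTorch can see a CUDA device."""
    info = {
        "python": sys.executable,   # which environment this kernel actually runs in
        "torch_installed": False,
        "cuda_available": False,
        "device_name": None,
    }
    if importlib.util.find_spec("torch") is None:
        return info  # PyTorch is not installed in this environment

    import torch

    info["torch_installed"] = True
    info["torch_version"] = torch.__version__
    info["torch_cuda_build"] = torch.version.cuda  # None for CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


print(describe_gpu_environment())
```

If `torch_cuda_build` is `None`, the environment has a CPU-only PyTorch build even though the CUDA driver works elsewhere. If `python` points to a different environment than the one where standalone KS4 uses the GPU, the two runs are not comparable.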

To summarize: KS4 has yet to work for me with SI or without SI. But I’ll do an update to this when I get to lab.