dask-cuda: LocalCUDACluster freezes when running neural network prediction

Hi, I am new to dask and I was trying to write a workflow to run inference on large images. I have attached the code I've been using, which should reproduce the issue I am facing.

Basically, if I use the distributed client with (processes=False), or no distributed scheduler at all, I am able to run inference on my data. However, when I try to use LocalCUDACluster for the cluster, I run into issues.

  • In general, the process crashes and doesn't complete.
  • I have tried it with 1 GPU and with 2 GPUs, using a single thread and multiple threads per GPU.
  • It does seem to work for a subset of the data, but not with my full data (controlled by dim0 in the size param on line 83), though much more slowly.
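
For anyone trying to reproduce this without the attachment, the setups described above can be sketched roughly as follows. This is only a minimal sketch, not the code from the zip: the helper names cuda_cluster_kwargs and make_client are made up for illustration, and it assumes dask.distributed and dask_cuda are installed and that GPUs are numbered from 0.

```python
def cuda_cluster_kwargs(n_gpus, threads_per_gpu=1):
    """Build keyword arguments for dask_cuda.LocalCUDACluster.

    Hypothetical helper: dask-cuda starts one worker per visible GPU,
    so we only restrict the visible devices and set the thread count.
    """
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    return {
        # Assumes devices are numbered 0..n_gpus-1 on this machine.
        "CUDA_VISIBLE_DEVICES": ",".join(str(i) for i in range(n_gpus)),
        "threads_per_worker": threads_per_gpu,
    }


def make_client(kind, n_gpus=1, threads_per_gpu=1):
    """Return a dask Client for one of the two setups being compared.

    Hypothetical helper; imports are deferred so that defining it
    does not require dask or a GPU.
    """
    if kind not in ("threads", "cuda"):
        raise ValueError(f"unknown kind: {kind!r}")

    from dask.distributed import Client

    if kind == "threads":
        # Single-process, threaded cluster: the setup reported as working.
        return Client(processes=False)

    from dask_cuda import LocalCUDACluster

    # One worker process per GPU: the setup reported as freezing.
    cluster = LocalCUDACluster(**cuda_cluster_kwargs(n_gpus, threads_per_gpu))
    return Client(cluster)
```

With these, make_client("threads") would correspond to the working case and make_client("cuda", n_gpus=2, threads_per_gpu=1) to one of the failing cases.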

Quite possibly I'm doing something incorrectly. The attached code should help reproduce this.

Thanks for your help figuring this out.

Anas Test_prediction.zip

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 18 (9 by maintainers)

Most upvoted comments

Oh yes, to reduce the overall memory for testing you could reduce the bsz parameter to 8. This brings memory consumption down to ~18 GB or so.
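
To illustrate why a smaller bsz lowers peak memory, here is a rough sketch. It assumes bsz is the number of tiles passed to the model per forward pass, which is a guess about the attached script, and iter_batches is a hypothetical helper, not something from the zip. Only one batch of tiles needs to be resident at a time, so the peak scales with bsz rather than with the full dataset.

```python
def iter_batches(n_items, bsz):
    """Yield (start, stop) index pairs covering n_items in chunks of bsz.

    Hypothetical helper: the last chunk may be shorter than bsz.
    """
    if bsz < 1:
        raise ValueError("bsz must be at least 1")
    for start in range(0, n_items, bsz):
        yield start, min(start + bsz, n_items)


# Example: 20 tiles with bsz=8 are processed as three batches.
batches = list(iter_batches(20, 8))
print(batches)
```

In a prediction loop, each (start, stop) pair would select the slice of tiles sent to the model in one call.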

I will test with the latest version and circle back.