dask-cuda: LocalCUDACluster freezes when running neural network prediction
Hi, I am new to Dask and I am trying to write a workflow to run inference on large images. I have attached the code I've been using, which should reproduce the issue I am facing.
Basically, if I use the distributed client scheduler with processes=False, or no scheduler at all, I am able to run inference on my data.
However, when I try to use LocalCUDACluster as the scheduler, I run into issues.
- In general, the process crashes and doesn't complete.
- I have tried it with 1 GPU and with 2 GPUs, using single and multiple threads per GPU.
- It does seem to work for a subset of the data, but not with my full data (controlling dim0 in the size param on line 83), though much more slowly.
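For reference, here is a minimal sketch of the two scheduler setups being compared. It is not the code from the attached zip: the helper name make_client and its kind, n_gpus and threads_per_worker arguments are made up for illustration, and it assumes dask.distributed and dask_cuda are installed.

```python
def make_client(kind="threads", n_gpus=1, threads_per_worker=1):
    """Return a dask.distributed Client for one of the two setups.

    Hypothetical helper for illustration only, not taken from the attached code.
    kind="threads": in-process cluster, i.e. Client(processes=False).
    kind="cuda":    one worker process per GPU via dask_cuda.LocalCUDACluster.
    """
    if kind not in ("threads", "cuda"):
        raise ValueError(f"unknown kind: {kind!r}")

    # Imported lazily so the argument check above works without dask installed.
    from dask.distributed import Client

    if kind == "threads":
        # Scheduler and workers share the calling process (the setup that works).
        return Client(processes=False)

    from dask_cuda import LocalCUDACluster

    # Restrict the cluster to the first n_gpus devices, e.g. "0" or "0,1".
    visible = ",".join(str(i) for i in range(n_gpus))
    cluster = LocalCUDACluster(
        CUDA_VISIBLE_DEVICES=visible,
        threads_per_worker=threads_per_worker,
    )
    return Client(cluster)
```

Calling make_client("cuda", n_gpus=2) would correspond to the 2 GPU case above, and make_client("threads") to the processes=False case. With LocalCUDACluster each worker is a separate process, so the model and the image chunks have to be serialized to the workers, which does not happen in the in-process setup.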
Quite possibly I'm doing something incorrectly. The code should help reproduce this.
Thanks for your help figuring this out.
Anas
Test_prediction.zip
About this issue
- State: closed
- Created 4 years ago
- Comments: 18 (9 by maintainers)
Oh yes, to reduce the overall memory for testing you could reduce the bsz parameter to 8. This brings memory consumption down to ~18 GB or so. I will test with the latest and circle back.