dask-jobqueue: PBSCluster does not start

Hi,

I’ve been trying to use Dask with dask-jobqueue on an HPC cluster that uses PBS scheduling, but I can’t get it to start any worker processes. I’m following the steps shown in the “Dask on HPC Introduction”.

Essentially, my steps are:

from dask_jobqueue import PBSCluster
from dask.distributed import Client

cluster = PBSCluster(memory="1GB",
                     resource_spec="select=1:ncpus=36:mem=1000mb",
                     cores=36)
# (with appropriate queue and project options from the yml config file)
# also, changing the memory keywords doesn't seem to help
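
I then request workers with scale(), which, as I understand it, is what triggers the actual qsub calls (a minimal sketch; scale(jobs=...) assumes a dask-jobqueue version that accepts the jobs= keyword):

cluster.scale(jobs=1)     # ask for one 36-core worker job
client = Client(cluster)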

Then, if I print cluster, I get

PBSCluster(cores=0, memory=0 B, workers=0/0, jobs=0/0)

which probably means that submission fails, right?

However, when I manually run qsub job_script.sh, where job_script.sh contains the output of print(cluster.job_script()), the job is submitted without errors and I can see it in the queue.
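
For completeness, this is roughly how I dump the generated script to a file before submitting it (the file name job_script.sh is just what I chose):

with open("job_script.sh", "w") as f:
    f.write(cluster.job_script())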

So I’m a bit stuck here. Any help would be greatly appreciated.

Many thanks, Denis

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 29 (15 by maintainers)

Most upvoted comments

Okay, so we’ll leave this as closed. Feel free to open another discussion if you identify something that could be done to get dask-jobqueue working on Cray. In the meantime, you can try to launch Dask by hand or with other solutions.

According to https://user.cscs.ch/computing/data_science/dask/

Dask and Distributed Dask are provided as part of the Cray Urika-XC analytics package.

So it looks like this is indeed possible but needs some configuring.
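
As a rough illustration of the launch-by-hand route, here is a sketch of a PBS job script that starts a scheduler and a worker directly (a sketch only: the resource line, walltime, and scheduler-file path are assumptions for your site, and CLI flag names can vary between Dask versions):

#!/usr/bin/env bash
#PBS -l select=1:ncpus=36:mem=1000mb
#PBS -l walltime=01:00:00

# Start the scheduler and a worker inside the PBS job, sharing
# connection details through a scheduler file on shared storage.
dask-scheduler --scheduler-file $HOME/scheduler.json &
dask-worker --scheduler-file $HOME/scheduler.json --nthreads 36

A client running elsewhere (e.g. a login node) can then connect with Client(scheduler_file=...), pointing at the same file.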

I don’t recall if this was already mentioned, but you can use a job (interactive or not) to run your main script and start your Cluster/Scheduler. And with #186, this should be possible remotely.
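
A minimal sketch of that pattern, reusing the resource spec from the question (the file name main.py and the scale(jobs=2) count are illustrative):

# main.py -- submit this itself as a PBS job; the cluster it creates
# then submits additional PBS jobs for the workers.
from dask_jobqueue import PBSCluster
from dask.distributed import Client

cluster = PBSCluster(cores=36, memory="1GB",
                     resource_spec="select=1:ncpus=36:mem=1000mb")
cluster.scale(jobs=2)      # request two worker jobs
client = Client(cluster)

# ... run the actual computation here ...
print(client.submit(sum, range(100)).result())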

I’m fine with keeping it open, as long as you answer the call 👍