sos: SoS submitter hang on cluster
I’ve experienced various kinds of hanging behavior with the job submitter. “Hang” here means that the job queue is empty but SoS refuses to move on. It looks like it is stuck on the current job submission, yet ps -A | grep sos shows nothing. I can interrupt it from the keyboard with Ctrl-C.
There are now two types of hang I can reliably reproduce. Hopefully, by describing them, you’ll be able to make an MWE for your cluster:
- When my job exceeds the walltime
- When the directory I specify for the err and out files does not exist, e.g.:
#SBATCH --output={cur_dir}/non_existing_dir/{job_name}.out
#SBATCH --error={cur_dir}/non_existing_dir/{job_name}.err
I hope this is enough to reproduce it.
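For the second case, a pre-submission check can catch the problem before the job reaches the scheduler. The following is only a minimal sketch, not part of SoS: `missing_log_dirs` and `SBATCH_LOG_RE` are hypothetical names, and it assumes the rendered job script uses the `--output=PATH` / `--error=PATH` form shown above.

```python
import os
import re
import tempfile

# Matches "#SBATCH --output=PATH" and "#SBATCH --error=PATH" lines
# (assumption: the "=" form is used, as in the template above).
SBATCH_LOG_RE = re.compile(r"^#SBATCH\s+--(?:output|error)=(\S+)", re.MULTILINE)


def missing_log_dirs(script_text):
    """Return the sorted parent directories of the log paths that do not exist."""
    missing = set()
    for path in SBATCH_LOG_RE.findall(script_text):
        parent = os.path.dirname(path) or "."
        if not os.path.isdir(parent):
            missing.add(parent)
    return sorted(missing)


if __name__ == "__main__":
    cur_dir = tempfile.mkdtemp()  # stands in for {cur_dir}
    script = (
        "#!/bin/bash\n"
        f"#SBATCH --output={cur_dir}/non_existing_dir/job.out\n"
        f"#SBATCH --error={cur_dir}/non_existing_dir/job.err\n"
    )
    # Reports the one directory that would have to be created first.
    print(missing_log_dirs(script))
```

Running such a check on the rendered script (after `{cur_dir}` and `{job_name}` have been filled in) would let the submitter fail early with a clear message instead of waiting on a job that never produces its log files.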
About this issue
- State: closed
- Created 6 years ago
- Comments: 19 (19 by maintainers)
Yes, that is my suggestion. Perhaps $HOME, which allows shell expansion of HOME, is better, because SoS would expand {home_dir} to a host-specific full directory that includes the user name, which is arguably better replaced with a generic $HOME.
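The difference between the two can be sketched as follows. This is an illustration under stated assumptions, not SoS code: `render` and `shell_view` are hypothetical helpers, the template lines and paths are made up, and `string.Template` only approximates what a shell on the target host would do with `$HOME`.

```python
from string import Template


def render(template, **values):
    """Fill {placeholders} at render time, before the script leaves the local host."""
    return template.format(**values)


def shell_view(line, env):
    """Approximate shell variable expansion on a host with the given environment."""
    return Template(line).safe_substitute(env)


# Hypothetical template lines; the paths are invented for illustration.
eager = render("cd {home_dir}/project", home_dir="/home/alice")  # fixed locally
lazy = render("cd $HOME/project")  # left for the remote shell to expand

remote_env = {"HOME": "/users/alice"}  # assumed home directory on the cluster
print(shell_view(eager, remote_env))
print(shell_view(lazy, remote_env))
```

The first line keeps the path that was baked in at render time, while the second resolves against whatever HOME is on the host that actually runs the script, which is the argument for the generic form.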