bcbio-nextgen: IPython cluster submission problem

Hi there,

I’m currently working on getting our bcbio installation running on our cluster to speed up the analysis. So far I’m able to submit jobs to our SGE scheduler using the IPython framework; however, the run then crashes shortly after starting, and I’m not sure why:

I submit the job to the queue with qsub, and the requested resources (96 cores in total, 4 nodes with 24 cores per node) are allocated correctly:

#$ -q fancy.q     
#$ -l nodes=4,rpn=24      
#$ -N IPython_test    
#$ -j y             
#$ -cwd             

bcbio_nextgen.py ../config/RNAseq_config.yaml -t ipython -n 24 -s sge

and then get the following error message from bcbio:

[2017-05-03T15:23Z] compute-6-11: System YAML configuration: /bcbio/galaxy/bcbio_system.yaml
[2017-05-03T15:23Z] compute-6-11: Resource requests: ; memory: 1.00; cores: 1
[2017-05-03T15:23Z] compute-6-11: Configuring 1 jobs to run, using 1 cores each with 1.00g of memory reserved for each job
Traceback (most recent call last):
  File "/tooldir/bin/bcbio_nextgen.py", line 4, in <module>
    __import__('pkg_resources').run_script('bcbio-nextgen==1.0.2', 'bcbio_nextgen.py')
  File "/bcbio/anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 726, in run_script
  File "/bcbio/anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 1484, in run_script
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio_nextgen-1.0.2-py2.7.egg-info/scripts/bcbio_nextgen.py", line 234, in <module>
    main(**kwargs)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio_nextgen-1.0.2-py2.7.egg-info/scripts/bcbio_nextgen.py", line 43, in main
    run_main(**kwargs)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 50, in run_main
    fc_dir, run_info_yaml)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 82, in _run_toplevel
    system.write_info(dirs, parallel, config)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/system.py", line 32, in write_info
    minfos = _get_machine_info(parallel, sys_config, dirs, config)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/system.py", line 58, in _get_machine_info
    with prun.start(parallel, [[sys_config]], config, dirs) as run_parallel:
  File "/bcbio/anaconda/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/prun.py", line 55, in start
    with ipython.create(parallel, dirs, config) as view:
  File "/bcbio/anaconda/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/bcbio/anaconda/lib/python2.7/site-packages/cluster_helper/cluster.py", line 1069, in cluster_view
    wait_for_all_engines=wait_for_all_engines)
  File "/bcbio/anaconda/lib/python2.7/site-packages/cluster_helper/cluster.py", line 956, in __init__
    _start(scheduler, self.profile, queue, num_jobs, cores_per_job, self.cluster_id, extra_params)
  File "/bcbio/anaconda/lib/python2.7/site-packages/cluster_helper/cluster.py", line 829, in _start
    resources, specials = _scheduler_resources(scheduler, extra_params, queue)
  File "/bcbio/anaconda/lib/python2.7/site-packages/cluster_helper/cluster.py", line 809, in _scheduler_resources
    specials["pename"] = _find_parallel_environment(queue)
  File "/bcbio/anaconda/lib/python2.7/site-packages/cluster_helper/cluster.py", line 339, in _find_parallel_environment
    for name in subprocess.check_output(["qconf", "-spl"]).strip().split():
  File "/bcbio/anaconda/lib/python2.7/subprocess.py", line 212, in check_output
    process = Popen(stdout=PIPE, *popenargs, **kwargs)
  File "/bcbio/anaconda/lib/python2.7/subprocess.py", line 390, in __init__
    errread, errwrite)
  File "/bcbio/anaconda/lib/python2.7/subprocess.py", line 1024, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
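
From what I can tell, the OSError at the end is raised when cluster_helper calls qconf -spl via subprocess, which suggests the SGE command-line tools may not be on the PATH of the node where bcbio starts. A rough check along these lines (a sketch; fancy.q is the queue from our submission script, and it assumes qrsh is allowed on that queue):

# Verify the SGE client tools are visible on the submit host
which qconf && qconf -spl

# Verify the same on an execution host of the target queue
qrsh -q fancy.q 'which qconf; qconf -spl'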

Any help is highly appreciated!

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Comments: 18 (13 by maintainers)


Most upvoted comments

Thanks for the details on your setup. It looks like not all of the queues have memory specifications, which is why qhost -q fails to identify them. So I think bcbio is doing the right thing here by falling back to running a job to get memory/core information on the machines, and there is nothing we can improve there. Hopefully getting a PE set up will allow things to work in distributed mode for you. Thanks for all the help debugging.
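
For anyone who lands here with the same problem, below is a rough sketch of adding an SGE parallel environment and attaching it to the queue from the submission script above. The PE name and slot count are placeholders for your site, it requires SGE admin rights, and, as far as I can tell, cluster_helper discovers PEs via qconf -spl (as in the traceback) and expects one with an allocation_rule of $pe_slots that the target queue can use.

# Run as an SGE administrator; "smp" and the slot count are illustrative
qconf -ap smp           # opens an editor for the new PE; set at least:
#   pe_name            smp
#   slots              9999
#   allocation_rule    $pe_slots

# Add the new PE to the pe_list of the queue used above
qconf -aattr queue pe_list smp fancy.q

# Confirm it is visible the way cluster_helper queries it
qconf -spl
qconf -sp smp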