sos: Nested workflow too many open files (solved with limitation)

Separated from #1374

Here is the complete log:

hybrid.log.gz

The last few lines contain the errors:

DEBUG: EXECUTOR - Stop controller from 10339
DEBUG: EXECUTOR - disconntecting master
DEBUG: CONTROLLER - 10339 : closes socket with handler 17 (344 left)
DEBUG: CONTROLLER - 10339 : closes socket with handler 20 (343 left)
DEBUG: CONTROLLER - 10339 : closes socket with handler 23 (342 left)
DEBUG: CONTROLLER - controller stopped 10339
DEBUG: CONTROLLER - 10339 : closes socket with handler 39 (341 left)
DEBUG: CONTROLLER - 10339 : closes socket with handler 32 (340 left)
DEBUG: CONTROLLER - Disconnecting sockets from 10339
  File "/scratch/midway2/gaow/miniconda3/lib/python3.7/site-packages/sos/__main__.py", line 642, in cmd_run
    executor.run(args.__targets__, mode=config['run_mode'])
Traceback (most recent call last):
  File "/scratch/midway2/gaow/miniconda3/lib/python3.7/site-packages/sos/__main__.py", line 642, in cmd_run
    executor.run(args.__targets__, mode=config['run_mode'])
  File "/scratch/midway2/gaow/miniconda3/lib/python3.7/site-packages/sos/workflow_executor.py", line 347, in run
    return self.run_as_master(targets=targets, mode=mode)
  File "/scratch/midway2/gaow/miniconda3/lib/python3.7/site-packages/sos/workflow_executor.py", line 1649, in run_as_master
    raise exec_error
sos.executor_utils.ExecuteError: [hybrid]: Too many open files
[hybrid]: Exits with 1 pending step (hybrid_4)
ERROR: [hybrid]: Too many open files
[hybrid]: Exits with 1 pending step (hybrid_4)
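
For context, "Too many open files" is the OS-level error a process gets when it reaches its soft limit on open file descriptors, and sockets count toward that limit (the log above still reports several hundred sockets being closed). The following is only a minimal sketch of one generic workaround, raising the soft limit from inside Python with the standard `resource` module (Unix only). The helper name `raise_nofile_soft_limit` and the target of 4096 are assumptions for illustration; this is not part of sos and not necessarily how this issue was resolved.

```python
import resource


def raise_nofile_soft_limit(target):
    """Try to raise the soft limit on open file descriptors.

    Hypothetical helper (not part of sos). Returns (old_soft, new_soft).
    The limit is never lowered and never pushed past the hard limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY

    # Nothing to do if the soft limit is already unlimited.
    if soft == unlimited:
        return soft, soft

    # Cap the request at the hard limit unless the hard limit is unlimited.
    wanted = target if hard == unlimited else min(target, hard)
    if wanted <= soft:
        return soft, soft

    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    except (ValueError, OSError):
        # Some platforms reject values the kernel considers too large.
        return soft, soft
    return soft, wanted


old_soft, new_soft = raise_nofile_soft_limit(4096)
print(f"soft RLIMIT_NOFILE: {old_soft} -> {new_soft}")
```

Raising the limit only buys headroom. If descriptors are being leaked or held longer than needed, the same error can reappear with a larger workflow.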

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 26 (26 by maintainers)

Most upvoted comments

Yes, let me take a detailed look tomorrow. It is possible that there are not enough nested workflows to feed to the task.

Just that I went a long way trying to address the socket problem by changing how the sockets were created and used, and in the end found out why the original implementation was written the way it was. 😦

So it makes sense to write down the reasons behind the code…

The bulk of the work is done, but workflows still stop here and there. There should be only one or two hiccups left, but I have been asked to expand my outbreak simulator to simulate the best strategy for community-based PCR testing.