sos: Nested workflow too many open files (solved with limitation)
Separated from #1374
Here is the complete log:
The last few lines show the errors:
```
DEBUG: EXECUTOR - Stop controller from 10339
DEBUG: EXECUTOR - disconntecting master
DEBUG: CONTROLLER - 10339 : closes socket with handler 17 (344 left)
DEBUG: CONTROLLER - 10339 : closes socket with handler 20 (343 left)
DEBUG: CONTROLLER - 10339 : closes socket with handler 23 (342 left)
DEBUG: CONTROLLER - controller stopped 10339
DEBUG: CONTROLLER - 10339 : closes socket with handler 39 (341 left)
DEBUG: CONTROLLER - 10339 : closes socket with handler 32 (340 left)
DEBUG: CONTROLLER - Disconnecting sockets from 10339
  File "/scratch/midway2/gaow/miniconda3/lib/python3.7/site-packages/sos/__main__.py", line 642, in cmd_run
    executor.run(args.__targets__, mode=config['run_mode'])
Traceback (most recent call last):
  File "/scratch/midway2/gaow/miniconda3/lib/python3.7/site-packages/sos/__main__.py", line 642, in cmd_run
    executor.run(args.__targets__, mode=config['run_mode'])
  File "/scratch/midway2/gaow/miniconda3/lib/python3.7/site-packages/sos/workflow_executor.py", line 347, in run
    return self.run_as_master(targets=targets, mode=mode)
  File "/scratch/midway2/gaow/miniconda3/lib/python3.7/site-packages/sos/workflow_executor.py", line 1649, in run_as_master
    raise exec_error
sos.executor_utils.ExecuteError: [hybrid]: Too many open files
[hybrid]: Exits with 1 pending step (hybrid_4)
ERROR: [hybrid]: Too many open files
[hybrid]: Exits with 1 pending step (hybrid_4)
```
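For context, "Too many open files" is the operating system's EMFILE error: the process has reached its per-process limit on file descriptors (RLIMIT_NOFILE), and every open socket counts against that limit. A controller still holding a few hundred sockets (the log counts down from 344) can therefore exhaust a low default limit. A common stopgap, separate from the fix adopted in this issue, is to raise the soft limit up to the hard limit, either with `ulimit -n` in the shell or from Python. The sketch below uses only the standard-library `resource` module (Unix only); the function name `raise_nofile_limit` and the target of 4096 are hypothetical choices for illustration.

```python
import resource
from typing import Tuple


def raise_nofile_limit(target: int) -> Tuple[int, int]:
    """Hypothetical helper: raise the soft open-file limit toward `target`.

    Never lowers the current soft limit and never exceeds the hard limit.
    Returns the (soft, hard) limits in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited, nothing to raise
    # An unprivileged process may move its soft limit anywhere up to the hard limit.
    cap = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if cap > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (cap, hard))
        except (ValueError, OSError):
            # Some platforms (e.g. macOS) reject values above a system-wide maximum.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


print("before:", resource.getrlimit(resource.RLIMIT_NOFILE))
print("after: ", raise_nofile_limit(4096))
```

This only buys headroom: if the number of sockets grows with the number of nested workflows, a large enough workflow will still hit whatever limit is set.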
About this issue
- State: closed
- Created 4 years ago
- Comments: 26 (26 by maintainers)
Commits related to this issue
- start using a new socket allocation method #1376 — committed to vatlab/sos by deleted user 4 years ago
- Throttle pending workflows #1376 — committed to vatlab/sos by deleted user 4 years ago
- Throttle number of subworkflows sent to the controller at the same time #1376 — committed to vatlab/sos by deleted user 4 years ago
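The last two commits throttle how many subworkflows are handed to the controller at the same time. SoS's actual code is not reproduced here; as a generic illustration of the idea, the following sketch caps the number of in-flight jobs with an `asyncio.Semaphore`, under the assumption that each pending subworkflow holds its own sockets, so bounding concurrency also bounds descriptor use. `run_throttled`, `make_job`, and `max_pending=4` are hypothetical names and values, not part of SoS.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def run_throttled(jobs: List[Callable[[], Awaitable[int]]], max_pending: int) -> List[int]:
    """Run async jobs concurrently, but never more than `max_pending` at once."""
    sem = asyncio.Semaphore(max_pending)

    async def guarded(job: Callable[[], Awaitable[int]]) -> int:
        # Waits here while `max_pending` jobs already hold a slot.
        async with sem:
            return await job()

    # gather() returns results in the order the jobs were given.
    return await asyncio.gather(*(guarded(job) for job in jobs))


def make_job(i: int, stats: Dict[str, int]) -> Callable[[], Awaitable[int]]:
    """Hypothetical stand-in for a subworkflow that holds resources while running."""
    async def job() -> int:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # simulate work while the "sockets" are open
        stats["active"] -= 1
        return i
    return job


stats = {"active": 0, "peak": 0}
results = asyncio.run(run_throttled([make_job(i, stats) for i in range(20)], max_pending=4))
print(results == list(range(20)), stats["peak"])  # True 4
```

The design trades some parallelism for a predictable ceiling: jobs beyond `max_pending` queue for a free slot instead of opening more sockets.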
Yes, let me take a detailed look tomorrow. It is possible that there are not enough nested workflows to feed to the task.
I went a long way trying to address the socket problem by changing how the sockets were created and used, only to finally discover why the original implementation was written the way it was. 😦
So it makes sense to write down the reasons behind the code…
The bulk of the work is done, but workflows still stall here and there. There should be only one or two hiccups left, but I have been asked to expand my outbreak simulator to simulate the best strategy for community-based PCR testing.