qiskit: LocalJob is not process-safe and leads to a hang
Users cannot run multiple processes, each of which submits some jobs.
As a user, I want to run multiple processes that submit jobs in parallel. However, they hang on macOS.
You can argue (#567) that the user would do better to assemble multiple circuits and send them in one job, after which qiskit will internally use the PoolExecutor to handle them.
However, as a general tool, qiskit should not restrict how users run it. Users should have the option to decide whether to submit jobs from multiple processes.
To fix the hang bug, simply change the code as follows.
    class LocalJob(BaseJob):
        if sys.platform in ['darwin', 'win32']:
            _executor = futures.ThreadPoolExecutor()
        else:
            _executor = futures.ProcessPoolExecutor()

        def __init__(self, fn, qobj):
            super().__init__()
            self._qobj = qobj
            self._backend_name = qobj.header.backend_name
            self._future = self._executor.submit(fn, qobj)
->
    import os

    class LocalJob(BaseJob):
        processes2executors = {}

        def __init__(self, fn, qobj):
            super().__init__()
            self._qobj = qobj
            self._backend_name = qobj.header.backend_name
            pid = os.getpid()
            if pid not in LocalJob.processes2executors:
                print(pid, "first time to create the executor")
                if sys.platform in ['darwin', 'win32']:
                    _executor = futures.ThreadPoolExecutor()
                else:
                    _executor = futures.ProcessPoolExecutor()
                LocalJob.processes2executors[pid] = _executor
            else:
                print(pid, "reuse my executor")
                _executor = LocalJob.processes2executors[pid]
            self._future = _executor.submit(fn, qobj)
In the original code, the executor is shared by multiple job-submitting processes. A process may fail to join even after it has finished its work, because the shared executor is still busy. In the patch, the executor is made process-local, i.e., each process owns its own executor, so the process can be joined after it finishes its own work.
This is only a patch for macOS. On Linux, the executor is a ProcessPoolExecutor, which needs to be explicitly shut down with executor.shutdown() before the job-submitting process is joined. Otherwise, it leads to another hang.
To achieve this, qiskit needs to provide 'session' APIs so that each job-submitting process opens a session on start and closes it on join, where closing the session triggers executor.shutdown().
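One possible shape for such a session is a context manager that owns the executor and shuts it down on exit. This is only a sketch: `JobSession` is a hypothetical name, not an existing qiskit API. It uses a ThreadPoolExecutor for portability, and ProcessPoolExecutor exposes the same `shutdown()` method.

```python
from concurrent import futures


class JobSession:
    """Hypothetical session: owns one executor and shuts it down on close."""

    def __init__(self, max_workers=2):
        self._executor = futures.ThreadPoolExecutor(max_workers=max_workers)
        self._closed = False

    def submit(self, fn, *args):
        if self._closed:
            raise RuntimeError("session is closed")
        return self._executor.submit(fn, *args)

    def close(self):
        # shutdown(wait=True) blocks until submitted work has finished and
        # the workers are released, so the owning process can then be joined.
        if not self._closed:
            self._executor.shutdown(wait=True)
            self._closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions raised inside the block
```

A job-submitting process would wrap its work in `with JobSession() as session:` and call `session.submit(...)`, so the executor is always shut down before the process exits.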
Anyway, the original LocalJob code looks like a code smell.
About this issue
- State: closed
- Created 6 years ago
- Comments: 23 (23 by maintainers)
I like the idea of improving the flexibility of parallel jobs, although we should perhaps investigate further whether something already exists that allows circumventing the GIL.
For example, @liupibm, one short-term solution for getting around the picklability issue is to increase the number of objects that can be serialized, for instance with the Python dill package.
@liupibm When you submit the pull request, it should already get tested on all the platforms. It would be good if you could add a small parallel test that requires this change to work.