papermill: `RuntimeError: Kernel didn't respond in 60 seconds` when running papermill with Python multiprocessing

Hello, I am trying to run multiple parameterized notebooks in parallel. Currently I am using papermill inside Jupyter Notebook, and if I use a multiprocessing pool to map over a list of parameters and pass them to pm.execute_notebook, I get RuntimeError: Kernel didn't respond in 60 seconds. I am running everything with Python 2.7.

This is the code I use:

import papermill as pm
import multiprocessing as mp

# in_nb, out_nb, data1 and data2 are defined elsewhere in the notebook
# (input/output notebook paths and the lists of parameter values)
def run_nb(data):
    d1, d2 = data
    pm.execute_notebook(in_nb, out_nb, parameters=dict(d1=d1, d2=d2))

pool = mp.Pool(4)
pool.map(run_nb, zip(data1, data2))
pool.close()
pool.join()

It works correctly using the standard Python map.
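For reference, the serial form that finishes without the timeout is just the built-in map over the same function (continuing the snippet above, with the same placeholder variables):

# serial version: executes each notebook one at a time and completes normally;
# on Python 3, wrap in list(...) to force evaluation
map(run_nb, zip(data1, data2))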

Btw, is there a known way to produce multiple notebooks in parallel with papermill?

Thanks!

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Reactions: 1
  • Comments: 18 (9 by maintainers)

Most upvoted comments

Hi, I might just be overlooking something, but I think I’m still experiencing this issue even after upgrading nbconvert. It seems to be an upstream issue with nbconvert, because I get the same issues when calling the execute API directly. Let me know if I should migrate this question to that repository.

To replicate:

import multiprocessing as mp
import nbconvert
assert "5.6." in nbconvert.__version__
from nbconvert.preprocessors import ExecutePreprocessor
import nbformat
import os
import papermill as pm

def run_pm(fn):
    # execute the notebook via papermill
    pm.execute_notebook(fn, fn, request_save_on_cell_execute=False)

def run(fn):
    # execute the notebook via nbconvert's ExecutePreprocessor directly
    with open(fn) as f:
        nb = nbformat.read(f, as_version=4)
    ep = ExecutePreprocessor(timeout=None, kernel_name="python3")
    ep.startup_timeout = 300
    ep.preprocess(nb, {"metadata": {"path": os.getcwd() + "/"}})
    with open(fn, "w", encoding="utf-8") as f:
        nbformat.write(nb, f)
        
fn = "test.ipynb"

test.ipynb has a single cell that prints the word “testing”. The following works fine:

run_pm(fn)
run(fn)

But the following two code snippets each break

pool = mp.Pool(1)
pool.map(run_pm, [fn])
pool.close()
pool.join()

pool = mp.Pool(1)
pool.map(run, [fn])
pool.close()
pool.join()

with the error RuntimeError: Kernel didn't respond in 60 seconds in the first case and RuntimeError: Kernel didn't respond in 300 seconds in the second.

I’m using Python 3.7. I’ve been able to replicate this with both nbconvert 5.6.0 and 5.6.1.

Thanks!

Wondering: what is the status of this issue? I can confirm that this problem is still present in Python 3 when using papermill. Will multiprocessing become doable with papermill?

This base issue should now be resolved with the nbconvert 5.6.0 release!

FWIW, this problem also affects papermill on Jupyter in Python 3.

Hi @franzoni315

So it looks like there are race conditions in the IPython kernel when launching parallel processes. Using a thread pool instead (see the sketch at the end of this comment) lets it run more often without hanging, but any high parallelism doesn't beat the race conditions. I ran a few times under different conditions and eventually got

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/mseal/.py2local/lib/python2.7/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/home/mseal/.py2local/local/lib/python2.7/site-packages/traitlets/config/application.py", line 657, in launch_instance
    app.initialize(argv)
  File "<decorator-gen-121>", line 2, in initialize
  File "/home/mseal/.py2local/local/lib/python2.7/site-packages/traitlets/config/application.py", line 87, in catch_config_error
    return method(app, *args, **kwargs)
  File "/home/mseal/.py2local/local/lib/python2.7/site-packages/ipykernel/kernelapp.py", line 467, in initialize
    self.init_sockets()
  File "/home/mseal/.py2local/local/lib/python2.7/site-packages/ipykernel/kernelapp.py", line 239, in init_sockets
    self.shell_port = self._bind_socket(self.shell_socket, self.shell_port)
  File "/home/mseal/.py2local/local/lib/python2.7/site-packages/ipykernel/kernelapp.py", line 181, in _bind_socket
    s.bind("tcp://%s:%i" % (self.ip, port))
  File "zmq/backend/cython/socket.pyx", line 547, in zmq.backend.cython.socket.Socket.bind
  File "zmq/backend/cython/checkrc.pxd", line 25, in zmq.backend.cython.checkrc._check_rc
    raise ZMQError(errno)
ZMQError: Address already in use

And

Traceback (most recent call last):
  File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/home/mseal/.py2local/local/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 3231, in atexit_operations
    self.history_manager.end_session()
  File "/home/mseal/.py2local/local/lib/python2.7/site-packages/IPython/core/history.py", line 580, in end_session
    self.writeout_cache()
  File "<decorator-gen-23>", line 2, in writeout_cache
  File "/home/mseal/.py2local/local/lib/python2.7/site-packages/IPython/core/history.py", line 60, in needs_sqlite
    return f(self, *a, **kw)
  File "/home/mseal/.py2local/local/lib/python2.7/site-packages/IPython/core/history.py", line 786, in writeout_cache
    self._writeout_input_cache(conn)
  File "/home/mseal/.py2local/local/lib/python2.7/site-packages/IPython/core/history.py", line 770, in _writeout_input_cache
    (self.session_number,)+line)
DatabaseError: database disk image is malformed

I also noticed that the race conditions on exit occur on every run and cause session saves to fail (not a big deal, but it points to the overlapping reuse of session_number).

I can also reproduce this failure with a simple bash for loop over the papermill CLI. I'll open a ticket on the IPython project to figure out the root cause and see if there's a change in papermill that would fix it.
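For anyone who wants to try the thread-pool variant mentioned above, a rough sketch follows; the notebook path and parameter lists are placeholders, and as noted it only reduces the hangs rather than eliminating them:

from multiprocessing.pool import ThreadPool

import papermill as pm

in_nb = "input.ipynb"    # placeholder input notebook
data1 = [1, 2, 3, 4]     # placeholder parameter lists
data2 = [10, 20, 30, 40]

def run_nb(data):
    d1, d2 = data
    # one output notebook per run so parallel executions don't overwrite each other
    pm.execute_notebook(in_nb, "out_{}_{}.ipynb".format(d1, d2),
                        parameters=dict(d1=d1, d2=d2))

# threads instead of processes: hangs less often in practice, but high
# parallelism can still hit the kernel-startup race described above
pool = ThreadPool(4)
pool.map(run_nb, list(zip(data1, data2)))
pool.close()
pool.join()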

It will become doable: we need releases from multiple upstream libraries, and there's still one pending PR testing an edge case we haven't fixed for one of those releases. Give the community a couple more weeks here; there are a lot of moving parts, and this has been unsupported in the upstream projects for a long time. You can watch for the nbconvert 5.5.1 release announcement on Discourse and on the Jupyter mailing list. That will be the last release needed to get this resolved.