jupyter-book: Exit with non-zero status if notebooks encounter an unexpected exception.

Summary of the problem

Our project is switching to jupyter-book for documentation. Our documentation is built as part of our CI process using GitHub Actions. One challenge we're encountering is ensuring that changes to our project code don't break the examples in our documentation.

Since the notebooks used for documentation are now executed as part of our CI process, it seems natural that any unexpected error in a notebook should trigger a failure on our CI system. However, this is not the case.

When the notebooks are executed, jupyter-book reports a successful documentation build even if unexpected exceptions were raised. (In sphinx.py, the build process returns app.statuscode, which does not reflect whether any notebook raised an exception.)

The solution we’d like

During the build process, myst-nb keeps track of which notebooks are successfully built, and which ones failed, as described here: https://myst-nb.readthedocs.io/en/latest/use/execute.html#execution-statistics

We propose allowing a new flag (perhaps -f or --fail-on-exception) that would cause the build process to return a non-zero value (perhaps the number of failed notebooks) in the event of an unexpected exception.
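A minimal sketch of the proposed behavior (the function and record names here are illustrative, not the actual jupyter-book/myst-nb API): after the Sphinx build, inspect per-notebook execution records, mirroring myst-nb's execution statistics, and derive the exit code.

```python
def exit_code_after_build(app_statuscode, execution_data, fail_on_exception=False):
    """Return the process exit code for `jupyter-book build`.

    execution_data maps docname -> record with a "succeeded" flag,
    modeled on myst-nb's per-notebook execution statistics.
    """
    failed = [doc for doc, rec in execution_data.items() if not rec["succeeded"]]
    if fail_on_exception and failed:
        # e.g. return the number of failed notebooks, capped so it
        # still fits in a valid process exit status
        return min(len(failed), 255)
    # current behavior: pass through Sphinx's own status code
    return app_statuscode
```

Without the flag, the Sphinx status code is returned unchanged, so existing builds are unaffected.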

Notebooks meant to demonstrate exceptions could still allow them to be raised without issue by tagging their cells with raises-exception. The machinery for this is already in place and works.
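For reference, this is what a tagged cell looks like in the raw notebook JSON (the tag can also be applied through the Jupyter interface); the cell contents here are just an example:

```json
{
  "cell_type": "code",
  "execution_count": null,
  "metadata": {
    "tags": ["raises-exception"]
  },
  "outputs": [],
  "source": ["assert 1 == 2"]
}
```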

This system would allow our documentation to “test itself” during the build process. At least on github actions, the log files which further describe the exceptions encountered can be saved as build artifacts to provide more information about what’s going wrong in the notebooks.
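A sketch of what the artifact-upload step might look like in a GitHub Actions workflow (the docs/ path and artifact name are illustrative assumptions):

```yaml
- name: Build the book
  run: jupyter-book build docs/

- name: Upload notebook tracebacks
  # only runs when the build step above fails
  if: failure()
  uses: actions/upload-artifact@v3
  with:
    name: notebook-tracebacks
    path: docs/_build/html/reports/*.log
```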

Alternatives we’ve considered

We had a test system in place that executes the notebooks and looks for errors. This works, but it is inefficient: the notebooks are evaluated twice during the build process, roughly doubling the job time on the CI system.

We’d considered other third-party notebook testing packages like testbook, but we didn’t have a great experience with them.

Having the jupyter-book build process serve as its own testing process under CI seems like a reasonable solution.

Additional context

I have a branch that demonstrates the functionality here: https://github.com/robfalck/jupyter-book/tree/fail_on_exception

As a demonstration, I created a new JupyterBook and made four copies of notebooks.ipynb. All of these notebooks included a new cell with

assert 1 == 2

Two of these notebooks tag that cell with raises-exception, and two do not.

Executing with the command

jupyter-book build -f .

results in the following warnings during the build:

WARNING: Execution Failed with traceback saved in /Users/rfalck/Projects/test_error_book/_build/html/reports/notebooks2.log
WARNING: Execution Failed with traceback saved in /Users/rfalck/Projects/test_error_book/_build/html/reports/notebooks2.log

and the build ends with


The following notebooks encountered unexpected exceptions.
Add the tag 'raises-exception' to the offending cells to ignore these exceptions.

notebooks2
subfolder/notebooks2

===============================================================================

Building your book, returns a non-zero exit code (2). Look above for the cause.

===============================================================================

This demonstrates that the change still allows exceptions when the raises-exception tag is present. The docs also build completely when the -f switch is omitted, so current behavior is unchanged.

I’m not sure what the correct process is here for submitting pull requests. I’d be happy to submit one with my branch, or make any changes that are wanted before doing so.

About this issue

  • State: open
  • Created 3 years ago
  • Reactions: 2
  • Comments: 15 (6 by maintainers)

Most upvoted comments

@chrisjsewell : Hey, that worked! There may be some hope yet for this path. I’d really like to get down to zero warnings, because the warnings are useful for finding some real problems like invalid links.

@chrisjsewell : This looks like it will be a complete success. I am down to just 55 errors that I know how to fix, so hopefully by the end of today I should have this working. We’ll have to use the bleeding-edge jupyter-notebook repo until the latest makes it to PyPI, but I think this means we won’t need the feature requested here, as we will ideally be building with zero warnings.

Thank you so much for the help you’ve given me.

FYI the way the toc is parsed has completely changed in #1293, and you won’t get this warning any more 😬

Thanks for clarifying @robfalck, I see what you’re saying – you’d like a mechanism purely to test code (rather than one mixed with non-code failures). I am planning (in the medium term) to work on an extension, sphinx-testing, that will enable the inclusion of unit tests in documents (a feature requested by one of our projects), but I will also attach to that request a mechanism to run code cells and see if we can isolate warnings specifically for execution failures.

Good news @Kenneth-T-Moore

Indeed, I have found resolving the issued warnings to be a great strategy for our projects, as they then become useful for diagnosing issues with content layout and execution. In my experience, once they are resolved they aren’t quickly reintroduced, if that helps 😃.
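The zero-warnings strategy can be enforced in CI; a sketch of a build step, assuming jupyter-book forwards the -W (treat warnings as errors) and --keep-going options to Sphinx, with an illustrative docs/ path:

```yaml
- name: Build docs, failing on any warning
  # -W turns warnings into errors; --keep-going reports all of
  # them before exiting non-zero instead of stopping at the first
  run: jupyter-book build -W --keep-going docs/
```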

Thank you for your feedback @mmcky

That approach works, but it also demands that you have zero other warnings anywhere else in your docs. For instance, our docs initially had a horizontal line at the start of a section, which would trigger:

WARNING: Document or section may not begin with a transition.

I can appreciate the desire to build documentation with zero warnings, but it seems like it would be better to allow users to ignore some warnings at their discretion, while still catching things which are clearly broken in the code.
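One way to get that discretion, under the assumption that the warning in question has a suppressible type, is Sphinx's suppress_warnings option, which jupyter-book exposes through _config.yml. A sketch (the "myst.header" category is just an illustrative example of a suppressible warning type):

```yaml
sphinx:
  config:
    # silence warning categories you have decided to tolerate;
    # -W will still fail the build on everything else
    suppress_warnings: ["myst.header"]
```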