delta: πŸ› does not work with `nbdiff` (from https://github.com/jupyter/nbdime)

Hey again!

So I’m not really sure if it’s a bug in nbdiff (https://github.com/jupyter/nbdime) or in delta, but they don’t seem to work together. Specifically, the diffs generated by nbdiff are missing from the output.

Here’s my reproduction script:

build_oci_img () {
  docker build "$@" - <<'EOF'
  FROM archlinux
  RUN pacman -Sy > /dev/null
  RUN pacman -S --noconfirm --needed coreutils curl gcc bc git python-pip > /dev/null
  ENV CARGO_HOME=/usr/local
  RUN curl -fsSL 'https://sh.rustup.rs' | sh -s -- -y --profile minimal
  RUN /usr/local/bin/cargo install git-delta
  RUN pip install jupytext ipython_genutils nbdime
  WORKDIR /repo
EOF
}

build_oci_img
docker run --rm -i "$(build_oci_img -q)" bash <<'EOF'

main() {
  git config --global user.email test@mail.com && git config --global user.name test
  git -c init.defaultBranch=main init
  echo 'print(0)' > file.py
  jupytext --to ipynb file.py
  cat file.py
  cat file.ipynb
  git add file.ipynb
  git commit -m 'initial commit'
  echo 'print(1)' >| file.py
  jupytext --to ipynb file.py
  echo 'Running git diff:'
  git diff
  echo
  echo 'Running nbdiff:'
  nbdiff
  echo
  echo 'Piping nbdiff to delta:'
  nbdiff | delta
}

main
EOF

The output I’m getting:

Initialized empty Git repository in /repo/.git/
[jupytext] Reading file.py in format py
[jupytext] Writing file.ipynb
print(0)
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9428c875",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(0)"
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "cell_metadata_filter": "-all",
   "main_language": "python",
   "notebook_metadata_filter": "-all"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
[main (root-commit) 643ff7a] initial commit
 1 file changed, 23 insertions(+)
 create mode 100644 file.ipynb
[jupytext] Reading file.py in format py
[jupytext] Writing file.ipynb (destination file replaced [use --update to preserve cell outputs and ids])
Running git diff:
diff --git a/file.ipynb b/file.ipynb
index 3d9a299..5c5bbd4 100644
--- a/file.ipynb
+++ b/file.ipynb
@@ -3,11 +3,11 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "9428c875",
+   "id": "06311087",
    "metadata": {},
    "outputs": [],
    "source": [
-    "print(0)"
+    "print(1)"
    ]
   }
  ],

Running nbdiff:
nbdiff file.ipynb (HEAD) file.ipynb
--- file.ipynb (HEAD)  (no timestamp)
+++ file.ipynb  2022-12-09 17:25:45.285783
## modified /cells/0/id:
-  9428c875
+  06311087

## modified /cells/0/source:
-  print(0)
+  print(1)


Piping nbdiff to delta:
nbdiff file.ipynb (HEAD) file.ipynb

file.ipynb (HEAD)  (no timestamp) ⟢   file.ipynb  2022-12-09 17:25:45.285783
────────────────────────────────────────────────────────────────────────────────

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 19 (9 by maintainers)

Most upvoted comments

Well, putting lines after the +++ that don’t start with @, -, +, or space is clearly a violation of the unified diff format. What git has done is to make use of the traditional “diff …” line as the beginning of the file’s diff info and put extra lines in there, which I find superior. However, anyone is free to begin their own diff format or to try to get folks to agree to change the unified diff “standard” (which is not written down, but just codified in programs such as GNU diff & patch).
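To make that rule concrete, here is a minimal Python sketch that flags the lines a strict unified diff parser would not expect. The helper name `find_nonconforming_lines` is made up for illustration, the sample text is the nbdiff output from the report above, and tolerating empty lines is an assumption of this sketch. It is not how delta itself parses input.

```python
# Prefixes that may start a line after the "+++" header, per the comment above.
# (Real diffs can also contain "\ No newline at end of file"; not handled here.)
ALLOWED_PREFIXES = ("@", "-", "+", " ")


def find_nonconforming_lines(diff_text):
    """Return (line_number, line) pairs after '+++' that break the prefix rule."""
    offenders = []
    seen_new_file_header = False
    for number, line in enumerate(diff_text.splitlines(), start=1):
        if not seen_new_file_header:
            # Everything up to and including the "+++" header is skipped.
            seen_new_file_header = line.startswith("+++ ")
            continue
        # Assumption: empty lines are tolerated rather than reported.
        if line and not line.startswith(ALLOWED_PREFIXES):
            offenders.append((number, line))
    return offenders


# nbdiff output copied from the report above.
NBDIFF_OUTPUT = """\
nbdiff file.ipynb (HEAD) file.ipynb
--- file.ipynb (HEAD)  (no timestamp)
+++ file.ipynb  2022-12-09 17:25:45.285783
## modified /cells/0/id:
-  9428c875
+  06311087

## modified /cells/0/source:
-  print(0)
+  print(1)
"""

for number, line in find_nonconforming_lines(NBDIFF_OUTPUT):
    print(number, line)
```

On the sample, only the two `## modified ...` lines are reported.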

Looking at the example, it appears that the notebook is a container of multiple files and they decided to use ## to indicate sub elements within the notebook. If so, I’d personally change the output to use normal --- & +++ lines for each element in the notebook (not ## lines). Perhaps something like:

nbdiff c.ipynb b.ipynb
diff c.ipynb/cells/9/outputs/0/data/text/plain b.ipynb/cells/9/outputs/0/data/text/plain
modified
--- c.ipynb/cells/9/outputs/0/data/text/plain
+++ b.ipynb/cells/9/outputs/0/data/text/plain
@@ -1 +1 @@
- <matplotlib.figure.Figure at 0x10ea05940>
+ <matplotlib.figure.Figure at 0x10eb21860>

diff etc
--- c.ipynb/cells/etc
+++ b.ipynb/cells/etc
@@ etc @@
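As a rough illustration of that suggestion, the following Python sketch rewrites nbdiff's `## <action> <path>:` sections into the layout proposed above. It is only a sketch under assumptions: the function names are hypothetical, the file names are passed in by the caller instead of being parsed from the nbdiff header, and the hunk ranges simply start at line 1 of each element because nbdiff's output carries no line numbers. It has only been considered against the simple `-`/`+` sections shown in this issue, and whether delta renders the result nicely is untested.

```python
import re

# Matches nbdiff section headers such as "## modified /cells/0/source:".
SECTION_RE = re.compile(r"^## (?P<action>.+) (?P<path>/\S*):$")


def hunk_range(count):
    """Format one side of an '@@' header, assuming the element starts at line 1."""
    if count == 1:
        return "1"
    return f"{1 if count else 0},{count}"


def render_section(old_name, new_name, action, path, body):
    """Render one notebook element as a 'diff' / '---' / '+++' / '@@' block."""
    old_count = sum(1 for line in body if line.startswith("-"))
    new_count = sum(1 for line in body if line.startswith("+"))
    return [
        f"diff {old_name}{path} {new_name}{path}",
        action,
        f"--- {old_name}{path}",
        f"+++ {new_name}{path}",
        f"@@ -{hunk_range(old_count)} +{hunk_range(new_count)} @@",
        *body,
    ]


def nbdiff_to_unified(nbdiff_text, old_name, new_name):
    """Convert nbdiff text into per-element blocks in the proposed layout."""
    sections = []  # each entry is (action, path, body_lines)
    for line in nbdiff_text.splitlines():
        match = SECTION_RE.match(line)
        if match:
            sections.append((match["action"], match["path"], []))
        elif sections and line[:1] in ("-", "+"):
            # Lines before the first section (the file headers) are skipped.
            sections[-1][2].append(line)
    out = []
    for action, path, body in sections:
        out.extend(render_section(old_name, new_name, action, path, body))
        out.append("")  # blank separator between elements
    return "\n".join(out)


# nbdiff output copied from the report above.
SAMPLE = """\
nbdiff file.ipynb (HEAD) file.ipynb
--- file.ipynb (HEAD)  (no timestamp)
+++ file.ipynb  2022-12-09 17:25:45.285783
## modified /cells/0/id:
-  9428c875
+  06311087

## modified /cells/0/source:
-  print(0)
+  print(1)
"""

print(nbdiff_to_unified(SAMPLE, "file.ipynb", "file.ipynb"))
```

Keeping the body lines in their original order, instead of regrouping removals and additions, avoids reshuffling sections that nbdiff may interleave.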

However, since I’m not familiar with what they’re trying to convey, that may not be a good match for their use case. If it isn’t, then I’d think this should be considered a different diff format that delta could consider supporting, which would require special code for understanding ## lines and their comments. It should not be surprising that the unidiff parser doesn’t handle it.

Should I open a separate issue with a feature request for passing through unknown diff formats?

My vote is to solve it all in this ticket.
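For what the pass-through idea could mean in practice, here is a small Python sketch. It is purely illustrative and not delta's implementation: `has_unified_hunks` and `render` are hypothetical names, and the detection rule (a `---` line, a `+++` line, then an `@@` hunk header) is an assumption chosen for this example.

```python
import re

# A hunk header such as "@@ -3,11 +3,11 @@" or "@@ -1 +1 @@".
HUNK_HEADER_RE = re.compile(r"^@@ -\d+(,\d+)? \+\d+(,\d+)? @@")


def has_unified_hunks(text):
    """True if a '---' / '+++' pair is directly followed by an '@@' hunk header."""
    lines = text.splitlines()
    for i in range(len(lines) - 2):
        if (
            lines[i].startswith("--- ")
            and lines[i + 1].startswith("+++ ")
            and HUNK_HEADER_RE.match(lines[i + 2])
        ):
            return True
    return False


def render(text, highlight):
    """Apply `highlight` to recognizable diffs; return anything else unchanged."""
    return highlight(text) if has_unified_hunks(text) else text


# Shortened versions of the two outputs shown in the report above.
GIT_DIFF = """\
diff --git a/file.ipynb b/file.ipynb
index 3d9a299..5c5bbd4 100644
--- a/file.ipynb
+++ b/file.ipynb
@@ -3,11 +3,11 @@
-    "print(0)"
+    "print(1)"
"""

NBDIFF = """\
nbdiff file.ipynb (HEAD) file.ipynb
--- file.ipynb (HEAD)  (no timestamp)
+++ file.ipynb  2022-12-09 17:25:45.285783
## modified /cells/0/source:
-  print(0)
+  print(1)
"""

# str.upper stands in for a real highlighter here.
print(render(GIT_DIFF, str.upper))
print(render(NBDIFF, str.upper))
```

With this rule the git diff goes through the stand-in highlighter, while the nbdiff text comes back untouched, so none of its lines are lost.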