nodegit: "Invalid collation character" when trying to diff

I’m not 100% sure whether this is a nodegit or a libgit2 issue, so apologies in advance if this isn’t nodegit’s fault.

When trying to get diffs for certain commits, I’m getting the error “Invalid collation character” returned from nodegit. However, this isn’t for every commit, and it’s not for every platform either. I’ve only been able to get this to happen in this situation:

  • Using nodegit (tested a whole mess of versions, mainly v0.13.2 and v0.15.1)
  • Running under electron (only tested v1.2.3)
  • On Linux (64-bit, haven’t tested 32-bit; specifically on the latest releases of Debian and Ubuntu)

If I’m on any other platform (e.g. plain Node.js instead of electron, or Windows or OS X) the problem doesn’t happen.

The easiest way to reproduce the problem is to grab a clone of the sources for git itself: https://github.com/git/git

And then run some variation of this script under electron:

var git = require("nodegit");

git.Repository.open("/home/tyler/Desktop/git")
    .then(
        (repo) => {
            console.log("Repo is open.");

            return repo.getCommit("f5236a776f31de2654bca9001aa74ef9fe0819d8")
                .then(
                    (commit) => {
                        return commit.getTree();
                    }
                )
                .then(
                    (tree) => {
                        return git.Diff.treeToWorkdir(repo, tree, new git.DiffOptions());
                    }
                )
            ;
        }
    )
    .then(
        (diff) => {
            console.log("Got the diff");

            return diff.patches()
                .then(
                    (patches) => {
                        console.log("Got the patches.");
                        console.log("Files changed:");

                        patches.forEach(
                            (patch) => {
                                let newPath = patch.newFile().path();
                                let oldPath = patch.oldFile().path();
                                console.log(newPath, oldPath);
                            }
                        );
                    }
                )
            ;
        }
    )
    .then(
        () => {
            console.log("Done!");
        }
    )
    .catch(
        (error) => {
            console.error("Some error occurred:");
            console.error(error);
        }
    )
;

Instead of logging the files changed in that commit, as I would expect, it outputs this:

Repo is open.
Got the diff
Some error occurred:
Error: Invalid collation character
    at Error (native)

I’m not terribly familiar with debugging C programs on Linux, but I was able to piece together this stack trace:

GitPatch::ConvenientFromDiffWorker::Execute
git_patch_from_diff
diff_patch_alloc_from_diff
diff_patch_init_from_diff
git_diff_file_content__init_from_diff
diff_file_content_init_common
git_diff_driver_lookup
git_diff_driver_load
git_diff_driver_builtin
regcomp # The function that actually returns the error.

For some reason git_diff_driver_builtin uses a regex to determine the diff driver to use, and regcomp fails to compile that regex.

Even weirder, it doesn’t fail for every commit, but it does fail for a lot of them. Here are some notes I took while trying to figure this out:

master (as of this writing): f8f7adce9fc50a11a764d57815602dcb818d1816
Last good commit on master (found by using git reset --hard HEAD~1): b48dfd86c90cae3f98dca01101b7e298c0192d16
First bad commit on master: ad2d77760434e1650c186c71fa04a8fdbd77266c
Last bad commit on master: f5236a776f31de2654bca9001aa74ef9fe0819d8
First good commit before the above bad commit: 566fdaf611f44724120412a43132c07b020fc4f1
Useful command for going back that far: git reset --hard HEAD~35

Reasons why I’m filing a nodegit issue instead of a libgit2 issue:

  • I’m only seeing this under electron, so maybe nodegit is doing something funny when compiled for electron?
  • Perhaps we could patch nodegit so that we can specify the diff driver to use for diff.patches()? Otherwise I’m not sure how to work around this.
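One possible way to sidestep the failing call when only the changed paths are needed: the stack trace above starts at git_patch_from_diff, so reading the deltas directly should not reach the diff driver lookup. The sketch below assumes the diff object exposes numDeltas() and getDelta(index) the way nodegit's Diff binding does in the versions I have seen; treat both names as assumptions and check them against your nodegit version. listChangedPaths is a hypothetical helper, not part of nodegit.

```javascript
// Sketch: collect old/new paths from a diff's deltas without building patches.
// `diff` is assumed to look like a nodegit Diff (numDeltas/getDelta); that is
// an assumption about the binding, not something confirmed in this thread.
function listChangedPaths(diff) {
    const paths = [];
    for (let i = 0; i < diff.numDeltas(); i++) {
        const delta = diff.getDelta(i);
        paths.push({
            oldPath: delta.oldFile().path(),
            newPath: delta.newFile().path(),
        });
    }
    return paths;
}
```

In the script above this would replace the diff.patches() step, e.g. `listChangedPaths(diff).forEach((p) => console.log(p.newPath, p.oldPath));`. It does not give you hunks or line contents, only the file list.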

Let me know if there’s any other information I can provide, or if anyone has any ideas. I can follow the libgit2 code, but I haven’t been able to figure out why a regex would be (seemingly randomly) failing.

About this issue

  • State: open
  • Created 8 years ago
  • Reactions: 6
  • Comments: 20 (2 by maintainers)

Most upvoted comments

Same issue here with GitKraken 3.3.0 (installed via the Debian package) on Ubuntu 17.10 (french flavor).

The workaround provided by @arthurp fixed the issue: replaced Exec=/usr/share/gitkraken/gitkraken %U by Exec=env LC_ALL=C /usr/share/gitkraken/gitkraken %U in the /usr/share/applications/gitkraken.desktop file.

As a workaround for English speakers, you can set “LC_ALL=C” when you start the application. You can set your launcher to run /bin/bash -c "LC_ALL=C /usr/share/gitkraken/gitkraken %U" (using gitkraken as an example) instead of the default (this is easy to do using Alacarte, GNOME’s main menu editor). You could also create a launcher script that does something similar.
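For the "launcher script" option, here is a minimal sketch in Node rather than bash. It starts a child process with LC_ALL=C added to a copy of the current environment. The helper names withCLocale and launchWithCLocale are made up for illustration, and the gitkraken path in the usage note is just the example path from this thread.

```javascript
const { spawn } = require("child_process");

// Return a copy of `env` with LC_ALL forced to "C"; the input is not mutated.
function withCLocale(env) {
    return Object.assign({}, env, { LC_ALL: "C" });
}

// Start `command` with the C locale and share this process's stdio.
// Nothing is launched when this file is merely loaded; the caller decides
// which command to run.
function launchWithCLocale(command, args) {
    return spawn(command, args || [], {
        env: withCLocale(process.env),
        stdio: "inherit",
    });
}
```

Usage would be something like `launchWithCLocale("/usr/share/gitkraken/gitkraken", process.argv.slice(2));` at the bottom of the script. Only the child sees LC_ALL=C, so the rest of the session keeps its normal locale.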

Still there in 6.0.0

Ubuntu 18.04, GitKraken 4.0.2 (installed GitKraken through Ubuntu’s software manager)

Using https://github.com/nodegit/nodegit/issues/1097#issuecomment-348749117, I added LC_ALL=C to the Exec=env line in the file /var/lib/snapd/desktop/applications/gitkraken_gitkraken.desktop:

Exec=env LC_ALL=C BAMF_DESKTOP_FILE_HINT=/var/lib/snapd/desktop/applications/gitkraken_gitkraken.desktop /snap/bin/gitkraken %U
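If you would rather script the .desktop edit than do it by hand, the two Exec= rewrites described in this thread can be sketched as a small text transform. addCLocaleToExec and patchDesktopFile are hypothetical helper names; the sketch only touches lines that start with Exec= and leaves a line alone if it already sets LC_ALL. It assumes the simple Exec formats quoted above and has not been checked against every form a desktop file may use.

```javascript
// Sketch: add LC_ALL=C to a single "Exec=" line of a .desktop file.
function addCLocaleToExec(line) {
    const match = /^Exec=(.*)$/.exec(line);
    if (!match) return line;                          // not an Exec line
    const command = match[1];
    if (/(^|\s)LC_ALL=/.test(command)) return line;   // already sets LC_ALL
    if (/^env\s/.test(command)) {
        // Line already goes through env: put LC_ALL=C first among its variables.
        return "Exec=env LC_ALL=C " + command.replace(/^env\s+/, "");
    }
    return "Exec=env LC_ALL=C " + command;
}

// Apply the rewrite to every line of a desktop file's text.
function patchDesktopFile(text) {
    return text.split("\n").map(addCLocaleToExec).join("\n");
}
```

Reading the file with fs.readFileSync, passing the text through patchDesktopFile, and writing it to a copy under ~/.local/share/applications keeps the packaged file untouched.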

It looks like the regex library in libgit2 has been updated to accommodate collation issues: https://github.com/libgit2/libgit2/pull/4935

I’m hoping to get https://github.com/nodegit/nodegit/pull/1690 into the next 0.25.0 alpha version, and eventually into 0.25.0.

Same issue, will someone fix it?

I’m facing the same issue as well in GitKraken. Running it with LC_ALL=C works, but not otherwise.

My locale is mostly US English. When I installed Xubuntu I set my timezone to Kiev, but picked English everywhere I was asked and have English (United States) set as the regional format. The only non-English thing is Russian as an additional (secondary) keyboard layout. The problem occurs in only one repository, for most of the files in that repository (except brand-new ones). The output of locale is:

LANG=en_US.UTF-8
LANGUAGE=en_US
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=en_US.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=en_US.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_ALL=

env | grep LC shows:

LC_PAPER=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_MONETARY=en_US.UTF-8
LC_NUMERIC=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_NAME=en_US.UTF-8 
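To make the listing above easier to read: under the usual POSIX rules a non-empty LC_ALL wins, then the category's own variable (LC_COLLATE here), then LANG. That is why LC_COLLATE shows up as en_US.UTF-8 in the locale output even though env | grep LC does not list it. The helper below is a hypothetical sketch of that lookup order. It only reports what the environment asks for and cannot tell whether a given program actually applies the environment's locale.

```javascript
// Sketch: which environment variable decides a locale category?
// Order assumed here: LC_ALL, then the category variable, then LANG.
// Empty strings count as unset.
function effectiveLocale(env, category) {
    for (const name of ["LC_ALL", category, "LANG"]) {
        if (env[name]) return { value: env[name], source: name };
    }
    // Nothing set: fall back to the C locale (glibc's behavior; other
    // systems may pick a different implementation-defined default).
    return { value: "C", source: "default" };
}
```

For example, `effectiveLocale(process.env, "LC_COLLATE")` in a session like the one above should report LANG as the source, and report LC_ALL once the LC_ALL=C workaround is in place.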

The workaround in #1097 (comment) worked for me on Ubuntu 18.04. I can also verify that if you do not have root access to modify the launcher in /usr/share/applications, you can just copy the file to your home folder and edit it there. GNOME will prefer the launcher in .local/share/applications over the one in /usr/share/applications.

cp /usr/share/applications/gitkraken.desktop ~/.local/share/applications
nano ~/.local/share/applications/gitkraken.desktop

I have found what triggers this error on my machine. It is the following line in my ~/.gitattributes file.

*.rb diff=ruby

Removing this line resolves the error that GitKraken gave.

Updating diffs failed Could not update diffs for the following reason: Error: Invalid collation character

My ~/.gitconfig contains:

[core]
	attributesfile = ~/.gitattributes

I still don’t understand why this triggers the error but at least I now have a workaround.
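For anyone who wants to check whether their own attributes files name a diff driver, here is a rough sketch that scans gitattributes text for diff=<name> entries. A named driver would at least fit the stack trace in the original report, which goes through git_diff_driver_lookup and git_diff_driver_builtin before reaching regcomp, but I have not confirmed that link in the libgit2 source. findDiffDrivers is a hypothetical helper, and it splits on whitespace only, so quoted patterns containing spaces are not handled.

```javascript
// Sketch: list "diff=<driver>" attributes found in gitattributes text.
// Blank lines and lines starting with "#" are skipped; "-diff" and a bare
// "diff" are ignored because they do not name a driver.
function findDiffDrivers(attributesText) {
    const found = [];
    attributesText.split(/\r?\n/).forEach((rawLine, index) => {
        const line = rawLine.trim();
        if (line === "" || line.startsWith("#")) return;
        const [pattern, ...attrs] = line.split(/\s+/);
        for (const attr of attrs) {
            const match = /^diff=(.+)$/.exec(attr);
            if (match) {
                found.push({ line: index + 1, pattern, driver: match[1] });
            }
        }
    });
    return found;
}
```

Run it over ~/.gitattributes and any .gitattributes in the affected repository. A result such as `{ line: 1, pattern: "*.rb", driver: "ruby" }` points at the kind of entry described above.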

I can third this issue in GitKraken. It seems that the root cause is a commit author whose name has UTF-8 characters in it. Only one repo I have exhibits the issue, and sure enough it is the only one with an author whose name contains UTF-8 characters.

I have the same output of locale with no LC envs defined.

Here’s the best I could trace the regcomp function usage: https://github.com/libgit2/libgit2/blob/20302aa43738a972e0bd2e2ee6ae479208427b31/src/diff_driver.c#L213

It would seem that regcomp cannot handle UTF-8?