nodegit: "Invalid collation character" when trying to diff
I’m not 100% sure whether this is a nodegit or a libgit2 issue, so apologies in advance if this isn’t nodegit’s fault.
When trying to get diffs for certain commits, I’m getting the error “Invalid collation character” returned from nodegit. However, this isn’t for every commit, and it’s not for every platform either. I’ve only been able to get this to happen in this situation:
- Using nodegit (tested a whole mess of versions, mainly v0.13.2 and v0.15.1)
- Running under electron (only tested v1.2.3)
- On Linux (64-bit, haven’t tested 32-bit; specifically on the latest releases of Debian and Ubuntu)
If I’m on any other platform (e.g. just pure nodejs not electron, or Windows or OSX) the problem doesn’t happen.
The easiest way to reproduce the problem is to grab a clone of the sources for git itself: https://github.com/git/git
And then run some variation of this script under electron:
var git = require("nodegit");

git.Repository.open("/home/tyler/Desktop/git")
  .then((repo) => {
    console.log("Repo is open.");
    return repo.getCommit("f5236a776f31de2654bca9001aa74ef9fe0819d8")
      .then((commit) => commit.getTree())
      .then((tree) => git.Diff.treeToWorkdir(repo, tree, new git.DiffOptions()));
  })
  .then((diff) => {
    console.log("Got the diff");
    return diff.patches()
      .then((patches) => {
        console.log("Got the patches.");
        console.log("Files changed:");
        patches.forEach((patch) => {
          let newPath = patch.newFile().path();
          let oldPath = patch.oldFile().path();
          console.log(newPath, oldPath);
        });
      });
  })
  .then(() => {
    console.log("Done!");
  })
  .catch((error) => {
    console.error("Some error occurred:");
    console.error(error);
  });
Instead of logging the files changed in that commit, as I would expect, it outputs this:
Repo is open.
Got the diff
Some error occurred:
Error: Invalid collation character
at Error (native)
I’m not terribly familiar with debugging C programs on Linux, but I was able to piece together this stack trace:
GitPatch::ConvenientFromDiffWorker::Execute
git_patch_from_diff
diff_patch_alloc_from_diff
diff_patch_init_from_diff
git_diff_file_content__init_from_diff
diff_file_content_init_common
git_diff_driver_lookup
git_diff_driver_load
git_diff_driver_builtin
regcomp # The function that actually returns the error.
For some reason, git_diff_driver_builtin uses a regex to determine which diff driver to use, and compiling that regex (the regcomp call) is what fails.
Even weirder, it doesn’t fail for every commit, but a lot of them. Here are some notes I took while trying to figure this out:
master (as of this writing): f8f7adce9fc50a11a764d57815602dcb818d1816
Last good commit on master (found by using git reset --hard HEAD~1): b48dfd86c90cae3f98dca01101b7e298c0192d16
First bad commit on master: ad2d77760434e1650c186c71fa04a8fdbd77266c
Last bad commit on master: f5236a776f31de2654bca9001aa74ef9fe0819d8
First good commit before the above bad commit: 566fdaf611f44724120412a43132c07b020fc4f1
Useful command for going back that far: git reset --hard HEAD~35
Reasons why I’m filing a nodegit issue instead of a libgit2 issue:
- I’m only seeing this under electron, so maybe nodegit is doing something funny when compiled for electron?
- Perhaps we could patch nodegit so that we can specify the diff driver to use for diff.patches()? Otherwise I’m not sure how to work around this.
Let me know if there’s any other information I can provide, or generally if anyone has any ideas. I can follow the libgit2 code, but I haven’t been able to figure out why a regex would be (seemingly randomly) failing.
About this issue
- State: open
- Created 8 years ago
- Reactions: 6
- Comments: 20 (2 by maintainers)
Commits related to this issue
- [dev] Resolves nodegit/nodegit#1097 — committed to JeroenED/libpairtwo by JeroenED 5 years ago
Same issue here with GitKraken 3.3.0 (installed via the Debian package) on Ubuntu 17.10 (French flavor).
The workaround provided by @arthurp fixed the issue: replaced
Exec=/usr/share/gitkraken/gitkraken %U
with
Exec=env LC_ALL=C /usr/share/gitkraken/gitkraken %U
in the /usr/share/applications/gitkraken.desktop file.
As a workaround for English speakers, you can set “LC_ALL=C” when you start the application. You can set your launcher to launch
/bin/bash -c "LC_ALL=C /usr/share/gitkraken/gitkraken %U"
(using gitkraken as an example) instead of the default (this is easy to do using Alacarte, GNOME’s main menu editor). You could also create a launcher script that does something similar.
Still there in 6.0.0
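For anyone who prefers a script over editing the launcher, the launcher-script idea mentioned above could be sketched in Node roughly like this. It is only a sketch: withCLocale and launchWithCLocale are made-up helper names, and the GitKraken path in the usage comment is simply the one quoted in this thread, so adjust it for your install.

```javascript
// Hypothetical launcher sketch: start a program with LC_ALL=C in its environment.
const { spawn } = require("child_process");

// Return a copy of `baseEnv` with LC_ALL forced to "C"; the input is not modified.
function withCLocale(baseEnv) {
  return Object.assign({}, baseEnv, { LC_ALL: "C" });
}

// Spawn `command` with `args` under the C locale, sharing this process's stdio.
// `command` is whatever binary you normally launch (an assumption of this sketch).
function launchWithCLocale(command, args) {
  return spawn(command, args, {
    env: withCLocale(process.env),
    stdio: "inherit",
  });
}

// Example call (path taken from the comment above):
// launchWithCLocale("/usr/share/gitkraken/gitkraken", process.argv.slice(2));
```

The variable is set on the child's environment rather than on the running process, since as far as I know the locale is normally picked up once at program startup.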
Ubuntu 18.04, GitKraken 4.0.2 (installed GitKraken through Ubuntu’s software manager).
Using https://github.com/nodegit/nodegit/issues/1097#issuecomment-348749117, I added LC_ALL=C to the Exec=env line in the file /var/lib/snapd/desktop/applications/gitkraken_gitkraken.desktop
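If you need to apply the same .desktop edit on several machines, the change described in these comments can be expressed as a small text transform. This is a rough sketch, not a tested tool: addCLocaleToExec is a hypothetical helper that only works on the file's text, so reading and writing the actual file (and picking which .desktop file to touch) is left to you.

```javascript
// Prefix every Exec= line of a .desktop file's text with "env LC_ALL=C".
// Lines that already set LC_ALL are left untouched, so running it twice is harmless.
function addCLocaleToExec(desktopText) {
  return desktopText
    .split("\n")
    .map((line) => {
      if (!line.startsWith("Exec=")) return line;
      const command = line.slice("Exec=".length);
      if (/\bLC_ALL=/.test(command)) return line;
      // If the line already goes through env, add the variable to that call.
      if (command.startsWith("env ")) {
        return "Exec=env LC_ALL=C " + command.slice("env ".length);
      }
      return "Exec=env LC_ALL=C " + command;
    })
    .join("\n");
}

// Example with the Exec line quoted earlier in this thread.
console.log(addCLocaleToExec("Exec=/usr/share/gitkraken/gitkraken %U"));
```

All other keys in the file are passed through unchanged.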
It looks like the regex library in libgit2 has been updated to accommodate collation issues. https://github.com/libgit2/libgit2/pull/4935
I’m hoping to get https://github.com/nodegit/nodegit/pull/1690 into the next 0.25.0 alpha version, and eventually into 0.25.0.
same issue, will someone fix it?
I’m facing the same issue as well in GitKraken. Running it with LC_ALL=C works, but not otherwise.
My locale is mostly US English. When I installed Xubuntu I set my timezone to Kiev, but picked English everywhere that was asked and have English (United States) set as the regional format. The only non-English thing is Russian as an additional (secondary) keyboard layout. The problem occurs only in one repository, for most of the files in the repository (except brand-new ones). The output of locale is:
env | grep LC shows:
The workaround in #1097 (comment) worked for me on Ubuntu 18.04. I can also verify that if you do not have root access to modify the launcher in /usr/share/applications, you can just copy the file to your home folder and edit it there. GNOME will prefer the launcher in .local/share/applications over the one in /usr/share/applications.
I have found what triggers this error on my machine. It is the following line in my ~/.gitattributes file.
Removing this line resolves the error that GitKraken gave.
My ~/.gitconfig contains:
I still don’t understand why this triggers the error but at least I now have a workaround.
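The triggering line itself is not shown above, so the following is a guess at a diagnostic and not a confirmed explanation. The stack trace in the original report goes through git_diff_driver_lookup, and in git a diff driver is assigned to a path with a diff=<name> attribute, so one thing worth checking is which lines of your attributes files assign one. A minimal sketch (findDiffDriverLines is a hypothetical helper, and the parser is deliberately simplified):

```javascript
// List the lines of a gitattributes file that assign a diff driver (diff=<name>).
// Simplified parser: it does not handle quoted patterns or other rarer syntax.
function findDiffDriverLines(attributesText) {
  const results = [];
  attributesText.split("\n").forEach((rawLine, index) => {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) return; // blank line or comment
    const [pattern, ...attrs] = line.split(/\s+/);
    for (const attr of attrs) {
      if (attr.startsWith("diff=")) {
        results.push({ line: index + 1, pattern, driver: attr.slice("diff=".length) });
      }
    }
  });
  return results;
}

// Made-up sample content, only to show the shape of the result.
console.log(findDiffDriverLines("*.html diff=html\n*.png binary\n"));
```

Temporarily removing the reported lines one at a time would tell you whether any of them is involved on your machine.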
I can third this issue in GitKraken. It seems that the root cause is a commit author whose name contains UTF-8 characters. Only one repo I have exhibits the issue, and sure enough it is the only one with an author who has such a name.
I have the same output of locale with no LC envs defined.
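Since several comments compare locale and env | grep LC output, here is a small sketch of how the collation locale is normally resolved from the environment under POSIX rules: LC_ALL wins, then LC_COLLATE, then LANG. effectiveCollateLocale is a hypothetical helper, the "C" fallback is a choice made for this sketch, and whether Electron or libgit2 actually ends up using this locale on a given machine is an assumption.

```javascript
// Resolve which locale name applies to collation (LC_COLLATE) from an
// environment object, using the POSIX precedence: LC_ALL, then LC_COLLATE,
// then LANG. Empty values count as unset; "C" is this sketch's fallback.
function effectiveCollateLocale(env) {
  for (const name of ["LC_ALL", "LC_COLLATE", "LANG"]) {
    if (env[name]) return { source: name, value: env[name] };
  }
  return { source: "default", value: "C" };
}

// Print the result for the current process environment.
console.log(effectiveCollateLocale(process.env));
```

Running this from the same launcher that starts the failing application shows which variable, if any, is selecting a non-C collation locale.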
Here’s the best I could trace the regcomp function usage: https://github.com/libgit2/libgit2/blob/20302aa43738a972e0bd2e2ee6ae479208427b31/src/diff_driver.c#L213
It would seem that regcomp cannot handle UTF-8?