nodegit: fileHistoryWalk does not return all commits

When I compare the history returned by fileHistoryWalk to the console output of git log, many commits are missing from the fileHistoryWalk results.

For example, using the atom editor repo and source:

cd /tmp
git clone https://github.com/atom/atom.git
cd atom
git log src/text-editor.coffee

Then run a slightly modified examples/walk-history-for-file.js (see below) and compare the results. The example reports 129 commits to text-editor.coffee, but git log src/text-editor.coffee shows 481 commits to that file.

Modified example (changed the repo and the test file path, and added count output):

var nodegit = require("../"),
    path = require("path"),
    historyFile = "src/text-editor.coffee",
    walker,
    historyCommits = [],
    commit,
    repo;

// This code walks the history of the master branch and prints results
// that look very similar to calling `git log` from the command line

function compileHistory(resultingArrayOfCommits) {
  var lastSha;
  if (historyCommits.length > 0) {
    lastSha = historyCommits[historyCommits.length - 1].commit.sha();
    if (
      resultingArrayOfCommits.length == 1 &&
      resultingArrayOfCommits[0].commit.sha() == lastSha
    ) {
      return;
    }
  }

  resultingArrayOfCommits.forEach(function(entry) {
    historyCommits.push(entry);
  });

  lastSha = historyCommits[historyCommits.length - 1].commit.sha();

  walker = repo.createRevWalk();
  walker.push(lastSha);
  walker.sorting(nodegit.Revwalk.SORT.TIME);

  return walker.fileHistoryWalk(historyFile, 500)
    .then(compileHistory);
}

//nodegit.Repository.open(path.resolve(__dirname, "../.git"))
nodegit.Repository.open("/tmp/atom/.git")
  .then(function(r) {
    repo = r;
    return repo.getMasterCommit();
  })
  .then(function(firstCommitOnMaster){
    // Start a time-sorted walk from the tip of master.
    walker = repo.createRevWalk();
    walker.push(firstCommitOnMaster.sha());
    walker.sorting(nodegit.Revwalk.SORT.TIME);

    return walker.fileHistoryWalk(historyFile, 500);
  })
  .then(compileHistory)
  .then(function() {
    historyCommits.forEach(function(entry) {
      commit = entry.commit;
      console.log("commit " + commit.sha());
      console.log("Author:", commit.author().name() +
        " <" + commit.author().email() + ">");
      console.log("Date:", commit.date());
      console.log("\n    " + commit.message());
    });
    console.log("\n\n" + historyCommits.length + " total commits");
  })
  .done();

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 4
  • Comments: 15 (6 by maintainers)

Most upvoted comments

fileHistoryWalk(fileName, 500) seems to behave like a regular getCommits(500) whose result is then filtered down to the commits that touch fileName, instead of returning (up to) 500 entries that involve fileName.

revwalk.fileHistoryWalk(fileName, 1), instead of returning the nearest commit that affects fileName, will always result in [] unless the file was modified in the commit that was pushed in the revwalk.
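
The difference between the two behaviours can be sketched without nodegit at all. The snippet below is only a toy model: the `history` array, the `touches` flag, and both function names are made up for illustration and are not part of the nodegit API. It contrasts "visit at most N commits, then filter" (what the comment above describes) with "keep walking until N matching commits are found" (what the commenter expected).

```javascript
// Toy linear history, newest first. `touches` marks commits that modified the file.
// All names here are hypothetical; nothing below calls nodegit.
const history = [
  { sha: "c5", touches: false },
  { sha: "c4", touches: false },
  { sha: "c3", touches: true },
  { sha: "c2", touches: false },
  { sha: "c1", touches: true },
];

// Observed behaviour: visit at most `maxCount` commits, then keep the matches.
function walkThenFilter(commits, maxCount) {
  return commits.slice(0, maxCount).filter((c) => c.touches);
}

// Expected behaviour: keep walking until `maxCount` matches have been found.
function walkUntilMatches(commits, maxCount) {
  const matches = [];
  for (const c of commits) {
    if (matches.length >= maxCount) break;
    if (c.touches) matches.push(c);
  }
  return matches;
}

console.log(walkThenFilter(history, 1).map((c) => c.sha));   // []
console.log(walkUntilMatches(history, 1).map((c) => c.sha)); // [ 'c3' ]
```

With a count of 1, the first variant returns nothing unless the newest commit itself touched the file, which matches the `[]` result reported above.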

This makes fileHistoryWalk fairly useless IMO 😦

The inability to interrupt the walk in revWalk#walk and/or Commit#history()#start also makes it very hard to only look at the most recent commits of a specific file in very large repositories, but that is a different issue.

I’m puzzled how difficult it is to do git log -- path/to/file with this library.

As mentioned, fileHistoryWalk is completely useless, as it doesn’t seem to go through all commits in the history, no matter how large a number you give it.

  import type {
    Revwalk,
    Repository,
    Commit
  } from 'nodegit';

  import Git from 'nodegit'; // docs: https://github.com/nodegit/nodegit

  async function getCommit(_repo: Repository | string, count?: number): Promise<Commit[]> {
    const repo: Repository = _repo instanceof Git.Repository ? _repo : await Git.Repository.open(_repo);
    const lastCommit: Commit = await repo.getHeadCommit();

    const revWalk: Revwalk = repo.createRevWalk();
    revWalk.sorting(Git.Revwalk.SORT.TIME);
    revWalk.push(lastCommit.id());

    // Without a count, return every commit reachable from HEAD.
    if (typeof count === 'undefined') return revWalk.getCommitsUntil((_commit: Commit) => true);
    return revWalk.getCommits(count);
  }

nodegit@0.27.0

Also for the record, fileHistoryWalk still does not return all commits in a single call; the returned array should now carry a reachedEndOfHistory key. The intent is that you request 30k to 50k commits at a time until you hit the end of history (the initial commit). The libgit2 revwalk seems to be somewhat slower than git core: the revwalk alone on the linux repository takes considerably longer than standard git core. Large applications should be designed with that slowdown in mind.
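
The batching pattern described above could look roughly like the sketch below. It is deliberately written against an injected `nextBatch` callback rather than nodegit itself, so `collectAll`, `makeFakeWalker`, and `maxBatches` are hypothetical names. With nodegit, `nextBatch` might wrap `walker.fileHistoryWalk(path, batchSize)` on a single walker, but that assumes repeated calls on the same walker continue where the previous one stopped, which is not confirmed here.

```javascript
// Collect entries batch by batch until a batch reports `reachedEndOfHistory`.
// `nextBatch` is a hypothetical async callback returning an array that carries
// a `reachedEndOfHistory` flag, as described in the comment above.
async function collectAll(nextBatch, maxBatches = 1000) {
  const all = [];
  for (let i = 0; i < maxBatches; i++) {
    const batch = await nextBatch();
    for (const entry of batch) all.push(entry);
    // An empty batch is not a stop signal: a window may simply contain no
    // commits that touch the file. Only the flag ends the loop.
    if (batch.reachedEndOfHistory) return all;
  }
  throw new Error("gave up before reaching the end of history");
}

// Stand-in for a real walker: scans `batchSize` commits per call and keeps
// only those that touch the file. Purely illustrative.
function makeFakeWalker(commits, batchSize) {
  let pos = 0;
  return async function nextBatch() {
    const batch = commits.slice(pos, pos + batchSize).filter((c) => c.touches);
    pos += batchSize;
    batch.reachedEndOfHistory = pos >= commits.length;
    return batch;
  };
}

const commits = [
  { sha: "c5", touches: false },
  { sha: "c4", touches: false },
  { sha: "c3", touches: true },
  { sha: "c2", touches: false },
  { sha: "c1", touches: true },
];

collectAll(makeFakeWalker(commits, 2)).then((entries) => {
  console.log(entries.map((c) => c.sha)); // [ 'c3', 'c1' ]
});
```

The first batch in this toy run is empty, yet the loop continues. Stopping on a short or empty batch instead of on the flag is one way a file history can end up truncated.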