go-git: `object not found` when trying to pull a repository cloned with `Depth: 1`

Hi,

When you clone a repository with `Depth: 1`, you cannot pull it afterwards. The pull fails with `object not found`.

Cloning:

_, err := git.PlainClone(repoPath, false, &git.CloneOptions{
	URL:   gitURL,
	Depth: 1,
})

Pulling:

// Open repo
r, err := git.PlainOpen(repo.Path)
if err != nil {
	panic(fmt.Errorf("error while opening git repo (%s) %s", repo.Name, err))
}
// Get worktree
tree, err := r.Worktree()
if err != nil {
	panic(fmt.Errorf("error while getting worktree of git repo (%s) %s", repo.Name, err))
}

// Pull
err = tree.Pull(&git.PullOptions{})
// If the repo is already up to date, there is nothing to do
if err != nil && err != git.NoErrAlreadyUpToDate {
	panic(fmt.Errorf("error while pulling git repo (%s) %s", repo.Name, err))
}

It’s critical that I can pull with Depth: 1, because the repository I’m pulling might be very large (hundreds of thousands of commits). Can somebody take a look at this?

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 12
  • Comments: 31 (5 by maintainers)

Most upvoted comments

A colleague and I have solved this issue. The iterator used in the fastforward check wasn’t prepared for the case when a commit might have a missing parent (as in shallow clones). Handling that case solved the problem of pulling on shallow clones.
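To illustrate the kind of fix described in this comment, here is a minimal, self-contained sketch of an ancestry walk that tolerates missing parents. It is not go-git's actual code: the `commit` and `store` types and the `isAncestor` function are hypothetical stand-ins for a commit graph in which the shallow boundary commit references a parent that was never fetched.

```go
package main

import "fmt"

// commit is a toy stand-in for a commit object (hypothetical, not go-git's type).
type commit struct {
	hash    string
	parents []string
}

// store maps a hash to its commit. In a shallow clone, the parents of the
// boundary commit are referenced but absent from the store.
type store map[string]commit

// isAncestor reports whether candidate is reachable from tip by following
// parent links. A parent that is missing from the store ends that branch of
// the walk instead of being treated as an error.
func isAncestor(s store, candidate, tip string) bool {
	seen := map[string]bool{}
	queue := []string{tip}
	for len(queue) > 0 {
		h := queue[0]
		queue = queue[1:]
		if seen[h] {
			continue
		}
		seen[h] = true
		if h == candidate {
			return true
		}
		c, ok := s[h]
		if !ok {
			// Missing object: we hit the shallow boundary, so skip it.
			continue
		}
		queue = append(queue, c.parents...)
	}
	return false
}

func main() {
	// c1 is deliberately absent, as it would be after a clone with depth 1.
	s := store{
		"c2": {hash: "c2", parents: []string{"c1"}},
		"c3": {hash: "c3", parents: []string{"c2"}},
	}
	fmt.Println(isAncestor(s, "c2", "c3")) // true: c3 can fast-forward from c2
	fmt.Println(isAncestor(s, "zz", "c3")) // false, and no error for missing c1
}
```

The design choice that matters is that a parent hash which cannot be looked up simply ends that branch of the walk, rather than surfacing as an object not found error.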

TL;DR - Use file://path/to/repo and you should be good

I ran into similar problems recently. Initially I worked around the shallow fetch issue by doing a full fetch. The fetch was actually fast, but then the push itself was taking 10 seconds. Weird… So I ended up spending more time trying to understand what was going wrong.

Unfortunately I’m not familiar enough with the code base to fix the actual issue, but hopefully my brain dump helps people locate it, or at least provides useful clues on how to work around it.

Without further ado, here are my findings:

For local URLs, the code in revlist.go // ObjectsWithStorageForIgnores constructs a list of all objects to be pushed. Internally, it first constructs a list of objects to be ignored. This ignore list also includes the commits whose parent is not present because of the shallow fetch:

	ignore, err := objects(ignoreStore, ignore, nil, true)
	if err != nil {
		return nil, err
	}

	return objects(s, objs, ignore, false)

The first issue I ran into when doing a shallow fetch and then a push is, I think, the same one you are running into. The first objects() call is unable to locate the objects to be ignored, because the path it constructed to the objects did not include the .git folder. That’s when I realized I had provided the local repository path without “.git”. Formally I guess this is a mistake on my side, since the documentation I could find on cloning a local repository says you should point to the “.git” folder. However, the git CLI doesn’t care; it works in either case. go-git also seems to largely work and only fails in this unexpected way. So I think an improvement here would be either to enforce the path in the expected format, or to fix the logic so that it works either way.

However, when I included “.git” in the path URL, I ran into the next issue: the first call to objects() in the code above, the one that obtains all objects to ignore, took 5 seconds to complete. It returned a list of 32K objects, which I presume is the full repository. That’s twice as fast as the non-shallow fetch/push, but still horribly slow for pushing a single object.

So then I realized I could bypass the “local URL” logic by providing the repository in the file://path/to/repo format (including .git is no longer required). The push now takes less than 0.1 seconds.
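As a rough sketch of the workaround, a local path can be turned into a `file://` URL with the standard library before it is handed to go-git as the remote URL. The helper name `toFileURL` and the example path are hypothetical. Note also that for an absolute path the standard URL form has three slashes (`file:///path/to/repo`), which I assume is what is meant here. Whether go-git then takes the same code path as in this comment has not been verified.

```go
package main

import (
	"fmt"
	"net/url"
	"path/filepath"
)

// toFileURL converts a local repository path into a file:// URL.
// It is a hypothetical helper, not part of go-git.
func toFileURL(path string) (string, error) {
	abs, err := filepath.Abs(path)
	if err != nil {
		return "", err
	}
	// ToSlash keeps the URL path slash-separated on every platform.
	u := url.URL{Scheme: "file", Path: filepath.ToSlash(abs)}
	return u.String(), nil
}

func main() {
	u, err := toFileURL("/srv/git/project")
	if err != nil {
		panic(err)
	}
	// On a Unix-like system this prints file:///srv/git/project
	fmt.Println(u)
}
```

The resulting string would then be used wherever the remote URL is configured, for example in the `URL` field of `git.CloneOptions`.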

@AriehSchneier Thank you for responding! I really appreciate you helping us get to the root of this issue.

To be clear, I don’t think this is related to unshallowing. It’s a repo cloned with a depth of 1, and a pull with a depth of 1 returns “object not found” when a change has been made in the repo and the pull is initiated.

I created a repo that should reproduce the issue end to end fairly easily:

https://github.com/stvnksslr/sandbox-go-git

Build the project, then run it: ./sandbox-go-git https://github.com/stvnksslr/sandbox-git.git

Substitute the repository with one you are able to make changes to.

  1. Run the app to check out the repo: ./sandbox-go-git <repo>
  2. Run the app to pull (should work)
  3. Make a change in the other repo
  4. Run the app to pull again; “object not found” will be returned as an error

I’m getting `empty git-upload-pack given` for the same scenario, with v5.4.2

I got really excited about https://github.com/go-git/go-git/pull/932 and updated my dependency to use it, but I’m still encountering the git pull problem.

Any workarounds and also ideas on where this is happening?

I think I’ve actually found a better solution to this problem. It seems that the handling of haves vs. wants is not balanced: shallowness is only checked on the wants side, not on the haves side.

This iteration should check if c.Hash is a shallow reference, something like this:

@@ -691,6 +691,15 @@ func getHavesFromRef(
 	toVisit := maxHavesToVisitPerRef
 	return walker.ForEach(func(c *object.Commit) error {
 		haves[c.Hash] = true
+
+		if s, _ := s.Shallow(); len(s) > 0 {
+			for _, sh := range s {
+				if sh == c.Hash {
+					return storer.ErrStop
+				}
+			}
+		}
+
 		toVisit--
 		// If toVisit starts out at 0 (indicating there is no
 		// max), then it will be negative here and we won't stop
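The idea behind this diff can be tried out in isolation with a toy model. The sketch below is not go-git code: `commit` and `collectHaves` are hypothetical names, the commit graph is a plain map, and the shallow list is turned into a set so that each commit is checked with one lookup instead of a loop. Unlike the diff, which returns `storer.ErrStop` and ends the whole walk, this version only stops descending below the shallow commit.

```go
package main

import "fmt"

// commit is a toy stand-in for a commit object (hypothetical, not go-git's type).
type commit struct {
	hash    string
	parents []string
}

// collectHaves walks the history starting at ref and records every commit
// that is present locally. It does not descend past a shallow commit,
// because the parents of a shallow commit are not available locally.
func collectHaves(commits map[string]commit, shallow []string, ref string) map[string]bool {
	// Build a set once so each boundary check is a single map lookup.
	boundary := make(map[string]bool, len(shallow))
	for _, h := range shallow {
		boundary[h] = true
	}

	haves := map[string]bool{}
	stack := []string{ref}
	for len(stack) > 0 {
		h := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if haves[h] {
			continue
		}
		c, ok := commits[h]
		if !ok {
			// Not stored locally, so it cannot be advertised as a have.
			continue
		}
		haves[h] = true
		if boundary[h] {
			// Shallow commit: keep it as a have but ignore its parents.
			continue
		}
		stack = append(stack, c.parents...)
	}
	return haves
}

func main() {
	// c2 is the shallow boundary; its parent c1 was never fetched.
	commits := map[string]commit{
		"c3": {hash: "c3", parents: []string{"c2"}},
		"c2": {hash: "c2", parents: []string{"c1"}},
	}
	haves := collectHaves(commits, []string{"c2"}, "c3")
	fmt.Println(len(haves), haves["c3"], haves["c2"], haves["c1"]) // 2 true true false
}
```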

@wyarde No, I am on a MacOS system.

@wyarde Thank you so much for posting your findings! I’m going to try this when I’m home.