ghprb: Fail to checkout PR when a push -f is done to the PR while is being already built

This issue comes from https://issues.jenkins-ci.org/browse/JENKINS-22537

Please consult that issue for details. Here is the summary of the latest research.

The setup is a job using a configuration matrix (I’m not sure if this is strictly necessary to reproduce the problem though) with only one variable and 2 configurations (which is only the Ubuntu version to do the build). Then the ghprb plugin is activated and configured to trigger a build by the GH hook.

Steps to reproduce:

  1. Create a dummy project in GitHub
  2. Add a build script that is only a sleep 60
  3. Configure the Jenkins job to build with the ghprb plugin
  4. Create a PR
  5. Change something in the PR (you can do a git commit --amend and just change the commit message)
  6. While jenkins is building the job (sleeping), do a git push -f to your PR branch
  7. Wait for Jenkins to finish the current build and start the new one
  8. You should see the error now

The error (this is the console output for one of the configurations in the matrix, the other have the same error) is:

Started by upstream project "Test" build number 40
originally caused by:
 GitHub pull request #76 of commit 8e9a1ab00c82edf5f37b400f0a98b6476ba63b97 automatically merged.
Building in workspace .../jenkins/jobs/Test/workspace/Ubuntu/trusty
Cloning the remote Git repository
Cloning repository git@github.com:someuser/somerepo.git
Fetching upstream changes from git@github.com:someuser/somerepo.git
using GIT_SSH to set credentials GitHub leandro-lucarella-sociomantic
Fetching upstream changes from git@github.com:someuser/somerepo.git
using GIT_SSH to set credentials GitHub leandro-lucarella-sociomantic
Checking out Revision 0c2e9e245bd8d7329c3d8b91013d806f4b112687 (detached)
FATAL: Could not checkout null with start point 0c2e9e245bd8d7329c3d8b91013d806f4b112687
hudson.plugins.git.GitException: Could not checkout null with start point 0c2e9e245bd8d7329c3d8b91013d806f4b112687
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$8.execute(CliGitAPIImpl.java:1448)
    at hudson.plugins.git.GitSCM.checkout(GitSCM.java:896)
    at hudson.model.AbstractProject.checkout(AbstractProject.java:1411)
    at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:652)
    at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:561)
    at hudson.model.Run.execute(Run.java:1665)
    at hudson.matrix.MatrixRun.run(MatrixRun.java:146)
    at hudson.model.ResourceController.execute(ResourceController.java:88)
    at hudson.model.Executor.run(Executor.java:246)
Caused by: hudson.plugins.git.GitException: Command "git checkout -f 0c2e9e245bd8d7329c3d8b91013d806f4b112687" returned status code 128:
stdout: 
stderr: fatal: reference is not a tree: 0c2e9e245bd8d7329c3d8b91013d806f4b112687

    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1276)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1253)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1249)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1065)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1075)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$8.execute(CliGitAPIImpl.java:1443)
    ... 9 more

This is the console output for the whole job (not one configuration in particular):

GitHub pull request #76 of commit 8e9a1ab00c82edf5f37b400f0a98b6476ba63b97 automatically merged.
ode environment variables.
Building in workspace .../jenkins/jobs/Test/workspace
Fetching changes from the remote Git repository
Fetching upstream changes from git@github.com:someuser/somerepo.git
using GIT_SSH to set credentials GitHub leandro-lucarella-sociomantic
Checking out Revision 142cbdfe90a5e7cf8e2ba44eb6c16b4b93bb2b93 (detached)
Cleaning workspace
Resetting working tree
Triggering trusty
Triggering saucy
trusty completed with result FAILURE
saucy completed with result FAILURE
Finished: FAILURE

About this issue

  • Original URL
  • State: open
  • Created 10 years ago
  • Comments: 60 (4 by maintainers)

Most upvoted comments

The “easy” fix is to make sure matrix slaves build the merge ref, not the sha1.

We did exactly that to resolve this issue, which we were hitting probably a third(!) of the time. Patch against git (not ghprb) plugin 2.4.0:

diff --git a/src/main/java/hudson/plugins/git/GitSCM.java b/src/main/java/hudson/plugins/git/GitSCM.java
index aca465d..bbddc53 100644
--- a/src/main/java/hudson/plugins/git/GitSCM.java
+++ b/src/main/java/hudson/plugins/git/GitSCM.java
@@ -915,6 +915,7 @@ public class GitSCM extends GitSCMBackwardCompatibility {


         // every MatrixRun should build the same marked commit ID
+        /*
         if (build instanceof MatrixRun) {
             MatrixBuild parentBuild = ((MatrixRun) build).getParentBuild();
             if (parentBuild != null) {
@@ -926,6 +927,7 @@ public class GitSCM extends GitSCMBackwardCompatibility {
                 }
             }
         }
+        */

         // parameter forcing the commit ID to build
         if (candidates.isEmpty() ) {

The patch just bluntly disables the git plugin’s special case logic for matrix builds, so each child will use pr/<id>/merge. It introduces a risk of child builds using inconsistent commits, but in practice hasn’t caused us any problems.

I posted this also on JENKINS-26290, where hopefully a better fix can be found. I suspect an optimal solution would require changes to both plugins.

Just to clarify, it sounds like the underlying issue with matrix builds is that GitHub force pushes to the pr/<id>/merge ref when a new commit is added to the pull request. As a result, it’s possible that the merge SHA1 that was resolved by ghprb will no longer be available for any git repos that hadn’t fetched the original merge commit (e.g., a matrix job on a slave, etc).

Further expanding the code snippet from ghprb in the earlier commit, we have:

        final String commitSha = cause.isMerged() ? "origin/pr/" + cause.getPullID() + "/merge" : cause.getCommit();
        values.add(new StringParameterValue("sha1", commitSha));
        values.add(new StringParameterValue("ghprbActualCommit", cause.getCommit()));

This makes it look like you could avoid this problem by using ${ghprbActualCommit} instead of ${sha1}, since the former will be an actual commit in the pull request branch as opposed to an ephemeral commit that could be force-pushed over. Note that this could have other side effects in that the pull request branch could be build, but once merged into the target branch, the build could fail (since you’re not building the merged branches). I think the only way to safely do a merged build would be to use the pull request commit and have ghprb do a local merge from the target branch. Note that for matrix builds to be consistent it would need to pass a SHA1 for the target branch as well so that any matrix jobs doing the local merge would use the exact same commits.

It’s a problem only with matrix jobs.

Here’s what happens:

  • “master” job of matrix gets executed, which determines the sha1 of the target commit (i.e. it turns pr/XXX/merge into abcdef1)
  • the matrix slaves get scheduled to build this sha1sum
  • the matrix slaves eventually execute the build

The race condition here is that because the PR merge commit is ephemeral (i.e. it gets recreated every time there’s a commit to either the PR, or the repo the PR targets). If there’s any commit between when the matrix master running and the matrix slaves running (e.g. if one of your matrix axis slaves is overloaded so the scheduling takes some time), the problem occurs because the commitid that the matrix master saw no longer exists by the time the matrix slaves fetch the repo.

The “easy” fix is to make sure matrix slaves build the merge ref, not the sha1. Or, as we did, don’t use matrix jobs at all.