ghprb: Fail to checkout PR when a push -f is done to the PR while is being already built
This issue comes from https://issues.jenkins-ci.org/browse/JENKINS-22537
Please consult that issue for details. Here is the summary of the latest research.
The setup is a job using a configuration matrix (I’m not sure if this is strictly necessary to reproduce the problem though) with only one variable and 2 configurations (which is only the Ubuntu version to do the build). Then the ghprb plugin is activated and configured to trigger a build by the GH hook.
Steps to reproduce:
- Create a dummy project in GitHub
- Add a build script that is only a
sleep 60 - Configure the Jenkins job to build with the ghprb plugin
- Create a PR
- Change something in the PR (you can do a
git commit --amendand just change the commit message) - While jenkins is building the job (sleeping), do a
git push -fto your PR branch - Wait for Jenkins to finish the current build and start the new one
- You should see the error now
The error (this is the console output for one of the configurations in the matrix, the other have the same error) is:
Started by upstream project "Test" build number 40
originally caused by:
GitHub pull request #76 of commit 8e9a1ab00c82edf5f37b400f0a98b6476ba63b97 automatically merged.
Building in workspace .../jenkins/jobs/Test/workspace/Ubuntu/trusty
Cloning the remote Git repository
Cloning repository git@github.com:someuser/somerepo.git
Fetching upstream changes from git@github.com:someuser/somerepo.git
using GIT_SSH to set credentials GitHub leandro-lucarella-sociomantic
Fetching upstream changes from git@github.com:someuser/somerepo.git
using GIT_SSH to set credentials GitHub leandro-lucarella-sociomantic
Checking out Revision 0c2e9e245bd8d7329c3d8b91013d806f4b112687 (detached)
FATAL: Could not checkout null with start point 0c2e9e245bd8d7329c3d8b91013d806f4b112687
hudson.plugins.git.GitException: Could not checkout null with start point 0c2e9e245bd8d7329c3d8b91013d806f4b112687
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$8.execute(CliGitAPIImpl.java:1448)
at hudson.plugins.git.GitSCM.checkout(GitSCM.java:896)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1411)
at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:652)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:561)
at hudson.model.Run.execute(Run.java:1665)
at hudson.matrix.MatrixRun.run(MatrixRun.java:146)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:246)
Caused by: hudson.plugins.git.GitException: Command "git checkout -f 0c2e9e245bd8d7329c3d8b91013d806f4b112687" returned status code 128:
stdout:
stderr: fatal: reference is not a tree: 0c2e9e245bd8d7329c3d8b91013d806f4b112687
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1276)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1253)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1249)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1065)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1075)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$8.execute(CliGitAPIImpl.java:1443)
... 9 more
This is the console output for the whole job (not one configuration in particular):
GitHub pull request #76 of commit 8e9a1ab00c82edf5f37b400f0a98b6476ba63b97 automatically merged.
ode environment variables.
Building in workspace .../jenkins/jobs/Test/workspace
Fetching changes from the remote Git repository
Fetching upstream changes from git@github.com:someuser/somerepo.git
using GIT_SSH to set credentials GitHub leandro-lucarella-sociomantic
Checking out Revision 142cbdfe90a5e7cf8e2ba44eb6c16b4b93bb2b93 (detached)
Cleaning workspace
Resetting working tree
Triggering trusty
Triggering saucy
trusty completed with result FAILURE
saucy completed with result FAILURE
Finished: FAILURE
About this issue
- Original URL
- State: open
- Created 10 years ago
- Comments: 60 (4 by maintainers)
We did exactly that to resolve this issue, which we were hitting probably a third(!) of the time. Patch against git (not ghprb) plugin 2.4.0:
The patch just bluntly disables the git plugin’s special case logic for matrix builds, so each child will use
pr/<id>/merge. It introduces a risk of child builds using inconsistent commits, but in practice hasn’t caused us any problems.I posted this also on JENKINS-26290, where hopefully a better fix can be found. I suspect an optimal solution would require changes to both plugins.
Just to clarify, it sounds like the underlying issue with matrix builds is that GitHub force pushes to the
pr/<id>/mergeref when a new commit is added to the pull request. As a result, it’s possible that the merge SHA1 that was resolved byghprbwill no longer be available for any git repos that hadn’t fetched the original merge commit (e.g., a matrix job on a slave, etc).Further expanding the code snippet from
ghprbin the earlier commit, we have:This makes it look like you could avoid this problem by using
${ghprbActualCommit}instead of${sha1}, since the former will be an actual commit in the pull request branch as opposed to an ephemeral commit that could be force-pushed over. Note that this could have other side effects in that the pull request branch could be build, but once merged into the target branch, the build could fail (since you’re not building the merged branches). I think the only way to safely do a merged build would be to use the pull request commit and haveghprbdo a local merge from the target branch. Note that for matrix builds to be consistent it would need to pass a SHA1 for the target branch as well so that any matrix jobs doing the local merge would use the exact same commits.It’s a problem only with matrix jobs.
Here’s what happens:
The race condition here is that because the PR merge commit is ephemeral (i.e. it gets recreated every time there’s a commit to either the PR, or the repo the PR targets). If there’s any commit between when the matrix master running and the matrix slaves running (e.g. if one of your matrix axis slaves is overloaded so the scheduling takes some time), the problem occurs because the commitid that the matrix master saw no longer exists by the time the matrix slaves fetch the repo.
The “easy” fix is to make sure matrix slaves build the merge ref, not the sha1. Or, as we did, don’t use matrix jobs at all.
reported in Jenkins Jira as well - https://issues.jenkins-ci.org/browse/JENKINS-26748