build: Git source validation will be stuck in some cases which causes a pile up of unregistered build objects

use case

During reconcile, our build controller will validate remote git source. And we are using git ls-remote to check if remote URL exists or not. About this List, it doesn’t have a timeout parameter can be set. We can see there has a newUploadPackSession func is initializing a new session. And the new session is generated by a DefaultClient. The default client is nil. That means the timeout in this client is zero, also means no timeout. Then AdvertisedReferences func will do http request besed on this new session. Because the client doesn’t have timeout, this causes in some cses, such as github or gitlab outage, http request will be stuck.

Above situation will cause our build controller waits all the time. With the increasing number of builds, there will have many builds queue and wait for being reconciled. If remote URL outage all the time, then this will causes new created builds will not be registered. This situation had happened internally.

Reproduce issue

Create multiple builds with this source which is provided by @HeavyWombat in paraller. Then check their status. Then create a new build. The new build should not be registered.

Expected behavior

Git validation should have a timeout. Within this time, if sourceURL checks doesn’t finish, then should timeout and build controller should return some meaningful message, such as checkSourceURLTimeout and update build status so that controller can handle other build objects.

cc/ @zhangtbj

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 18 (18 by maintainers)

Most upvoted comments

I submit a PR:https://github.com/shipwright-io/build/pull/672 to disable sourceURL validation by default, pls take a look, thanks!

@qu1queee Yes, we can close it, now. Also I will submit a PR to go-git repo in order to add timeout context for git ls-remote command. Once this PR is ready, I think we can fix this issue from the root cause.

@xiujuan95 can we close this issue?

fwiw, when the git validation first dropped, I believe I was the one who originally insisted on it minimally having an on/off switch. At the time, I chose to be “un-opinionated” on what the default should be. But perhaps I should have been more opinionated 😃 (my personal preference would have been off by default), and I am certainly +1 on switching the default to off now.

I agree with Jason,

We think about this problem recently and there are three solutions:

  • Push go-git community accept the PR https://github.com/go-git/go-git/pull/246 or we provide a timeout related fix PR, but it will take more time (maybe more than 2 weeks)
  • Solve in the Shipwright side, change to use a patched go-git remote pkg or add a goroutine with timeout like Jason mentioned above, but it is not good and we have to rollback once we have the official fix
  • Switch the git validation as disable by default

The first two solutions seem cannot solve the problem soon, so I also suggest to disable the git validation by default first. and enable it once the problem is solved. It will make us safer to avoid the controller outage.

But also would like hear other comments or discuss next Monday together.