build: Git source validation will be stuck in some cases which causes a pile up of unregistered build objects
use case
During reconcile, our build controller will validate remote git source. And we are using git ls-remote to check if remote URL exists or not. About this List, it doesn’t have a timeout parameter can be set. We can see there has a newUploadPackSession func is initializing a new session. And the new session is generated by a DefaultClient. The default client is nil. That means the timeout in this client is zero, also means no timeout. Then AdvertisedReferences func will do http request besed on this new session. Because the client doesn’t have timeout, this causes in some cses, such as github or gitlab outage, http request will be stuck.
Above situation will cause our build controller waits all the time. With the increasing number of builds, there will have many builds queue and wait for being reconciled. If remote URL outage all the time, then this will causes new created builds will not be registered. This situation had happened internally.
Reproduce issue
Create multiple builds with this source which is provided by @HeavyWombat in paraller. Then check their status. Then create a new build. The new build should not be registered.
Expected behavior
Git validation should have a timeout. Within this time, if sourceURL checks doesn’t finish, then should timeout and build controller should return some meaningful message, such as checkSourceURLTimeout and update build status so that controller can handle other build objects.
cc/ @zhangtbj
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 1
- Comments: 18 (18 by maintainers)
I submit a PR:https://github.com/shipwright-io/build/pull/672 to disable sourceURL validation by default, pls take a look, thanks!
@qu1queee Yes, we can close it, now. Also I will submit a PR to
go-gitrepo in order to add timeout context forgit ls-remotecommand. Once this PR is ready, I think we can fix this issue from the root cause.@xiujuan95 can we close this issue?
fwiw, when the git validation first dropped, I believe I was the one who originally insisted on it minimally having an on/off switch. At the time, I chose to be “un-opinionated” on what the default should be. But perhaps I should have been more opinionated 😃 (my personal preference would have been off by default), and I am certainly +1 on switching the default to off now.
I agree with Jason,
We think about this problem recently and there are three solutions:
go-gitcommunity accept the PR https://github.com/go-git/go-git/pull/246 or we provide a timeout related fix PR, but it will take more time (maybe more than 2 weeks)The first two solutions seem cannot solve the problem soon, so I also suggest to disable the git validation by default first. and enable it once the problem is solved. It will make us safer to avoid the controller outage.
But also would like hear other comments or discuss next Monday together.