build: Build controller handling of service account can lead to failing build runs and other artifacts

In a build run spec, there is a section about the service account. This section can be:

1. set to a specific name and generate = false to make use of the specified service account
2. set to generate = true to generate a service account specific to the build run
3. omitted to get the default behavior, which is to use the pipeline service account or, if that does not exist, the default service account
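
To make the three scenarios concrete, here is a minimal sketch of the selection logic in Go. It is not the controller's actual code: the ServiceAccountSpec type, the resolveServiceAccount function, the exists callback, and the "-sa" suffix for generated accounts are all hypothetical, and the literal names "pipeline" and "default" are assumed from the description above.

```go
package main

import "fmt"

// ServiceAccountSpec is a simplified, hypothetical stand-in for the
// service account section of a build run spec.
type ServiceAccountSpec struct {
	Name     string
	Generate bool
}

// resolveServiceAccount returns the service account name to use and whether
// it must be generated. exists reports whether a service account with the
// given name is present in the build run's namespace.
func resolveServiceAccount(buildRunName string, spec *ServiceAccountSpec, exists func(name string) bool) (string, bool) {
	switch {
	case spec != nil && spec.Generate:
		// Scenario 2: an account specific to this build run.
		// The "-sa" suffix is an assumption made for this sketch.
		return buildRunName + "-sa", true
	case spec != nil && spec.Name != "":
		// Scenario 1: the user named an existing account.
		return spec.Name, false
	case exists("pipeline"):
		// Scenario 3: section omitted, pipeline account is present.
		return "pipeline", false
	default:
		// Scenario 3: section omitted, fall back to the default account.
		return "default", false
	}
}

func main() {
	hasPipeline := func(name string) bool { return name == "pipeline" }
	name, generated := resolveServiceAccount("new-build-run", nil, hasPipeline)
	fmt.Println(name, generated)
}
```

Only the second case produces an account that lives and dies with the build run; the other three reuse an account that other workloads may share.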

When the generated service account is not used, our logic in generate task run will add the secrets referenced in the build (for example for the source repository and the target container registry) to the specified service account if the secret is not already contained.
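
The "add if not already contained" step can be sketched as follows. This is an illustration only: the ServiceAccount struct is a simplified stand-in that tracks secret names as plain strings, and addMissingSecrets is a hypothetical helper, not the function used in the controller.

```go
package main

import "fmt"

// ServiceAccount is a simplified stand-in that only tracks the names of
// the secrets it references.
type ServiceAccount struct {
	Name    string
	Secrets []string
}

// addMissingSecrets appends every build secret that the service account does
// not reference yet. It reports whether the account was modified, so that a
// caller knows whether an update has to be sent to the API server.
func addMissingSecrets(sa *ServiceAccount, buildSecrets []string) bool {
	known := make(map[string]struct{}, len(sa.Secrets))
	for _, s := range sa.Secrets {
		known[s] = struct{}{}
	}

	changed := false
	for _, s := range buildSecrets {
		if s == "" {
			continue // the build does not use a secret in this place
		}
		if _, ok := known[s]; ok {
			continue // already referenced
		}
		sa.Secrets = append(sa.Secrets, s)
		known[s] = struct{}{}
		changed = true
	}
	return changed
}

func main() {
	sa := &ServiceAccount{Name: "pipeline", Secrets: []string{"git-cred"}}
	changed := addMissingSecrets(sa, []string{"git-cred", "registry-cred"})
	fmt.Println(changed, sa.Secrets)
}
```

Note that nothing in this step ever removes a reference again, which is what leaves the account with a dangling entry once a secret is deleted.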

If the user now deletes one of those secrets, the service account is broken and our controller will not repair it. If the user then tries to create a build run that uses such a service account, the build run will fail with this reason:

failed to create task run pod "new-build-run-kwkl5-j9b27": translating TaskSpec to Pod: secrets "something" not found. Maybe invalid TaskSpec

As a service account can be used for other purposes as well (especially if scenario 3 applies and a fallback to default happens), those other things (jobs, deployments) might also fail. I should mention that I did not try how they behave with such a service account.

Here are some options that I see:

(A) Watch secret deletions and remove references from the service account(s). This would probably be feasible if we only supported either a hard-coded service account or a generated one. But given that users can use a named service account for their build runs, this could mean that we need to manage a lot of service accounts.
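
A rough sketch of what the cleanup in option (A) might look like once a deletion event for a secret arrives. The watch itself is left out, and the ServiceAccount struct and removeSecretReference helper are hypothetical simplifications.

```go
package main

import "fmt"

// ServiceAccount is a simplified stand-in that only tracks the names of
// the secrets it references.
type ServiceAccount struct {
	Name    string
	Secrets []string
}

// removeSecretReference drops the deleted secret from every service account
// that references it and returns the names of the accounts it changed.
func removeSecretReference(accounts []*ServiceAccount, deleted string) []string {
	var modified []string
	for _, sa := range accounts {
		kept := make([]string, 0, len(sa.Secrets))
		for _, s := range sa.Secrets {
			if s != deleted {
				kept = append(kept, s)
			}
		}
		if len(kept) != len(sa.Secrets) {
			sa.Secrets = kept
			modified = append(modified, sa.Name)
		}
	}
	return modified
}

func main() {
	accounts := []*ServiceAccount{
		{Name: "pipeline", Secrets: []string{"git-cred", "registry-cred"}},
		{Name: "default", Secrets: []string{"other"}},
	}
	fmt.Println(removeSecretReference(accounts, "git-cred"))
}
```

The loop over all accounts is where the cost mentioned above shows up: with user-named service accounts, every account in the namespace is a candidate.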

(B) At the time we assign a named service account to a build run, we check whether all secrets really exist and if one does not exist we fail the build run with an error message that is nicer than the one coming from Tekton.
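
Option (B) could look roughly like the sketch below. The function names, the exists callback, and the wording of the error message are assumptions made for illustration; a real implementation would look the secrets up through the Kubernetes API.

```go
package main

import (
	"fmt"
	"strings"
)

// missingSecrets returns the referenced secrets for which exists reports false.
func missingSecrets(referenced []string, exists func(name string) bool) []string {
	var missing []string
	for _, s := range referenced {
		if !exists(s) {
			missing = append(missing, s)
		}
	}
	return missing
}

// validateServiceAccountSecrets returns a descriptive error if the service
// account references secrets that no longer exist, and nil otherwise.
func validateServiceAccountSecrets(saName string, referenced []string, exists func(name string) bool) error {
	missing := missingSecrets(referenced, exists)
	if len(missing) == 0 {
		return nil
	}
	return fmt.Errorf("service account %q references secrets that do not exist: %s",
		saName, strings.Join(missing, ", "))
}

func main() {
	present := map[string]bool{"git-cred": true}
	exists := func(name string) bool { return present[name] }

	err := validateServiceAccountSecrets("build-bot", []string{"git-cred", "registry-cred"}, exists)
	if err != nil {
		fmt.Println("failing build run:", err)
	}
}
```

Running this check before the task run is created would let the build run fail early with a message that names the service account and the missing secrets.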

(D) We do nothing and only document the current behavior.

In addition to those options, we may also consider a configuration setting for the controller that forces every build run to use a generated service account, ignoring the intent the user specified in the build run.

About this issue

State: open
Created 4 years ago
Comments: 18 (17 by maintainers)

Most upvoted comments

I have been trying to get started with Shipwright since yesterday, but I’m stuck with what I assume to be the issue described here.

k logs myapp123-lrkrd-pod-cprfn
error: a container name must be specified for pod myapp123-lrkrd-pod-cprfn, choose one of: [step-source-default step-build-and-push] or one of the init containers: [place-tools working-dir-initializer]

Source code and registry are on a private self hosted Gitlab instance. Using the kaniko strategy. I can confirm that using a public git repo works as expected (except for the incorrect timestamp on the image, it says created 40yrs ago 😃). Now, have you documented anywhere how to get past the above error? Thanks! Great project!

Hi @davidberglund, thank you for the feedback. Your question is not related to this issue. The best place for questions is our Slack channel, #shipwright, in the Kubernetes Slack, or a new issue.

Regarding your question: BuildRun pods consist of multiple containers. As such, you need to run kubectl logs using either the -c argument and the container name, or using --all-containers. The BuildRun’s status should also guide you specifically to the command for the failed step. Information about how to authenticate to a private Git repository can be found here: https://github.com/shipwright-io/build/blob/main/docs/development/authentication.md#authentication-for-git.

SaschaSchwarze0 on Sep 28, 2021