pipeline: TaskRun fails during initialization when disable-home-env-overwrite=true

This is closely related to the ongoing Tekton $HOME issue (https://github.com/tektoncd/pipeline/issues/2013#issuecomment-585908031). I am testing disable-home-env-overwrite before its default gets flipped.

That comment says:

With this new flag Tekton will no longer interfere with HOME - it will be whatever you expect it to be when the container runs in a Pod.

Previously $HOME would have been set to /tekton/home but now it won’t be. So I would expect $HOME/.docker/config.json to be written to /root/.docker/config.json if the user is root and the image doesn’t specify its own HOME.

I don’t think this is the case. I am testing gcr.io/cloud-builders/gradle, but Tekton fails as it tries to create a directory /.docker.

"level":"fatal",
"ts":1583431818.4164164,
"caller":"creds-init/main.go:41",
"msg":"Error initializing credentials: mkdir /.docker: permission denied",
"stacktrace":
main.main
    github.com/tektoncd/pipeline/cmd/creds-init/main.go:41
runtime.main
    runtime/proc.go:203

Note the “permission denied” error is not the issue here. The issue is that it is /.docker instead of /root/.docker.
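
To make the path difference concrete, here is a minimal sketch. It assumes the destination is built by joining HOME with ".docker" and that an unset or empty HOME falls back to "/". docker_config_dir is a hypothetical helper for illustration only, not Tekton's actual code.

```python
import posixpath


def docker_config_dir(env):
    """Return the directory a tool would use for Docker credentials.

    Hypothetical helper: models a tool that joins HOME with ".docker"
    and falls back to "/" when HOME is unset or empty (an assumption).
    """
    home = env.get("HOME") or "/"
    return posixpath.join(home, ".docker")


print(docker_config_dir({}))                 # HOME unset
print(docker_config_dir({"HOME": "/root"}))  # HOME provided by the image or runtime
```

With no HOME the sketch yields /.docker, which matches the path in the log above. With HOME=/root it yields /root/.docker, which is what I expected to see.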

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 35 (17 by maintainers)

Most upvoted comments

I’ve gone with the approach of placing the creds in a fixed location (/tekton/home) and made a PR here: https://github.com/tektoncd/pipeline/pull/2180

So, without securityContext in TaskRun, it can actually see the right $HOME value. I think something else is also involved to make this difference.

The check-dirs Step always sees the correct $HOME value, /workspace, because we set it explicitly on the Step. In some of our runs above, though, the Task dies before it gets to check-dirs. The log output we see in these cases is only for git-source-xxxx and create-dir-image-xxxx. They fail writing to /.

(If you also remove securityContext from Task, they will be copied successfully into /workspace).

One small nit here: we don’t set the securityContext on the Task but on the Step. So check-dirs, the step container, receives the securityContext. But creds-init (an injected initContainer) and pipeline resource injected containers do not receive that securityContext. If you remove the securityContext from the check-dirs Step then the TaskRun’s securityContext is applied to all containers equally.
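
A rough model of which UID each container ends up with, going only by the behaviour described above. effective_uid and its parameters are made up for illustration, and it assumes the image runs as root when no securityContext applies.

```python
def effective_uid(taskrun_uid=None, step_uid=None, injected=False):
    """UID a container runs as under the behaviour described above.

    taskrun_uid: runAsUser from the TaskRun's securityContext, or None.
    step_uid:    runAsUser from the Step's own securityContext, or None.
    injected:    True for creds-init and PipelineResource containers,
                 which never receive the Step-level securityContext.
    Assumes the image defaults to root (UID 0) when nothing is set.
    """
    if not injected and step_uid is not None:
        return step_uid      # Step-level setting wins for the Step itself
    if taskrun_uid is not None:
        return taskrun_uid   # TaskRun-level setting applies to every container
    return 0                 # assumed image default: root


# Step sets 1234, TaskRun sets nothing: only the Step container changes UID.
print(effective_uid(step_uid=1234))                 # the check-dirs Step
print(effective_uid(step_uid=1234, injected=True))  # creds-init, resource containers
```

So with a securityContext only on the Step, check-dirs runs as 1234 while the injected containers stay root.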

This is all immeasurably confusing. I’m going to try to illustrate the different scenarios here:


First scenario:

disable-home-env-overwrite: “true”
No TaskRun securityContext: UID=root
No check-dirs securityContext: UID=root

Order of operations:

  1. creds-init initContainer writes credentials to /tekton/creds. Those files all get root ownership.
  2. PipelineResource containers, run as root, copy creds from /tekton/creds to /.
  3. Git PipelineResource, run as root, writes to /.gitconfig successfully.
  4. check-dirs container, run as root, reads creds from /tekton/creds and writes them to /workspace.

Second scenario:

disable-home-env-overwrite: “true”
No TaskRun securityContext: UID=root
check-dirs has securityContext: UID=1234

Order of operations:

  1. creds-init initContainer writes credentials to /tekton/creds. Those files all get root ownership.
  2. PipelineResource containers, run as root, copy creds from /tekton/creds to /.
  3. Git PipelineResource, run as root, writes to /.gitconfig successfully.
  4. check-dirs container, run as 1234, dies copying creds from /tekton/creds to /workspace because they’re owned by root. Error:
     [check-dirs] 2020/05/01 14:53:01 unsuccessful cred copy: ".ssh" from "/tekton/creds" to "/workspace": unable to open source: open /tekton/creds/.ssh/known_hosts: permission denied

Third scenario:

disable-home-env-overwrite: “true”
TaskRun has securityContext: UID=1111
check-dirs has securityContext: UID=1234

Order of operations:

  1. creds-init initContainer writes credentials to /tekton/creds. Those files all get 1111 ownership.
  2. PipelineResource containers, run as 1111, fail to copy creds from /tekton/creds to /. Message:
     unsuccessful cred copy: ".ssh" from "/tekton/creds" to "/": unable to create destination directory: mkdir /.ssh: permission denied
  3. Git PipelineResource, run as 1111, dies with a fatal error writing to /.gitconfig. Error:
     {"level":"error","ts":1588180626.7862072,"caller":"git/git.go:41","msg":"Error running git [config --global http.sslVerify true]: exit status 255\nerror: could not lock config file //.gitconfig: Permission denied\n","stacktrace":"github.com/tektoncd/pipeline/pkg/git.run\n\tgithub.com/tektoncd/pipeline/pkg/git/git.go:41\ngithub.com/tektoncd/pipeline/pkg/git.Fetch\n\tgithub.com/tektoncd/pipeline/pkg/git/git.go:82\nmain.main\n\tgithub.com/tektoncd/pipeline/cmd/git-init/main.go:53\nruntime.main\n\truntime/proc.go:203"}

Here check-dirs never runs, so there is no mention of /workspace.
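
The three scenarios can be replayed with a toy permission model. This is only a sketch under stated assumptions: credential files are treated as owner-only (mode 0600), "/" as root-owned and not world-writable, and the function names are invented for illustration.

```python
ROOT = 0


def can_read(file_owner, uid):
    """Owner-only files (assumed mode 0600) are readable by the owner or root."""
    return uid == ROOT or uid == file_owner


def can_write_root_dir(uid):
    """The directory "/" is assumed root-owned and not world-writable."""
    return uid == ROOT


def run_scenario(injected_uid, step_uid):
    """Walk the order of operations and return the first fatal failure, or "ok".

    injected_uid: UID of creds-init and the PipelineResource containers.
    step_uid:     UID of the check-dirs Step.
    """
    creds_owner = injected_uid  # creds-init writes /tekton/creds as this UID
    if not can_write_root_dir(injected_uid):
        # The failed cred copy to / is only logged; the git lock on
        # /.gitconfig is the error that kills the Task.
        return "git resource cannot write /.gitconfig"
    if not can_read(creds_owner, step_uid):
        return "check-dirs cannot read /tekton/creds"
    return "ok"


for name, uids in [("first", (0, 0)), ("second", (0, 1234)), ("third", (1111, 1234))]:
    print(name, "->", run_scenario(*uids))
```

The model reports "ok" for the first scenario, the check-dirs read failure for the second, and the git write failure for the third, in line with the logs above.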


This is a really confusing dance and there’s quite a bit of work to do to get all of Tekton’s movements in lock-step.

Excellent, I’ve been able to reproduce the problem exactly.

create-dir-image-XXXXX is injected into a Task when either the GCS PipelineResource is used or Tekton decides it needs to create an extra directory during PipelineResource linking. It doesn’t have a HOME when the home override flag is “true” and it doesn’t run as root when the securityContext sets a non-zero user ID. So it reports errors when the entrypoint tries to copy credentials out of /tekton/creds into /.

git-source-XXXXX is placed into a Task when the Git PipelineResource is used. It shares the same problems as above and also adds another wrinkle: it can’t lock the $HOME/.gitconfig file for setting configuration options. This is again because $HOME isn’t set, so it defaults to /, and the container is running as a non-zero user ID. Unlike with create-dir-image-XXXXX, this is a fatal error for the Git PipelineResource and the Task dies here.

Ultimately the errors with these two Steps are happening because PipelineResources don’t have a HOME set and they’re trying to write to / as a non-root user due to the securityContext.

So summarizing the various problems that have been discovered here:

  1. The Git PipelineResource needs to be able to lock and write files in $HOME. Specifically $HOME/.gitconfig.

  2. Credentials need to be written to /tekton/creds using the UID of the currently running Step. creds-init can’t do this on its own because UID can differ from container to container.

And the likely solutions seem to me:

  • PipelineResources need to have their HOME set somewhere they can always write regardless of UID. I’m thinking /tekton/home since it’s always mounted (even when the override flag is true) and it’s always world-writeable since it’s an emptyDir.

  • creds-init probably needs to go away completely and have its logic moved into the entrypointer. This is the only solution I can think of that will allow UID to be random, creds copied out of secret volumes with the correct file permissions, and HOME to be discovered at runtime.
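
As a rough illustration of the first bullet, here is a sketch that picks a HOME by checking the world-writable bit on candidate directories. The helper names are invented, and the temporary directories only stand in for "/" and /tekton/home.

```python
import os
import stat
import tempfile


def is_world_writable(path):
    """True if the 'other' write bit is set on path."""
    return bool(stat.S_IMODE(os.stat(path).st_mode) & stat.S_IWOTH)


def pick_home(candidates):
    """Return the first existing, world-writable directory, else None."""
    for path in candidates:
        if os.path.isdir(path) and is_world_writable(path):
            return path
    return None


with tempfile.TemporaryDirectory() as tmp:
    locked = os.path.join(tmp, "root-dir")     # stand-in for "/"
    shared = os.path.join(tmp, "tekton-home")  # stand-in for /tekton/home
    os.mkdir(locked)
    os.chmod(locked, 0o755)
    os.mkdir(shared)
    os.chmod(shared, 0o777)
    print(pick_home([locked, shared]) == shared)
```

Checking the mode bits rather than attempting a write keeps the result independent of the UID that happens to run the check.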

Ideally the entrypointer could copy credentials straight out of secret volumes and into wherever they think $HOME is. Unfortunately an annoying extra problem that I’ve brought upon myself is that I’ve introduced $(credentials.path), which I’ve documented as pointing to a single location. So the entrypointer is going to need to copy the creds to /tekton/creds as well as copying them to wherever $HOME is.

I’ll create issues for each of these problems and then start working on fixes for both.

Just to reiterate from the Pull Request that closed this Issue:

  1. Credentials are now written to /tekton/creds when the disable-home-env-overwrite flag is “true”.
  2. A new variable has been exposed, $(credentials.path), which points to the place where creds-init wrote the credentials.
  3. Our entrypoint binary will automatically copy credentials from $(credentials.path) to the Step’s HOME. We find the HOME directory using go-homedir rather than relying on just the $HOME env var.
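
For readers who want the shape of step 3, here is a Python analogue. The real entrypoint is written in Go and uses go-homedir; the lookup order, the fallback to "/", and the function names below are assumptions for illustration.

```python
import os
import pwd
import shutil
import tempfile


def resolve_home(env=None):
    """Prefer $HOME; otherwise look the current UID up in the passwd database."""
    env = os.environ if env is None else env
    home = env.get("HOME")
    if home:
        return home
    try:
        return pwd.getpwuid(os.getuid()).pw_dir
    except KeyError:
        # Arbitrary UIDs often have no passwd entry; "/" is an assumed fallback.
        return "/"


def copy_creds(creds_dir, home):
    """Copy every entry under creds_dir into home and return the names copied."""
    copied = []
    for name in sorted(os.listdir(creds_dir)):
        src = os.path.join(creds_dir, name)
        dst = os.path.join(home, name)
        if os.path.isdir(src):
            shutil.copytree(src, dst, dirs_exist_ok=True)  # Python 3.8+
        else:
            shutil.copy2(src, dst)
        copied.append(name)
    return copied
```

In a Step this would be called roughly as copy_creds("/tekton/creds", resolve_home()), with /tekton/creds being the location that $(credentials.path) points to.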

@chanseokoh once v0.11.0-rc3 is released this fix will be available to try out. Very keen to hear your feedback / experience with the changes!

Design doc for this problem, to be discussed in the WG on Wednesday: https://docs.google.com/document/d/1SVuDt-SXPHymz41dveSXFSPrx5Z-Wzb9eHliJAyYg4o

This sounds like my task will add a special contract, applicable only to my catalog, for providing Docker credentials. I’d like a general, documented solution at the Tekton level, but I’ll look into mounting a Secret anyway.

Yeah that’s a fair point and I understand not wanting to take this path if this isn’t an approach that everyone uses. I’m wondering whether this should become a recommendation for catalog authors though - to expose (optional) workspaces for credentials to be mounted into. If everyone was doing it then it might not be bad?