kubernetes: Setting defaultMode Is Not Fully Respected When Pod.spec.securityContext.runAsUser Is Set

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

Setting Pod.spec.securityContext.runAsUser causes the group-read permission bit to be set on secrets exposed via volumes, even if Pod.spec.volumes[x].defaultMode is set to 256 (octal 0400).

See also: https://github.com/openshift/origin/issues/16424

What you expected to happen:

Given a defaultMode of 256, the file mode should be 0400, but it is 0440 instead.
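Since defaultMode is a decimal integer in the manifest, 256 decimal is 0400 octal; the observed 0440 is that mode with the group-read bit (0040) added. A quick check of the arithmetic:

```python
# defaultMode is specified as a decimal integer in the manifest.
default_mode = 256
print(oct(default_mode))  # decimal 256 is octal 0o400 (owner read-only)

# The mode actually observed on the file is the requested mode with the
# group-read bit OR'ed in.
group_read = 0o040
print(oct(default_mode | group_read))  # 0o440
```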

How to reproduce it (as minimally and precisely as possible):

Create the following objects and observe the logs of the created pod:

---
apiVersion: v1
data:
  test: dGVzdA==
kind: Secret
metadata:
  name: test-secret
type: Opaque

---
apiVersion: v1
kind: Pod
metadata:
  generateName: issue-repro-
spec:
  securityContext:
    runAsUser: 1000
    fsGroup: 1000
  containers:
  - image: busybox
    name: busybox
    imagePullPolicy: IfNotPresent
    args:
    - "ls"
    - "-alR"
    - "/tmp/dummy-secret"
    volumeMounts:
    - mountPath: /tmp/dummy-secret
      name: test-secret

  volumes:
  - name: test-secret
    secret:
      defaultMode: 256
      secretName: test-secret

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.0", GitCommit:"925c127ec6b946659ad0fd596fa959be43f0cc05", GitTreeState:"clean", BuildDate:"2017-12-16T03:15:38Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.0", GitCommit:"0b9efaeb34a2fc51ff8e4d34ad9bc6375459c4a4", GitTreeState:"clean", BuildDate:"2017-11-29T22:43:34Z", GoVersion:"go1.9.1", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
$ minikube version
minikube version: v0.24.1
  • OS (e.g. from /etc/os-release):
$ cat /etc/os-release
NAME=Buildroot
VERSION=2017.02
ID=buildroot
VERSION_ID=2017.02
PRETTY_NAME="Buildroot 2017.02"
  • Kernel (e.g. uname -a):
$ uname -a
Linux minikube 4.9.13 #1 SMP Thu Oct 19 17:14:00 UTC 2017 x86_64 GNU/Linux
  • Install tools:
  • Others:

About this issue

  • Original URL
  • State: open
  • Created 6 years ago
  • Reactions: 31
  • Comments: 46 (23 by maintainers)

Most upvoted comments

There should be a way to provide e.g. an SSH key as 0600.

IMO, if the user specifies Mode or DefaultMode, we should honor it and not overwrite it when applying fsGroup. I know the Mode description mentions "This might be in conflict with other options that affect the file mode, like fsGroup, and the result can be other mode bits set.", and I know changing this may break existing apps, but why would a user write 0600 and expect 0640 instead? To me it looks like a bug in the Pod object.

We can change AtomicWriter to do the chown and chmod using fsGroup only if Mode/DefaultMode is not set.
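A minimal sketch of that proposed precedence, with hypothetical helper names (`current_mode` models today's behavior as described later in this thread; `proposed_mode` honors an explicit mode and only falls back to the fsGroup-derived mode when none was requested). This is an illustration of the idea, not the actual Kubernetes code:

```python
def current_mode(requested_mode, fs_group_set, read_only=True):
    """Simplified model of today's behavior: when fsGroup is set,
    the ownership pass forces 0440/0660 regardless of the request."""
    if fs_group_set:
        return 0o440 if read_only else 0o660
    return requested_mode

def proposed_mode(requested_mode, fs_group_set, read_only=True):
    """Sketch of the proposal above: an explicit Mode/DefaultMode wins,
    and fsGroup only chooses the mode when none was requested."""
    if requested_mode is not None:
        return requested_mode
    if fs_group_set:
        return 0o440 if read_only else 0o660
    return 0o644  # Kubernetes' documented default for secret volumes

print(oct(current_mode(0o400, fs_group_set=True)))   # 0o440 today
print(oct(proposed_mode(0o400, fs_group_set=True)))  # 0o400 under the proposal
```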

For anyone following this thread: we are planning to resolve this issue in the 1.22 release. Here is the enhancement: https://github.com/kubernetes/enhancements/pull/2606. Feel free to take a look if you are interested. Thanks!

The workaround I used for this was to run an initContainer that mounts the SSH private key in a temp directory, copies it to an emptyDir, and chmods it appropriately. The container that needs the private key then mounts the same emptyDir volume:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-example-deployment
spec:
  selector:
    matchLabels:
      app: my-example
  template:
    metadata:
      labels:
        app: my-example
    spec:
      initContainers:
        - name: prep-id-rsa
          image: busybox:1.30
          command:
            - sh
            - -c
            - |-
              cp /tmp/id_rsa /well/known/dir/id_rsa
              chmod 0600 /well/known/dir/id_rsa
          volumeMounts:
            - name: id-rsa
              mountPath: /tmp/id_rsa
              subPath: id_rsa
            - name: empty-dir
              mountPath: /well/known/dir
      containers:
        - name: my-container
          image: alpine:3
          command:
            - sh
            - -c
            - |-
              ls -la /tmp/id_rsa
              ls -la /root/.ssh
          volumeMounts:
            - name: id-rsa
              mountPath: /tmp/id_rsa
              subPath: id_rsa
            - name: empty-dir
              mountPath: /root/.ssh
          resources:
            requests:
              cpu: 1m
              memory: 1Mi
            limits:
              cpu: 1m
              memory: 1Mi
          securityContext:
            runAsUser: 1000
      volumes:
        - name: id-rsa
          secret:
            secretName: my-ssh-private-key
        - name: empty-dir
          emptyDir:
            medium: Memory

Any updates on this? This continues to break users who are attempting to use features like projected volume mounts with sidecar containers: https://github.com/istio/istio/issues/26882

I’m also running into an issue with ownership/permissions on a Kubernetes secret. I have a container (run by a Kubernetes CronJob, but that’s beside the point) that runs as uid=999 (pod security: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#users-and-groups). I want to use a secret, namely an SSL-certificate private key for client cert auth. Using client certificates requires permissions to be 0600, which means ownership must be set to 999 as well (since uid=999 can only read a file with permission 0600 if it is the owner). I can’t seem to get it to work. For now I worked around the issue by reading the file and writing the contents to a new file on tmpfs with the proper ownership/permissions. But copying the contents of secrets to other locations feels contradictory to what we want to achieve (higher security).

Is this the same issue / related? Or should I submit a new one?

The problem is not that defaultMode is ignored when runAsUser is set; it is not respected when fsGroup is set, which, on the one hand, is documented. On the other hand, it would be nice for the most specific setting to take precedence.

The issue here is that setting fsGroup means all volumes should have that GID and read permission for the group. When defaultMode is set, that mode is used. When both are set, it is not obvious which should take precedence.

This was documented in the KEP and in the defaultMode field documentation. But I do think the behaviour might be better if the “closest” setting to the volume took precedence.

/triage accepted

I think the next step in investigating a solution for this issue is to see if we can generalize the work done for projected service account tokens to other volume types.

IMO, fsGroup and defaultMode conflict in Secrets (and other AtomicWriter volumes). AtomicWriter.Write first prepares the volume with the right defaultMode, i.e. a file is chmod'ed to 0400, and only after that is SetVolumeOwnership called to apply fsGroup:

https://github.com/kubernetes/kubernetes/blob/657a1a1a3457bc599005b1ca30c338c03e9d4aa0/pkg/volume/secret/secret.go#L251-L261

SetVolumeOwnership then chmods the file to 0660 (or 0440 for read-only files), ignoring any defaultMode:

https://github.com/kubernetes/kubernetes/blob/7a9f21bbb828a0f58e6c51234c1ba0e16efb6727/pkg/volume/volume_linux.go#L76-L86
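The net effect of that ordering can be reproduced with plain chmod calls. This is only a simplified sketch of the two-step sequence described above (write the file with the requested mode, then widen it for fsGroup), not the actual volume code:

```python
import os
import stat
import tempfile

# Step 1: AtomicWriter writes the secret file with the requested mode (0400).
fd, path = tempfile.mkstemp()
os.close(fd)
os.chmod(path, 0o400)
print(oct(stat.S_IMODE(os.stat(path).st_mode)))  # 0o400, as requested

# Step 2: the fsGroup pass chmods read-only files to 0440 (read-write files
# to 0660), clobbering the mode that was just set.
os.chmod(path, 0o440)
print(oct(stat.S_IMODE(os.stat(path).st_mode)))  # 0o440, group read added

os.remove(path)
```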

Hmm, reading further, it looks like this is working as designed for fsGroup, though that design doesn’t play nicely with the mode options (the API docs for those mode fields explicitly say you might end up with different mode bits when combined with fsGroup).

Different containers can run as different UIDs yet still share a single secret volume. The only way for them all to be able to read the data from that volume when running as different UIDs is to have a common fsGroup set and group read permission applied to the data in that volume. That means expanding the file permissions to include group permissions, even when that was not requested or desired.

The only real way I see to fix this is to change the 1-volume:*-containers relationship and start making per-container (or per-UID) volumes for these types, setting the file owner to the resolved UID of the container (which is not always possible to resolve), but that would be a major redesign of, and departure from, the volume subsystem.