kubernetes: envFrom not working in ephemeral containers: failed to sync secret cache

What happened?

I have the following deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mytarget
spec:
  selector:
    matchLabels:
      mylabel: mytarget
  replicas: 2
  template:
    metadata:
      labels:
        mylabel: mytarget
    spec:
      serviceAccountName: default
      containers:
        - name: mytarget
          image: <custom-image>
          command: ["sleep", "100d"]

I am working on an automated script (written in Go) that, among other things, does the following:

  • Creates a Secret resource:
s := apiv1.Secret{
	ObjectMeta: metav1.ObjectMeta{
		Name:      name,
		Namespace: metav1.NamespaceDefault,
	},
	StringData: map[string]string{
		"username": "foo",
		"password": "bar",
	},
}

_, err := clientset.CoreV1().Secrets(metav1.NamespaceDefault).Create(ctx, &s, metav1.CreateOptions{})
  • Lists pods and adds one ephemeral container to each of them, with the Secret injected as environment variables via envFrom:
pod.Spec.EphemeralContainers = append(pod.Spec.EphemeralContainers, apiv1.EphemeralContainer{
	TargetContainerName: targetContainerName,
	EphemeralContainerCommon: apiv1.EphemeralContainerCommon{
		Image:   imageURL,
		Name:    ephemeralContainerName,
		Command: getCommand(),
		EnvFrom: []apiv1.EnvFromSource{
			{
				SecretRef: &apiv1.SecretEnvSource{
					LocalObjectReference: apiv1.LocalObjectReference{
						Name: name,
					},
					Optional: nil,
				},
			},
		},
		SecurityContext: &apiv1.SecurityContext{
			Privileged: &[]bool{true}[0],
		},
	},
})

clientset.CoreV1().Pods(metav1.NamespaceDefault).UpdateEphemeralContainers(
	ctx, pod.Name, &pod, metav1.UpdateOptions{},
)

When I run it, the ephemeral container is created but never started; it is left in the Waiting state with reason CreateContainerConfigError.

Ephemeral Containers:
  ephemeral1234567:
    Container ID:
    Image:         <image>
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command: <command>
    State:          Waiting
      Reason:       CreateContainerConfigError
    Ready:          False
    Restart Count:  0
    Environment Variables from:
      secret-name  Secret  Optional: false
    Environment:     <none>
    Mounts:          <none>

The spec itself appears to be correct, since the Secret shows up under Environment Variables from in the describe output above.
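
For reference, the same state can be read back through the API instead of kubectl describe. A minimal sketch, reusing the clientset, ctx and pod variables from the snippets above and assuming fmt is imported:

updated, err := clientset.CoreV1().Pods(metav1.NamespaceDefault).Get(ctx, pod.Name, metav1.GetOptions{})
if err != nil {
	return err
}
// Ephemeral containers report their state in a dedicated status list.
for _, status := range updated.Status.EphemeralContainerStatuses {
	if status.State.Waiting != nil {
		// Prints e.g. "ephemeral1234567: Waiting (CreateContainerConfigError)".
		fmt.Printf("%s: Waiting (%s)\n", status.Name, status.State.Waiting.Reason)
	}
}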

Checking the events I can see the following:

Type     Reason  Age  From     Message
Warning  Failed  3s   kubelet  Error: failed to sync secret cache: timed out waiting for the condition

It seems the kubelet's secret cache is not synced when an ephemeral container is added to an already-running pod, so the container's creation fails because the Secret is not present in the cache.

Based on that assumption, I tried also referencing the Secret from the regular container defined in the Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mytarget
spec:
  selector:
    matchLabels:
      mylabel: mytarget
  replicas: 2
  template:
    metadata:
      labels:
        mylabel: mytarget
    spec:
      serviceAccountName: default
      containers:
        - name: mytarget
          image: <custom-image>
          command: ["sleep", "100d"]
          envFrom:
          - secretRef:
              name: secret-name
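
For anyone applying this workaround from the automation itself rather than by editing the manifest, the same change can be made with a strategic merge patch against the Deployment. A rough sketch, assuming the deployment and secret names from this example, the clientset and ctx from the snippets above, and the k8s.io/apimachinery/pkg/types import:

// Add the envFrom secretRef to the regular container of the target Deployment.
patch := []byte(`{"spec":{"template":{"spec":{"containers":[{"name":"mytarget","envFrom":[{"secretRef":{"name":"secret-name"}}]}]}}}}`)
_, err := clientset.AppsV1().Deployments(metav1.NamespaceDefault).Patch(
	ctx, "mytarget", types.StrategicMergePatchType, patch, metav1.PatchOptions{},
)

Note that patching the pod template triggers a rollout, so the ephemeral container then has to be added to the newly created pods.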

With that change in place, the same code that creates the ephemeral container now runs successfully, and the container starts with the Secret exposed as environment variables.

exp123456:
    Container ID:  containerd://2398e96b9add961fd292d127d9e383b98c0c88100f87e0b8c7a8ff46aa1becdc
    Image:         <image>
    Image ID:      <imageId>
    Port:          <none>
    Host Port:     <none>
    Command: <command>
    State:          Running
      Started:      Wed, 11 Jan 2023 13:28:57 +0000
    Ready:          False
    Restart Count:  0
    Environment Variables from:
      manual-secret  Secret  Optional: false
    Environment:     <none>
    Mounts:          <none>

What did you expect to happen?

I would expect the ephemeral container to be created and successfully started with the appropriate environment variable containing the data in the secret.

How can we reproduce it (as minimally and precisely as possible)?

Using the code provided in the explanation of the issue should be enough to reproduce the problem.

Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mytarget
spec:
  selector:
    matchLabels:
      mylabel: mytarget
  replicas: 2
  template:
    metadata:
      labels:
        mylabel: mytarget
    spec:
      serviceAccountName: default
      containers:
        - name: mytarget
          image: <custom-image>
          command: ["sleep", "100d"]

Secret:

apiVersion: v1
kind: Secret
metadata:
  name: manual-secret
type: Opaque
stringData:
  token: test1234

Request to update the Pod to create an ephemeral container:

PATCH http://localhost:8080/api/v1/namespaces/{namespace}/pods/{podname}/ephemeralcontainers

Add the following ephemeral container:

ephemeralContainers:
  - name: ephemeral-uid-123456
    image: nginx
    command:
    - sleep
    - "100d"
    envFrom:
    - secretRef:
        name: manual-secret
    resources: {}
    terminationMessagePath: "/dev/termination-log"
    terminationMessagePolicy: File
    imagePullPolicy: IfNotPresent
    securityContext:
      privileged: true
    targetContainerName: mytarget
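
The same request can also be issued from client-go by patching the pod's ephemeralcontainers subresource, which is equivalent to the UpdateEphemeralContainers call shown earlier. A minimal sketch, assuming the clientset, ctx and podName variables and the k8s.io/apimachinery/pkg/types import:

// Strategic merge patch against the "ephemeralcontainers" subresource of the Pod.
patch := []byte(`{"spec":{"ephemeralContainers":[{"name":"ephemeral-uid-123456","image":"nginx","command":["sleep","100d"],"envFrom":[{"secretRef":{"name":"manual-secret"}}],"securityContext":{"privileged":true},"targetContainerName":"mytarget"}]}}`)
_, err := clientset.CoreV1().Pods(metav1.NamespaceDefault).Patch(
	ctx, podName, types.StrategicMergePatchType, patch, metav1.PatchOptions{}, "ephemeralcontainers",
)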

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.0", GitCommit:"b46a3f887ca979b1a5d14fd39cb1af43e7e5d12d", GitTreeState:"clean", BuildDate:"2022-12-08T19:58:30Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"24+", GitVersion:"v1.24.7-eks-fb459a0", GitCommit:"c240013134c03a740781ffa1436ba2688b50b494", GitTreeState:"clean", BuildDate:"2022-10-24T20:36:26Z", GoVersion:"go1.18.7", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.26) and server (1.24) exceeds the supported minor version skew of +/-1

Cloud provider

OS version

# On Linux:
$ cat /etc/os-release
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.17.1
PRETTY_NAME="Alpine Linux v3.17"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://gitlab.alpinelinux.org/alpine/aports/-/issues"
$ uname -a
Linux pod-6bb864f684-tl5l7 5.4.219-126.411.amzn2.x86_64 #1 SMP Wed Nov 2 17:44:17 UTC 2022 x86_64 Linux

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, …) and versions (if applicable)

About this issue

  • Original URL
  • State: open
  • Created a year ago
  • Reactions: 3
  • Comments: 20 (14 by maintainers)

Most upvoted comments

Hello @enj! 👋🏼 Bug triage for the 1.29 release cycle is here! This issue hasn't been updated for a long time, so I wanted to check what its status is. Code freeze starts this week (01:00 UTC Wednesday 1 November 2023 / 18:00 PDT Tuesday 31 October 2023), and we want to make sure that every PR has a chance to be merged on time.

As the issue is tagged for 1.29, is it still planned for that release?

Just a wild guess, can this be related to this bug: https://github.com/kubernetes/kubernetes/issues/114167?