cilium: Cannot create CiliumIdentity for ServiceAccount names longer than 63 chars
Bug report
I found this bug while trying out EMR on EKS on a Cilium-enabled EKS 1.20 cluster.
When a service account’s name is longer than 63 characters, it is not possible to start a pod using this service account.
Indeed, when creating the CiliumIdentity, Cilium sets a label io.cilium.k8s.policy.serviceaccount whose value is the name of the pod's service account. However, label values are limited to a maximum of 63 characters, while service account names can be up to 253 characters long.
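Note that the 63-character limit on label values is enforced by the Kubernetes API server for any object, not just CiliumIdentity, so the rejection can be reproduced in isolation. For example (the namespace and label key here are arbitrary, chosen only for illustration):
$ kubectl create namespace label-test
$ kubectl label namespace label-test \
    sa-name=emr-containers-sa-spark-executor-123456789012-h94a5lkq1wmdnn0lu3ldn86aul757y413dgn7tj9zmkq4tujzz4mzp
# Rejected with the same validation error seen in the Cilium logs below:
# "must be no more than 63 characters"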
General Information
- Cilium version: v1.9.7
- Kernel version: 5.4.117-58.216.amzn2.x86_64
- Orchestration system version in use: EKS 1.20
How to reproduce the issue
- Create a service account with a name longer than 63 characters
$ kubectl create sa emr-containers-sa-spark-executor-123456789012-h94a5lkq1wmdnn0lu3ldn86aul757y413dgn7tj9zmkq4tujzz4mzp
- Create a pod using this serviceAccount
$ cat <<EOT >> pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: static-web
  labels:
    role: myrole
  namespace: emr
spec:
  containers:
    - name: web
      image: nginx
      ports:
        - name: web
          containerPort: 80
          protocol: TCP
  serviceAccountName: emr-containers-sa-spark-executor-123456789012-h94a5lkq1wmdnn0lu3ldn86aul757y413dgn7tj9zmkq4tujzz4mzp
EOT
$ kubectl apply -f pod.yaml
Then after a while the pod will fail to start:
$ kubectl describe pod static-web
Name:         static-web
Namespace:    emr
Priority:     0
Node:         ip-10-210-158-23.eu-west-1.compute.internal/10.210.158.23
Start Time:   Thu, 17 Jun 2021 14:03:05 +0200
Labels:       role=myrole
Annotations:  kubernetes.io/psp: eks.privileged
Status:       Pending
IP:
IPs:          <none>
Containers:
  web:
    Container ID:
    Image:          nginx
    Image ID:
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from emr-containers-sa-spark-executor-123456789012-h94a5lkq1wmdqrkqt (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  emr-containers-sa-spark-executor-123456789012-h94a5lkq1wmdqrkqt:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  emr-containers-sa-spark-executor-123456789012-h94a5lkq1wmdqrkqt
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                  From               Message
  ----     ------                  ----                 ----               -------
  Normal   Scheduled               5m29s                default-scheduler  Successfully assigned emr/static-web to ip-10-210-158-23.eu-west-1.compute.internal
  Warning  FailedCreatePodSandBox  3m58s                kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "e2b35de5c07d64130ebc14ee80e9c936350ad0f8973357ae1f8641bb3edf2270" network for pod "static-web": networkPlugin cni failed to set up pod "static-web_emr" network: Unable to create endpoint: Cilium API client timeout exceeded
  Warning  FailedCreatePodSandBox  2m27s                kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "8f7a714de44b928728a21066979bd0a5421d29f9215471b6980c67e817da5924" network for pod "static-web": networkPlugin cni failed to set up pod "static-web_emr" network: Unable to create endpoint: Cilium API client timeout exceeded
  Warning  FailedCreatePodSandBox  56s                  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "c60b6d3993dd53224453ec36877cec1ae3bfcd6334049cb4f76d3620419495da" network for pod "static-web": networkPlugin cni failed to set up pod "static-web_emr" network: Unable to create endpoint: Cilium API client timeout exceeded
  Normal   SandboxChanged          55s (x3 over 3m57s)  kubelet            Pod sandbox changed, it will be killed and re-created.
We then see this in the Cilium agent logs:
level=warning msg="Key allocation attempt failed" attempt=10 error="unable to allocate ID 2664 for key [k8s:io.cilium.k8s.policy.cluster=default k8s:io.cilium.k8s.policy.serviceaccount=emr-containers-sa-spark-executor-123456789012-h94a5lkq1wmdnn0lu3ldn86aul757y413dgn7tj9zmkq4tujzz4mzp k8s:io.kubernetes.pod.namespace=emr k8s:role=myrole]: CiliumIdentity.cilium.io \"2664\" is invalid: metadata.labels: Invalid value: \"emr-containers-sa-spark-executor-123456789012-h94a5lkq1wmdnn0lu3ldn86aul757y413dgn7tj9zmkq4tujzz4mzp\": must be no more than 63 characters" key="[k8s:io.cilium.k8s.policy.cluster=default k8s:io.cilium.k8s.policy.serviceaccount=emr-containers-sa-spark-executor-123456789012-h94a5lkq1wmdnn0lu3ldn86aul757y413dgn7tj9zmkq4tujzz4mzp k8s:io.kubernetes.pod.namespace=emr k8s:role=myrole]" subsys=allocator
We’ve also run into this issue (outside of EMR/EKS).
While one could say this is an unfortunate upstream Kubernetes restriction, I believe it is on Cilium to address: Cilium puts service account names (which can be up to 253 characters) into label values (which must not exceed 63 characters). Asking Kubernetes to raise the limit would probably be quite an undertaking, and asking every affected user to adjust does not scale well when you have thousands of Cilium users. Does Cilium actually select on these labels? I assume it does, though if not, putting service account names into annotations could be an alternative.
(Side note: this issue is fairly hard to diagnose from the pod events alone, which only surface a Cilium API client timeout. A welcome drive-by improvement would be to surface the underlying validation error more directly.)
When Cilium creates Identities with labels, the label content can be matched in policy. By including the k8s ServiceAccount as a label in the Identity, (a) Identities for applications with otherwise similar labels are differentiated in policy by the ServiceAccount, and (b) users can write policies that match on those labels to allow traffic based on the ServiceAccount, in addition to, or as an alternative to, other application labels.
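For concreteness, a policy of that kind might look like the sketch below. The ServiceAccount names are made up for illustration; the io.cilium.k8s.policy.serviceaccount selector is the same label discussed above:
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: allow-executor-to-driver
  namespace: emr
spec:
  # Select pods running under the (hypothetical) driver ServiceAccount...
  endpointSelector:
    matchLabels:
      io.cilium.k8s.policy.serviceaccount: spark-driver-sa
  ingress:
    - fromEndpoints:
        # ...and allow ingress only from pods running under the
        # (hypothetical) executor ServiceAccount.
        - matchLabels:
            io.cilium.k8s.policy.serviceaccount: spark-executor-sa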
If this is a sticking point for some users, and those users do not wish to write policies based on ServiceAccounts, then I think the Cilium community would be open to PRs proposing a way to disable ServiceAccount population in Identities, for example via a flag.
Another option to explore is whether the existing flags for limiting Identity-relevant labels already provide a way to remove this label. I'm not sure it works that way, but it could be worth investigating; a sketch of the experiment follows.
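A minimal sketch of that experiment, assuming the labels option in the cilium-config ConfigMap acts as an allow-list of identity-relevant label prefixes. Whether this filter applies to the serviceaccount label at all (rather than only to pod labels) is exactly the open question above, and the prefix list is illustrative:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:
  # Assumed allow-list that deliberately omits
  # io.cilium.k8s.policy.serviceaccount; it is unverified whether
  # Cilium honors this filter for the serviceaccount label.
  labels: "k8s:io.kubernetes.pod.namespace k8s:app k8s:name"
If it does work, pods whose only distinguishing label is the overlong ServiceAccount name would then share an Identity, so this trades policy granularity for avoiding the 63-character limit.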
@joestringer A couple of days ago I spoke with AWS engineers about this issue, and it seems they were not aware of it. I raised a support case with AWS; maybe they will consider making the SA names shorter for EMR.