cilium: Endpoint contains wrong Pod IPv4 address
Is there an existing issue for this?
- I have searched the existing issues
What happened?
We observed dropped traffic between Pods for which the corresponding NetworkPolicy should actually allow traffic. Upon investigation we figured out that only one specific Pod (from a StatefulSet) was affected, and that the problem is caused by Cilium storing a wrong IPv4 address in the Pod’s CiliumEndpoint object.
The CiliumEndpoint’s status.networking.addressing.ipv4 field contained a wrong IPv4 address (10.8.142.14) that does not match the actual IPv4 address of the Pod (10.8.2.32).
We have not yet figured out whether we can reliably reproduce this; right now it looks more like it is caused by a (rare) race condition (maybe related to StatefulSet specifics).
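For anyone hitting a similar mismatch: since the CiliumEndpoint carries the same name as its Pod, affected Pods can be spotted by comparing the Pod IP reported by Kubernetes with the IPv4 stored in the matching CEP. A minimal sketch of such a check (not part of the original report, namespace hard-coded to the one from this issue):

# Compare each Pod's IP with the IPv4 stored in the CiliumEndpoint of the same name.
NS=grafana-agent-system
for pod in $(kubectl -n "$NS" get pods -o jsonpath='{.items[*].metadata.name}'); do
  pod_ip=$(kubectl -n "$NS" get pod "$pod" -o jsonpath='{.status.podIP}')
  cep_ip=$(kubectl -n "$NS" get cep "$pod" -o jsonpath='{.status.networking.addressing[0].ipv4}' 2>/dev/null)
  [ "$pod_ip" != "$cep_ip" ] && echo "MISMATCH: $pod pod=$pod_ip cep=$cep_ip"
done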
Endpoint
apiVersion: cilium.io/v2
kind: CiliumEndpoint
metadata:
  creationTimestamp: '2022-05-23T11:39:49Z'
  generation: 2
  labels:
    app.kubernetes.io/instance: grafana-agent-metrics
    app.kubernetes.io/managed-by: grafana-agent-operator
    app.kubernetes.io/name: grafana-agent
    app.kubernetes.io/version: v0.23.0
    controller-revision-hash: grafana-agent-metrics-shard-1-788dbb9c87
    grafana-agent: grafana-agent-metrics
    nx-k8s-topology-id: 8a8f0140-b805-44db-9dfe-625c6b5df899
    operator.agent.grafana.com/name: grafana-agent-metrics
    operator.agent.grafana.com/shard: '1'
    operator.agent.grafana.com/type: metrics
    statefulset.kubernetes.io/pod-name: grafana-agent-metrics-shard-1-0
  name: grafana-agent-metrics-shard-1-0
  namespace: grafana-agent-system
  ownerReferences:
    - apiVersion: v1
      blockOwnerDeletion: true
      kind: Pod
      name: grafana-agent-metrics-shard-1-0
      uid: 296f1523-ceac-4906-b4e1-0b8f4c4d5c01
  resourceVersion: '456543044'
  uid: 1c88af6b-67c0-4043-85f7-b821165f00ab
  selfLink: >-
    /apis/cilium.io/v2/namespaces/grafana-agent-system/ciliumendpoints/grafana-agent-metrics-shard-1-0
status:
  encryption: {}
  external-identifiers:
    container-id: 1fd8732c92dbf8283a2ecd25908604c4db0420965ee7e4f0eaba1870002e017a
    k8s-namespace: grafana-agent-system
    k8s-pod-name: grafana-agent-metrics-shard-1-0
    pod-name: grafana-agent-system/grafana-agent-metrics-shard-1-0
  id: 873
  identity:
    id: 54072
    labels:
      - k8s:app.kubernetes.io/instance=grafana-agent-metrics
      - k8s:app.kubernetes.io/managed-by=grafana-agent-operator
      - k8s:app.kubernetes.io/name=grafana-agent
      - k8s:app.kubernetes.io/version=v0.23.0
      - >-
        k8s:io.cilium.k8s.namespace.labels.grafana-agent-system.tree.hnc.x-k8s.io/depth=0
      - >-
        k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=grafana-agent-system
      - >-
        k8s:io.cilium.k8s.namespace.labels.kustomize.toolkit.fluxcd.io/name=grafana-agent
      - >-
        k8s:io.cilium.k8s.namespace.labels.kustomize.toolkit.fluxcd.io/namespace=kube-system
      - k8s:io.cilium.k8s.namespace.labels.scheduling.nexxiot.com/fargate=false
      - k8s:io.cilium.k8s.policy.cluster=default
      - k8s:io.cilium.k8s.policy.serviceaccount=grafana-agent
      - k8s:io.kubernetes.pod.namespace=grafana-agent-system
  named-ports:
    - name: http-metrics
      port: 8080
      protocol: TCP
  networking:
    addressing:
      - ipv4: 10.8.142.14
    node: 10.8.3.174
  state: ready
Pod
apiVersion: v1
kind: Pod
metadata:
  name: grafana-agent-metrics-shard-1-0
  generateName: grafana-agent-metrics-shard-1-
  namespace: grafana-agent-system
  uid: 296f1523-ceac-4906-b4e1-0b8f4c4d5c01
  resourceVersion: '456543083'
  creationTimestamp: '2022-05-23T11:39:48Z'
  labels:
    app.kubernetes.io/instance: grafana-agent-metrics
    app.kubernetes.io/managed-by: grafana-agent-operator
    app.kubernetes.io/name: grafana-agent
    app.kubernetes.io/version: v0.23.0
    controller-revision-hash: grafana-agent-metrics-shard-1-788dbb9c87
    grafana-agent: grafana-agent-metrics
    nx-k8s-topology-id: 8a8f0140-b805-44db-9dfe-625c6b5df899
    operator.agent.grafana.com/name: grafana-agent-metrics
    operator.agent.grafana.com/shard: '1'
    operator.agent.grafana.com/type: metrics
    statefulset.kubernetes.io/pod-name: grafana-agent-metrics-shard-1-0
  annotations:
    kubectl.kubernetes.io/default-container: grafana-agent
    kubernetes.io/psp: k8s.privileged-host
  ownerReferences:
    - apiVersion: apps/v1
      kind: StatefulSet
      name: grafana-agent-metrics-shard-1
      uid: 116802d1-f8a2-43d1-863e-6de8e5ca8150
      controller: true
      blockOwnerDeletion: true
  selfLink: /api/v1/namespaces/grafana-agent-system/pods/grafana-agent-metrics-shard-1-0
status:
  phase: Running
  conditions:
    - type: Initialized
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2022-05-23T11:39:49Z'
    - type: Ready
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2022-05-23T11:39:54Z'
    - type: ContainersReady
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2022-05-23T11:39:54Z'
    - type: PodScheduled
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2022-05-23T11:39:48Z'
  hostIP: 10.8.4.230
  podIP: 10.8.2.32
  podIPs:
    - ip: 10.8.2.32
  startTime: '2022-05-23T11:39:49Z'
  containerStatuses:
    - name: config-reloader
      state:
        running:
          startedAt: '2022-05-23T11:39:52Z'
      lastState: {}
      ready: true
      restartCount: 0
      image: quay.io/prometheus-operator/prometheus-config-reloader:v0.47.0
      imageID: >-
        quay.io/prometheus-operator/prometheus-config-reloader@sha256:0029252e7cf8cf38fc58795928d4e1c746b9e609432a2ee5417a9cab4633b864
      containerID: >-
        containerd://ade201bd0787e0e4ae9aeff887a091fad689eeec70406c65918e63908eb6a328
      started: true
    - name: grafana-agent
      state:
        running:
          startedAt: '2022-05-23T11:39:52Z'
      lastState: {}
      ready: true
      restartCount: 0
      image: docker.io/grafana/agent:v0.23.0
      imageID: >-
        docker.io/grafana/agent@sha256:a0beeaa6642c69efa472d509be2e2cf97dcb1c7e74047cca59a9452bf068f763
      containerID: >-
        containerd://8535bdd6e890f8028dc90eb68dab2ab317ee8e615b90322b81d6ce8e125e1438
      started: true
  qosClass: Burstable
spec:
  ...
Cilium Version
1.11.3
Kernel Version
5.4.188-104.359.amzn2.x86_64
Kubernetes Version
1.21 (v1.21.5-eks-9017834)
Sysdump
Note: this is a ZIP-compressed BZIP2 tar archive (I had to work around GitHub's upload restrictions: BZIP2 to reduce the size and ZIP to make GitHub accept the file format).
cilium-sysdump-20220524-105632.tar.bz2.zip
Relevant log output
> hubble observe -f --verdict DROPPED
May 24 08:49:02.049: 10.8.2.32:45754 <> kube-system/kube-state-metrics-2:8081 Policy denied DROPPED (TCP Flags: SYN)
May 24 08:49:02.049: 10.8.2.32:45754 <> kube-system/kube-state-metrics-2:8081 Policy denied DROPPED (TCP Flags: SYN)
May 24 08:49:03.077: 10.8.2.32:45754 <> kube-system/kube-state-metrics-2:8081 Policy denied DROPPED (TCP Flags: SYN)
May 24 08:49:03.077: 10.8.2.32:45754 <> kube-system/kube-state-metrics-2:8081 Policy denied DROPPED (TCP Flags: SYN)
May 24 08:49:05.093: 10.8.2.32:45754 <> kube-system/kube-state-metrics-2:8081 Policy denied DROPPED (TCP Flags: SYN)
May 24 08:49:05.093: 10.8.2.32:45754 <> kube-system/kube-state-metrics-2:8081 Policy denied DROPPED (TCP Flags: SYN)
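The drops are consistent with Cilium attributing the source IP 10.8.2.32 to a different (or no) identity, since the CEP of the grafana-agent Pod claims 10.8.142.14. As a rough way to double-check this from a Cilium agent Pod (placeholder pod name, exact output format varies by version):

# Which identity does the datapath currently hold for the affected Pod IP?
kubectl -n kube-system exec <cilium-agent-pod> -- cilium bpf ipcache get 10.8.2.32
# Which endpoint does the agent on the Pod's node manage for that IP?
kubectl -n kube-system exec <cilium-agent-pod-on-10.8.4.230> -- cilium endpoint list | grep 10.8.2.32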
Anything else?
NetworkPolicy
Just for completeness, the corresponding NetworkPolicy object:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    kustomize.toolkit.fluxcd.io/name: kube-state-metrics
    kustomize.toolkit.fluxcd.io/namespace: kube-system
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  ingress:
    - ports:
        - protocol: TCP
          port: 8080
        - protocol: TCP
          port: 8081
      from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: grafana-agent
          namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: grafana-agent-system
    - ports:
        - protocol: TCP
          port: 8080
        - protocol: TCP
          port: 8081
      from:
        - podSelector: {}
  egress:
    - ports:
        - protocol: TCP
          port: 443
        - protocol: TCP
          port: 6443
  policyTypes:
    - Ingress
    - Egress
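Since Cilium enforces this policy by identity rather than by IP, the policy itself appears fine; traffic from a throwaway Pod carrying the same app.kubernetes.io/name=grafana-agent label in the same namespace should be allowed, because that Pod gets a fresh, consistent CEP. A hedged check (the kube-state-metrics Service name and port path are assumptions, only the Pod names appear in the hubble output above):

kubectl -n grafana-agent-system run policy-test --rm -it --restart=Never \
  --labels=app.kubernetes.io/name=grafana-agent \
  --image=curlimages/curl --command -- \
  curl -sS --max-time 3 http://kube-state-metrics.kube-system.svc:8081/metrics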
Code of Conduct
- I agree to follow this project’s Code of Conduct
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 19 (9 by maintainers)
Commits related to this issue
- k8s/watchers: add uid to patch request document. This is intended to prevent endpoints from overwriting ciliumendpoints that have the same name but are being managed by a new endpoint sync. This can... — committed to tommyp1ckles/cilium by tommyp1ckles 2 years ago
- pkg/watchers: prevent endpoints overwriting existing ciliumendpoints. Prevents endpointsynchronizer from taking ownership and managing ciliumendpoints, except in the case of endpoint restore where th... — committed to tommyp1ckles/cilium by tommyp1ckles 2 years ago
- k8s/watchers: add uid to patch request document. This is intended to prevent endpoints from overwriting ciliumendpoints that have the same name but are being managed by a new endpoint sync. This can... — committed to cilium/cilium by tommyp1ckles 2 years ago
- pkg/watchers: prevent endpoints overwriting existing ciliumendpoints. Prevents endpointsynchronizer from taking ownership and managing ciliumendpoints, except in the case of endpoint restore where th... — committed to cilium/cilium by tommyp1ckles 2 years ago
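For context on the "add uid to patch request document" commit above: including metadata.uid in a patch acts as a precondition on the Kubernetes API server, so a stale endpoint synchronizer that still holds the UID of an old CEP gets a conflict instead of silently overwriting the CEP that was created for the new Pod. A rough illustration with kubectl (not the actual Cilium code; the UID is a placeholder):

# A merge patch carrying a metadata.uid that no longer matches the live object is
# rejected by the API server with a 409 Conflict ("Precondition failed: UID ...")
# instead of being applied:
kubectl -n grafana-agent-system patch ciliumendpoint grafana-agent-metrics-shard-1-0 \
  --type merge \
  -p '{"metadata":{"uid":"<uid-of-the-old-cep>","labels":{"example":"value"}}}'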
And another one, now even with timestamps:
Let me know if you need more details. Generally about the cluster: it is made up of 128-thread workers that run a very mixed workload, from quickly spawning CI jobs to hundreds of tiny apps and long-running big applications, and there is constant load on it, so it is conceivable that Cilium sees many events per second.
I have been working on making a proof of concept to reproduce this: https://github.com/timbuchwaldt/cilium-19931-poc
I have seen it generate transient differences where CEPs and Pods went out of sync, and at least two cases where the mismatch became stable and stayed like that.
You have to apply the sts.yaml to create a 2-pod StatefulSet running nginx, alter scheduler.sh to select only schedulable nodes (our workers have this brawn label) and run it, then start run.sh to begin restarting the pods. Upon success the run.sh script aborts, showing the diff between Pod IPs and CEP IPs. Further validation is needed to see whether the mismatch stays stable; I recommend running both kubectl get pods -o wide -w as well as kubectl get cep -o wide -w to watch the changes occurring. We are still working on making this reproduce the problem more cleanly, as it is currently unclear whether it happens when the pods are scheduled to the same node or different nodes, whether labels have any effect, and so on. I'll keep this updated in case we get any closer to the actual problem.
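The gist of the comparison run.sh performs can be sketched roughly as follows (a minimal bash sketch, not the actual script from the PoC repo; the namespace, the app=nginx label, and the 30 s settle time are assumptions):

#!/usr/bin/env bash
# Restart the PoC StatefulSet pods in a loop and stop once Pod IPs and CEP IPs diverge.
NS=default   # namespace of the PoC StatefulSet (assumption)
while true; do
  kubectl -n "$NS" delete pod -l app=nginx --wait=false
  sleep 30   # give the pods time to come back up and Cilium time to sync the CEPs
  pods=$(kubectl -n "$NS" get pods -l app=nginx \
    -o jsonpath='{range .items[*]}{.metadata.name}={.status.podIP}{"\n"}{end}' | sort)
  ceps=$(kubectl -n "$NS" get cep \
    -o jsonpath='{range .items[*]}{.metadata.name}={.status.networking.addressing[0].ipv4}{"\n"}{end}' | sort)
  if [ "$pods" != "$ceps" ]; then
    echo "Pod IPs and CEP IPs diverged:"
    diff <(echo "$pods") <(echo "$ceps")
    break
  fi
done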
Update:
Caught another stable reproduction:
In this case both pods were selected=false, the k8s Endpoints objects were correct, and the CEPs were and stayed out of sync with the actual values. Our cluster is (mostly) on 1.23.7, running Cilium 1.12.1.
@aanm No (as mentioned above), we cannot reliably reproduce this (yet). Nevertheless, this should not happen, and maybe the reconciliation loop should be fixed to regularly check whether the CiliumEndpoint still matches the Pod.