ingress-nginx: Controller uses wrong labels during shutdown to determine whether multiple pods are running
During a helm upgrade of ingress-nginx with multiple replicas, the controller sometimes removes the loadbalancerIP status from all Ingress resources. After a few seconds the loadbalancerIPs are added to the Ingresses again. This happens because the leader controller thinks it is the last pod left and therefore cleans up the Ingress status. It only happens when the last pod of the old version is the leader. On shutdown the controller checks whether any pods with the same labels are left, but helm adds the helm.sh/chart=ingress-nginx-4.8.0 and app.kubernetes.io/version=1.9.0 labels, and those change when the chart is upgraded. Because the old leader finds no other controller pods with its exact label set, it assumes it is the last one and cleans up the Ingress statuses.
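To illustrate the mechanism (a hypothetical sketch, not the controller's actual code): a selector built from the old pod's complete label set can never match pods from the new chart revision, because the chart and version labels differ.

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/labels"
)

func main() {
	// Labels on a pod from the old chart revision (values from this report).
	oldPod := labels.Set{
		"app.kubernetes.io/name":    "ingress-nginx",
		"app.kubernetes.io/version": "1.9.0",               // changes on upgrade
		"helm.sh/chart":             "ingress-nginx-4.8.0", // changes on upgrade
	}
	// Labels on a pod from the new chart revision.
	newPod := labels.Set{
		"app.kubernetes.io/name":    "ingress-nginx",
		"app.kubernetes.io/version": "1.9.1",
		"helm.sh/chart":             "ingress-nginx-4.8.1",
	}
	// Selecting with the old pod's full label set excludes the new pods,
	// so the shutting-down leader believes it is the last controller pod.
	sel := labels.SelectorFromSet(oldPod)
	fmt.Println(sel.Matches(newPod)) // false
}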
What happened:
- We have 2 pods of chart version 4.8.0 and all Ingresses have a status with the loadbalancerIP.
- We run the helm upgrade command to upgrade the pods to chart version 4.8.1.
- This creates a new ReplicaSet and performs a rolling update of the pods.
- A new pod (chart version 4.8.1) is created and becomes healthy.
- One of the old pods (chart version 4.8.0) is removed by Kubernetes; the leader is still an old pod.
- The second new pod (chart version 4.8.1) is created and becomes healthy.
- The last old pod (chart version 4.8.0) is removed by Kubernetes. Because there are no pods with the same labels anymore, it removes the Ingress statuses (this can be seen by running kubectl get ingress: there is no address on the Ingress).
- One of the new pods (chart version 4.8.1) is elected as leader and updates the Ingress statuses with the loadbalancerIP.
Between the last two steps there is no loadbalancerIP on the Ingresses. Tools that use this information, such as external-dns, therefore remove the DNS entry, and the Ingress domain is not resolvable for a while.
This results in the following (no address on the Ingress):
kubectl get ingress
NAME CLASS HOSTS ADDRESS PORTS AGE
test-ingress nginx example.com 80, 443 90d
What you expected to happen:
We expect the leader controller to see the pods from the new version and know that it is not the last pod, which means it should not clean up the Ingress statuses.
- We have 2 pods of chart version 4.8.0 and all Ingresses have a status with the loadbalancerIP.
- We run the helm upgrade command to upgrade the pods to chart version 4.8.1.
- This creates a new ReplicaSet and performs a rolling update of the pods.
- A new pod (chart version 4.8.1) is created and becomes healthy.
- One of the old pods (chart version 4.8.0) is removed by Kubernetes; the leader is still an old pod.
- The second new pod (chart version 4.8.1) is created and becomes healthy.
- The last old pod (chart version 4.8.0) is removed by Kubernetes. It sees the new pods and knows it is not the last one.
- One of the new pods (chart version 4.8.1) is elected as leader and updates the Ingress statuses with the loadbalancerIP.
Throughout the entire upgrade we would expect:
NAME CLASS HOSTS ADDRESS PORTS AGE
test-ingress nginx * zz.zz.zz.zz 80, 443 90d
NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.): v1.9.1
Kubernetes version (use kubectl version): v1.27.3
Environment:
- Cloud provider or hardware configuration: AKS
- OS (e.g. from /etc/os-release): AKSCBLMariner-V2gen2-202309.06.0
- Kernel (e.g. uname -a): 5.15.126.1-1.cm2
- Install tools: AKS
- Basic cluster related info:
kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4", GitCommit:"fa3d7990104d7c1f16943a67f11b154b71f6a132", GitTreeState:"clean", BuildDate:"2023-07-19T12:20:54Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"8b6cfe2c7c54ae110e0c2dbcc52b468bc08bf5f6", GitTreeState:"clean", BuildDate:"2023-07-28T22:18:46Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"linux/amd64"}
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
aks-d2adsweu1-38256127-vmss000000 Ready agent 7d22h v1.27.3 10.110.17.100 <none> CBL-Mariner/Linux 5.15.126.1-1.cm2 containerd://1.6.22
aks-d2adsweu1-38256127-vmss000001 Ready agent 7d22h v1.27.3 10.110.18.39 <none> CBL-Mariner/Linux 5.15.126.1-1.cm2 containerd://1.6.22
aks-d2adsweu1-38256127-vmss000002 Ready agent 5d20h v1.27.3 10.110.16.10 <none> CBL-Mariner/Linux 5.15.126.1-1.cm2 containerd://1.6.22
aks-systemweu1-22699295-vmss000000 Ready agent 8d v1.27.3 10.110.16.114 <none> CBL-Mariner/Linux 5.15.126.1-1.cm2 containerd://1.6.22
aks-systemweu1-22699295-vmss000001 Ready agent 8d v1.27.3 10.110.16.223 <none> CBL-Mariner/Linux 5.15.126.1-1.cm2 containerd://1.6.22
How was the ingress-nginx-controller installed:
We use ArgoCD with the ingress-nginx helm chart version 4.8.1 (or 4.8.0 before upgrade).
The following values are used:
controller:
  resources:
    requests:
      memory: 150Mi
    limits:
      memory: 300Mi
  replicaCount: 2
  allowSnippetAnnotations: true
  service:
    externalTrafficPolicy: 'Local'
    external:
      enabled: true
Current State of the controller:
kubectl describe ingressclasses
Name: nginx
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=zzz-zzz-ingress
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=ingress-nginx
app.kubernetes.io/part-of=ingress-nginx
app.kubernetes.io/version=1.9.1
helm.sh/chart=ingress-nginx-4.8.1
Annotations: argocd.argoproj.io/tracking-id: zzz-zzz-ingress:networking.k8s.io/IngressClass:ingresscontroller/nginx
ingressclass.kubernetes.io/is-default-class: true
Controller: k8s.io/ingress-nginx
Events: <none>
kubectl -n ingresscontroller get all
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/zzz-zzz-ingress-ingress-nginx-controller-7849497db8gmxmz 1/1 Running 0 5d20h 10.110.16.227 aks-systemweu1-22699295-vmss000001 <none> <none>
pod/zzz-zzz-ingress-ingress-nginx-controller-7849497db8pjwsf 1/1 Running 0 5d20h 10.110.16.188 aks-systemweu1-22699295-vmss000000 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/zzz-zzz-ingress-ingress-nginx-controller LoadBalancer 10.0.18.29 zz.zz.zz.zz 80:31057/TCP,443:31848/TCP 386d app.kubernetes.io/component=controller,app.kubernetes.io/instance=zzz-zzz-ingress,app.kubernetes.io/name=ingress-nginx
service/zzz-zzz-ingress-ingress-nginx-controller-admission ClusterIP 10.0.49.26 <none> 443/TCP 388d app.kubernetes.io/component=controller,app.kubernetes.io/instance=zzz-zzz-ingress,app.kubernetes.io/name=ingress-nginx
service/zzz-zzz-ingress-ingress-nginx-controller-metrics ClusterIP 10.0.155.249 <none> 10254/TCP 388d app.kubernetes.io/component=controller,app.kubernetes.io/instance=zzz-zzz-ingress,app.kubernetes.io/name=ingress-nginx
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES
SELECTOR
deployment.apps/zzz-zzz-ingress-ingress-nginx-controller 2/2 2 2 388d controller registry/registry.k8s.io/ingress-nginx/controller:v1.9.1 app.kubernetes.io/component=controller,app.kubernetes.io/instance=zzz-zzz-ingress,app.kubernetes.io/name=ingress-nginx
NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES
SELECTOR
replicaset.apps/zzz-zzz-ingress-ingress-nginx-controller-7849497db8 2 2 2 5d20h controller registry/registry.k8s.io/ingress-nginx/controller:v1.9.1 app.kubernetes.io/component=controller,app.kubernetes.io/instance=zzz-zzz-ingress,app.kubernetes.io/name=ingress-nginx,pod-template-hash=7849497db8
kubectl -n ingresscontroller describe po zzz-zzz-ingress-ingress-nginx-controller-7849497db8
Name: zzz-zzz-ingress-ingress-nginx-controller-7849497db8gmxmz
Namespace: ingresscontroller
Priority: 2000001000
Priority Class Name: system-node-critical
Service Account: zzz-zzz-ingress-ingress-nginx
Node: aks-systemweu1-22699295-vmss000001/10.110.16.223
Start Time: Wed, 04 Oct 2023 16:48:00 +0200
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=zzz-zzz-ingress
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=ingress-nginx
app.kubernetes.io/part-of=ingress-nginx
app.kubernetes.io/version=1.9.1
helm.sh/chart=ingress-nginx-4.8.1
pod-template-hash=7849497db8
Annotations: kubectl.kubernetes.io/restartedAt: 2023-07-11T13:59:40Z
zzz/logging-module: nginx
Status: Running
SeccompProfile: RuntimeDefault
IP: 10.110.16.227
IPs:
IP: 10.110.16.227
Controlled By: ReplicaSet/zzz-zzz-ingress-ingress-nginx-controller-7849497db8
Containers:
controller:
Container ID: containerd://be51dd263b59a482b9ce53843d956b1d9ebb2d4a88678fa845cd426f19132c3c
Image: registry/registry.k8s.io/ingress-nginx/controller:v1.9.1
Image ID: registry/registry.k8s.io/ingress-nginx/controller@sha256:65c804ad254ac378d316919687b782850dd36c1f677f1115a1db29da59376f18
Ports: 80/TCP, 443/TCP, 10254/TCP, 8443/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP
Args:
/nginx-ingress-controller
--publish-service=$(POD_NAMESPACE)/zzz-zzz-ingress-ingress-nginx-controller
--election-id=zzz-zzz-ingress-ingress-nginx-leader
--controller-class=k8s.io/ingress-nginx
--ingress-class=nginx
--configmap=$(POD_NAMESPACE)/zzz-zzz-ingress-ingress-nginx-controller
--validating-webhook=:8443
--validating-webhook-certificate=/usr/local/certificates/cert
--validating-webhook-key=/usr/local/certificates/key
--ingress-class-by-name=true
--default-ssl-certificate=certmanager/default-public-wildcard-tls-secret
--enable-ssl-passthrough=false
State: Running
Started: Wed, 04 Oct 2023 16:48:01 +0200
Ready: True
Restart Count: 0
Limits:
memory: 350Mi
Requests:
cpu: 100m
memory: 175Mi
Liveness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
Readiness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
Environment:
POD_NAME: zzz-zzz-ingress-ingress-nginx-controller-7849497db8gmxmz (v1:metadata.name)
POD_NAMESPACE: ingresscontroller (v1:metadata.namespace)
LD_PRELOAD: /usr/local/lib/libmimalloc.so
Mounts:
/usr/local/certificates/ from webhook-cert (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-stjv4 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
webhook-cert:
Type: Secret (a volume populated by a Secret)
SecretName: zzz-zzz-ingress-ingress-nginx-admission
Optional: false
kube-api-access-stjv4:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: zzz/workload=zzz
kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints: kubernetes.io/hostname:ScheduleAnyway when max skew 1 is exceeded for selector app.kubernetes.io/component=controller,app.kubernetes.io/instance=zzz-zzz-ingress,app.kubernetes.io/name=ingress-nginx
topology.kubernetes.io/zone:DoNotSchedule when max skew 1 is exceeded for selector app.kubernetes.io/component=controller,app.kubernetes.io/instance=zzz-zzz-ingress,app.kubernetes.io/name=ingress-nginx
Events: <none>
Name: zzz-zzz-ingress-ingress-nginx-controller-7849497db8pjwsf
Namespace: ingresscontroller
Priority: 2000001000
Priority Class Name: system-node-critical
Service Account: zzz-zzz-ingress-ingress-nginx
Node: aks-systemweu1-22699295-vmss000000/10.110.16.114
Start Time: Wed, 04 Oct 2023 16:48:11 +0200
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=zzz-zzz-ingress
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=ingress-nginx
app.kubernetes.io/part-of=ingress-nginx
app.kubernetes.io/version=1.9.1
helm.sh/chart=ingress-nginx-4.8.1
pod-template-hash=7849497db8
Annotations: kubectl.kubernetes.io/restartedAt: 2023-07-11T13:59:40Z
zzz/logging-module: nginx
Status: Running
SeccompProfile: RuntimeDefault
IP: 10.110.16.188
IPs:
IP: 10.110.16.188
Controlled By: ReplicaSet/zzz-zzz-ingress-ingress-nginx-controller-7849497db8
Containers:
controller:
Container ID: containerd://f931f28240cabc99c4cd417125d6330af42d0d49d96188ccf5048181648bf404
Image: registry/registry.k8s.io/ingress-nginx/controller:v1.9.1
Image ID: registry/registry.k8s.io/ingress-nginx/controller@sha256:65c804ad254ac378d316919687b782850dd36c1f677f1115a1db29da59376f18
Ports: 80/TCP, 443/TCP, 10254/TCP, 8443/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP
Args:
/nginx-ingress-controller
--publish-service=$(POD_NAMESPACE)/zzz-zzz-ingress-ingress-nginx-controller
--election-id=zzz-zzz-ingress-ingress-nginx-leader
--controller-class=k8s.io/ingress-nginx
--ingress-class=nginx
--configmap=$(POD_NAMESPACE)/zzz-zzz-ingress-ingress-nginx-controller
--validating-webhook=:8443
--validating-webhook-certificate=/usr/local/certificates/cert
--validating-webhook-key=/usr/local/certificates/key
--ingress-class-by-name=true
--default-ssl-certificate=certmanager/default-public-wildcard-tls-secret
--enable-ssl-passthrough=false
State: Running
Started: Wed, 04 Oct 2023 16:48:12 +0200
Ready: True
Restart Count: 0
Limits:
memory: 350Mi
Requests:
cpu: 100m
memory: 175Mi
Liveness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
Readiness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
Environment:
POD_NAME: zzz-zzz-ingress-ingress-nginx-controller-7849497db8pjwsf (v1:metadata.name)
POD_NAMESPACE: ingresscontroller (v1:metadata.namespace)
LD_PRELOAD: /usr/local/lib/libmimalloc.so
Mounts:
/usr/local/certificates/ from webhook-cert (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qlpls (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
webhook-cert:
Type: Secret (a volume populated by a Secret)
SecretName: zzz-zzz-ingress-ingress-nginx-admission
Optional: false
kube-api-access-qlpls:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: zzz/workload=zzz
kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints: kubernetes.io/hostname:ScheduleAnyway when max skew 1 is exceeded for selector app.kubernetes.io/component=controller,app.kubernetes.io/instance=zzz-zzz-ingress,app.kubernetes.io/name=ingress-nginx
topology.kubernetes.io/zone:DoNotSchedule when max skew 1 is exceeded for selector app.kubernetes.io/component=controller,app.kubernetes.io/instance=zzz-zzz-ingress,app.kubernetes.io/name=ingress-nginx
Events: <none>
kubectl -n ingresscontroller describe svc zzz-zzz-ingress-ingress-nginx-controller
Name: zzz-zzz-ingress-ingress-nginx-controller
Namespace: ingresscontroller
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=zzz-zzz-ingress
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=ingress-nginx
app.kubernetes.io/part-of=ingress-nginx
app.kubernetes.io/version=1.9.1
helm.sh/chart=ingress-nginx-4.8.1
Annotations: argocd.argoproj.io/tracking-id: zzz-zzz-ingress:/Service:ingresscontroller/zzz-zzz-ingress-ingress-nginx-controller
service.beta.kubernetes.io/azure-load-balancer-ipv4: zz.zz.zz.zz
service.beta.kubernetes.io/azure-load-balancer-resource-group: zzz
Selector: app.kubernetes.io/component=controller,app.kubernetes.io/instance=zzz-zzz-ingress,app.kubernetes.io/name=ingress-nginx
Type: LoadBalancer
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.0.18.29
IPs: 10.0.18.29
LoadBalancer Ingress: zz.zz.zz.zz
Port: http 80/TCP
TargetPort: http/TCP
NodePort: http 31057/TCP
Endpoints: 10.110.16.188:80,10.110.16.227:80
Port: https 443/TCP
TargetPort: https/TCP
NodePort: https 31848/TCP
Endpoints: 10.110.16.188:443,10.110.16.227:443
Session Affinity: None
External Traffic Policy: Local
HealthCheck NodePort: 31796
Events: <none>
How to reproduce this issue:
Install the ingress-nginx helm chart:
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx --set controller.replicaCount=2 --version 4.8.0
Create an Ingress and wait until it has a loadbalancerIP:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: test-ingress
spec:
  defaultBackend:
    service:
      name: test
      port:
        number: 80
Upgrade the helm chart:
helm upgrade ingress-nginx ingress-nginx/ingress-nginx --set controller.replicaCount=2 --version 4.8.1
Check whether the Ingress resource still has a loadbalancerIP during the upgrade:
k get ingress
We would expect:
NAME CLASS HOSTS ADDRESS PORTS AGE
test-ingress nginx * zz.zz.zz.zz 80, 443 90d
We get:
NAME CLASS HOSTS ADDRESS PORTS AGE
test-ingress nginx * 80, 443 90d
Since the issue does not trigger on every upgrade, these steps can be repeated in reverse order; it can also be reproduced by downgrading the version.
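To catch the short window in which the address disappears, a small client-go poller like the sketch below can log status transitions during the upgrade. This is a hypothetical helper, not part of the chart; the default namespace and the test-ingress name are assumptions taken from the steps above.

package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the local kubeconfig (~/.kube/config); assumes running from a workstation.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	for {
		ing, err := client.NetworkingV1().Ingresses("default").
			Get(context.TODO(), "test-ingress", metav1.GetOptions{})
		if err != nil {
			panic(err)
		}
		// status.loadBalancer.ingress is the field the controller clears.
		if len(ing.Status.LoadBalancer.Ingress) == 0 {
			fmt.Println(time.Now().Format(time.RFC3339), "ADDRESS MISSING")
		} else {
			fmt.Println(time.Now().Format(time.RFC3339), "address:", ing.Status.LoadBalancer.Ingress[0].IP)
		}
		time.Sleep(time.Second)
	}
}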
Anything else we need to know:
The logs of the old leader pod show the following line:
I1004 12:29:58.913037 7 status.go:135] "removing value from ingress status" address=[{"ip":"xx.xx.xx.xx"}]
During shutdown this check should evaluate to true, but it does not: https://github.com/kubernetes/ingress-nginx/blob/8ce61bdc6761f04d0ce617b9125255e9a147a20c/internal/ingress/status/status.go#L130
In that function the controller reads the labels on its own pod and lists all pods with these exact labels:
app.kubernetes.io/component: controller
app.kubernetes.io/instance: xxx-ingress
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/part-of: ingress-nginx
app.kubernetes.io/version: 1.9.0
helm.sh/chart: ingress-nginx-4.8.0
But when the version is changed, the new pods have different labels, so the check fails. It should only match on labels that do not change across version updates. The selector labels should probably work here, since they are also used by the controller Service to route traffic to the controller pods. The labels on the pods in the helm chart were changed in https://github.com/kubernetes/ingress-nginx/pull/9732. Since the issue does not trigger on every upgrade and only affects tooling that consumes the loadbalancerIP from the Ingress resources, not everyone will notice it.
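A minimal sketch of that direction, assuming the check keeps its current shape (this is not the actual ingress-nginx code; the function signature and body only illustrate filtering down to the selector labels):

package status

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/kubernetes"
)

// isRunningMultiplePods (sketch) lists peer controller pods using only the
// labels that stay stable across chart upgrades, instead of the pod's full
// label set, which includes helm.sh/chart and app.kubernetes.io/version.
func isRunningMultiplePods(ctx context.Context, client kubernetes.Interface, pod *corev1.Pod) bool {
	stable := labels.Set{}
	for _, key := range []string{
		"app.kubernetes.io/component",
		"app.kubernetes.io/instance",
		"app.kubernetes.io/name",
	} {
		if v, ok := pod.Labels[key]; ok {
			stable[key] = v
		}
	}
	pods, err := client.CoreV1().Pods(pod.Namespace).List(ctx, metav1.ListOptions{
		LabelSelector: labels.SelectorFromSet(stable).String(),
	})
	if err != nil {
		return false
	}
	// More than one pod matching the selector labels means the leader is not
	// the last controller pod and must not clear the Ingress status.
	return len(pods.Items) > 1
}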
About this issue
- Original URL
- State: open
- Created 9 months ago
- Comments: 20 (7 by maintainers)
Ehm, random shower thought: Why not use the endpoints of the service that is configured for publishing to the Ingress resources?
Like: The load balancer address published to the Ingress resources reconciled by an Ingress NGINX Controller is configured via the --publish-service flag. So in the same way we could just check the endpoints of this service. If the leader is the last one remaining, it's fine to remove this load balancer address from the reconciled Ingress resources. No labels, no exceptions involved, just plain Kubernetes concepts. Or am I missing something?
@longwuyuan @strongjz would it be acceptable in the immediate term to modify isRunningMultiplePods to use the stable selector labels, and in the long term to implement the change suggested by @Gacko?
The current implementation interrupts the external-dns integration regularly and I'm surprised more people haven't noticed this.
If it’s acceptable we can implement this change.
Note that this causes downtime when used with external-dns, because external-dns removes the DNS record as soon as the old leader clears the status.