application-gateway-kubernetes-ingress: AGIC Ingress controller: Liveness probe failed
Describe the bug
After installing a new release of the application-gateway-kubernetes-ingress/ingress-azure chart (--version 1.0.0),
the created pod gets stuck with "Readiness probe failed" and "new-pod-ingress-azure" stays at 0/1 pods running.
Running kubectl describe on that pod we get:
Warning Unhealthy 13s (x10 over 113s) kubelet, aks-agentpool-76661633-1 Readiness probe failed: Get http://10.2.0.15:8123/health/ready: dial tcp 10.2.0.15:8123: connect: connection refused
Warning Unhealthy 7s (x6 over 107s) kubelet, aks-agentpool-76661633-1 Liveness probe failed: Get http://10.2.0.15:8123/health/alive: dial tcp 10.2.0.15:8123: connect: connection refused
To Reproduce
We have already created 5 or 6 clusters using this procedure, and this is the first time we have hit this issue.
Steps:
1- kubectl create -f https://raw.githubusercontent.com/Azure/aad-pod-identity/master/deploy/infra/deployment-rbac.yaml
2- Create an Azure identity in the RG where the nodes are running (MC_xxxxxx); grant it Reader on that RG and Contributor on the WAF (Application Gateway).
3- kubectl create -f aadpodidentity.yml
4- helm install -f dev-chart.yml application-gateway-kubernetes-ingress/ingress-azure --version 1.0.0
5- Create the binding. yml values: AzureIdentity = the name of the identity created in step 3; selector = [nameofstep4]-ingress-azure (a sketch of these manifests is shown below).
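For reference, a rough sketch of the identity and binding manifests from steps 2, 3 and 5. Identity names, IDs and scopes are placeholders, not our actual values; the selector must match the pod's aadpodidbinding label, i.e. [nameofstep4]-ingress-azure:

# Step 2 (sketch): create the identity in the node resource group and assign roles
# az identity create -g MC_xxxxxx -n <identity-name>
# az role assignment create --role Reader --assignee <identity-client-id> --scope <node-resource-group-id>
# az role assignment create --role Contributor --assignee <identity-client-id> --scope <application-gateway-id>

# Step 3: aadpodidentity.yml
apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzureIdentity
metadata:
  name: <identity-name>
spec:
  type: 0    # 0 = user-assigned managed identity
  resourceID: /subscriptions/<subscription-id>/resourcegroups/MC_xxxxxx/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<identity-name>
  clientID: <identity-client-id>
---
# Step 5: binding; selector must equal the aadpodidbinding label on the AGIC pod
apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzureIdentityBinding
metadata:
  name: <identity-name>-binding
spec:
  azureIdentity: <identity-name>
  selector: [nameofstep4]-ingress-azure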
Ingress Controller details
Name: bunking-lizard-ingress-azure-7b54978988-rs8h2
Namespace: default
Priority: 0
Node: aks-agentpool-76661633-1/10.2.0.4
Start Time: Mon, 18 Nov 2019 20:42:53 -0300
Labels: aadpodidbinding=bunking-lizard-ingress-azure
app=ingress-azure
pod-template-hash=7b54978988
release=bunking-lizard
Annotations: prometheus.io/port: 8123
prometheus.io/scrape: true
Status: Running
IP: 10.2.0.15
IPs: <none>
Controlled By: ReplicaSet/bunking-lizard-ingress-azure-7b54978988
Containers:
ingress-azure:
Container ID: docker://bd46f4833b77b806132fd7cd7729aca5ffbf117bbee3a2e7c011ffb109894db0
Image: mcr.microsoft.com/azure-application-gateway/kubernetes-ingress:1.0.0
Image ID: docker-pullable://mcr.microsoft.com/azure-application-gateway/kubernetes-ingress@sha256:c295f99ae66443c5a392fd894620fcd1fc313b9efdec96d13f166fefb29780a9
Port: <none>
Host Port: <none>
State: Running
Started: Mon, 18 Nov 2019 20:44:58 -0300
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Mon, 18 Nov 2019 20:43:59 -0300
Finished: Mon, 18 Nov 2019 20:44:58 -0300
Ready: False
Restart Count: 2
Liveness: http-get http://:8123/health/alive delay=15s timeout=1s period=20s #success=1 #failure=3
Readiness: http-get http://:8123/health/ready delay=5s timeout=1s period=10s #success=1 #failure=3
Environment Variables from:
bunking-lizard-cm-ingress-azure ConfigMap Optional: false
Environment:
AZURE_CONTEXT_LOCATION: /etc/appgw/azure.json
AGIC_POD_NAME: bunking-lizard-ingress-azure-7b54978988-rs8h2 (v1:metadata.name)
AGIC_POD_NAMESPACE: default (v1:metadata.namespace)
Mounts:
/etc/appgw/azure.json from azure (rw)
/var/run/secrets/kubernetes.io/serviceaccount from bunking-lizard-sa-ingress-azure-token-z9hvt (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
azure:
Type: HostPath (bare host directory volume)
Path: /etc/kubernetes/azure.json
HostPathType: File
bunking-lizard-sa-ingress-azure-token-z9hvt:
Type: Secret (a volume populated by a Secret)
SecretName: bunking-lizard-sa-ingress-azure-token-z9hvt
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m12s default-scheduler Successfully assigned default/bunking-lizard-ingress-azure-7b54978988-rs8h2 to aks-agentpool-76661633-1
Normal Pulling 67s (x2 over 2m11s) kubelet, aks-agentpool-76661633-1 Pulling image "mcr.microsoft.com/azure-application-gateway/kubernetes-ingress:1.0.0"
Normal Pulled 67s (x2 over 2m9s) kubelet, aks-agentpool-76661633-1 Successfully pulled image "mcr.microsoft.com/azure-application-gateway/kubernetes-ingress:1.0.0"
Normal Created 67s (x2 over 2m6s) kubelet, aks-agentpool-76661633-1 Created container ingress-azure
Normal Killing 67s kubelet, aks-agentpool-76661633-1 Container ingress-azure failed liveness probe, will be restarted
Normal Started 66s (x2 over 2m6s) kubelet, aks-agentpool-76661633-1 Started container ingress-azure
Warning Unhealthy 13s (x10 over 113s) kubelet, aks-agentpool-76661633-1 Readiness probe failed: Get http://10.2.0.15:8123/health/ready: dial tcp 10.2.0.15:8123: connect: connection refused
Warning Unhealthy 7s (x6 over 107s) kubelet, aks-agentpool-76661633-1 Liveness probe failed: Get http://10.2.0.15:8123/health/alive: dial tcp 10.2.0.15:8123: connect: connection refused
- Output of `kubectl logs <ingress controller>`:
kubectl logs bunking-lizard-ingress-azure-7b54978988-rs8h2
ERROR: logging before flag.Parse: I1118 23:42:59.206842 1 main.go:302] Using verbosity level 5 from environment variable APPGW_VERBOSITY_LEVEL
I1118 23:42:59.256720 1 environment.go:168] KUBERNETES_WATCHNAMESPACE is not set. Watching all available namespaces.
I1118 23:42:59.256746 1 main.go:132] App Gateway Details: Subscription: mysubsxxxxxxxxxxxxxxxxxxxxxx, Resource Group: MYRESOURCEG, Name: MYAKSCLUSTER
I1118 23:42:59.256752 1 auth.go:90] Creating authorizer from Azure Managed Service Identity
Note: we are using Kubernetes version 1.15.5 (preview in Azure). Could this be related?
- Azure support ticket associated with this issue: 119111924000415
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 4
- Comments: 19 (5 by maintainers)
We are facing the same issue.
To Reproduce
We are installing the Azure App Gateway Ingress Controller as follows:
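The exact commands were not preserved in this archive; a minimal sketch assuming the public AGIC Helm repository (repo URL, values file name and Helm 2-era syntax are assumptions):

# Add the AGIC Helm repository and install the chart with a values file
helm repo add application-gateway-kubernetes-ingress https://appgwingress.blob.core.windows.net/ingress-azure-helm-package/
helm repo update
helm install -f helm-config.yaml application-gateway-kubernetes-ingress/ingress-azure --version 1.0.0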
The applied helm-chart values:
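The values block itself was not preserved; a minimal sketch of a typical helm-config.yaml for AGIC 1.0.0 with AAD Pod Identity (all IDs below are placeholders):

verbosityLevel: 5
appgw:
  subscriptionId: <subscription-id>
  resourceGroup: <app-gateway-resource-group>
  name: <app-gateway-name>
armAuth:
  type: aadPodIdentity
  identityResourceID: /subscriptions/<subscription-id>/resourcegroups/MC_xxxxxx/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<identity-name>
  identityClientID: <identity-client-id>
rbac:
  enabled: true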
Describe the bug
kubectl describe of the pod shows the Liveness and Readiness probes failing in the same manner as described by the author in the first post.
Current workaround
We currently work around this by manually editing the deployment of the azure-app-gateway-ingress-controller and removing the Readiness and Liveness probes there. Afterwards the azure-app-gateway-ingress-controller pod is able to spin up correctly and stays stable in the Running status.
kubectl edit deployment azure-app-gateway-ingress-controller
-> remove the complete Readiness and Liveness sections, so that it looks roughly like this:
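A rough sketch of the edited container spec after the probes are removed (reconstructed from the describe output in the original report, not the exact manifest; the ConfigMap name is a placeholder):

spec:
  template:
    spec:
      containers:
      - name: ingress-azure
        image: mcr.microsoft.com/azure-application-gateway/kubernetes-ingress:1.0.0
        # livenessProbe and readinessProbe blocks deleted here as the workaround
        envFrom:
        - configMapRef:
            name: <release-name>-cm-ingress-azure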
Further Question
Is it somehow possible to achieve a running azure-app-gateway-ingress-controller without the need to manually edit its deployment after it has been deployed with the Helm chart?
The workaround of removing the Liveness and Readiness probes from the Deployment, as suggested by @gitflo1, seems to do the trick.
Blocks removed from AGIC’s Deployment:
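The removed blocks, reconstructed from the probe settings shown in the describe output of the original report:

livenessProbe:
  httpGet:
    path: /health/alive
    port: 8123
  initialDelaySeconds: 15
  timeoutSeconds: 1
  periodSeconds: 20
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8123
  initialDelaySeconds: 5
  timeoutSeconds: 1
  periodSeconds: 10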
This causes the running AGIC pod to be replaced by a new one, as expected.
Checking the new pod logs, the AGIC pod is now able to reach the Application Gateway and update its properties…
Same issue here: the Liveness probe failed.
@akshaysngupta thank you for resolving this issue with RC 1.0.1-rc1. I just tested the RC Helm chart version on our previously failing AKS clusters and the issue was resolved.
@gitflo1 Thanks for asking.
This is a new issue; in the last few weeks we already deployed at least 4 or 5 AKS clusters with the latest AGIC version 1.0.0, and the pods started running successfully.