application-gateway-kubernetes-ingress: Failed to refresh the Token

Describe the bug

The ingress controller had been running for almost 40 days without restarts or issues. Then, a few days ago, it started failing to refresh the token for no apparent reason. Other resources around it, such as mic and nmi, did not change, and the managed identity did not change in the meantime. Note that the managed identity has the Reader role on the resource group where the gateway is located.

I haven’t tried to recreate the pod, because I would like to find the root cause first. I have the same setup on the production cluster, and I am afraid it could happen there and break my applications.

Does anyone have an idea what might have gone wrong, and where to look?

To Reproduce

Not sure how to reproduce.

Ingress Controller details

Output of kubectl describe pod <ingress controller>

Name:               fantastic-waterbuffalo-ingress-azure-66b968bbbc-wds6z
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               aks-agentpool-15375443-6/10.0.0.159
Start Time:         Fri, 25 Oct 2019 14:02:06 +0200
Labels:             aadpodidbinding=fantastic-waterbuffalo-ingress-azure
                    app=ingress-azure
                    pod-template-hash=66b968bbbc
                    release=fantastic-waterbuffalo
Annotations:        <none>
Status:             Running
IP:                 10.0.0.172
Controlled By:      ReplicaSet/fantastic-waterbuffalo-ingress-azure-66b968bbbc
Containers:
  ingress-azure:
    Container ID:   docker://553e5262ac6537629f4a90aaf26b38648a8ba287938df454b044c31b84f7d820
    Image:          mcr.microsoft.com/azure-application-gateway/kubernetes-ingress:0.10.0-rc4
    Image ID:       docker-pullable://mcr.microsoft.com/azure-application-gateway/kubernetes-ingress@sha256:4579e970084e58ce84f85e783c2e57e2e38fbf22b4204076bc80f7e464475917
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Fri, 25 Oct 2019 14:02:30 +0200
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:8123/health/alive delay=15s timeout=1s period=20s #success=1 #failure=3
    Readiness:      http-get http://:8123/health/ready delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      fantastic-waterbuffalo-cm-ingress-azure  ConfigMap  Optional: false
    Environment:
      AZURE_CONTEXT_LOCATION:        /etc/appgw/azure.json
      AGIC_POD_NAME:                 fantastic-waterbuffalo-ingress-azure-66b968bbbc-wds6z (v1:metadata.name)
      AGIC_POD_NAMESPACE:            default (v1:metadata.namespace)
      KUBERNETES_PORT_443_TCP_ADDR:  xxxxxxx
      KUBERNETES_PORT:               xxxxxx
      KUBERNETES_PORT_443_TCP:       xxxxx
      KUBERNETES_SERVICE_HOST:       xxxx
    Mounts:
      /etc/appgw/azure.json from azure (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from fantastic-waterbuffalo-sa-ingress-azure-token-5gwd7 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  azure:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/azure.json
    HostPathType:  File
  fantastic-waterbuffalo-sa-ingress-azure-token-5gwd7:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  fantastic-waterbuffalo-sa-ingress-azure-token-5gwd7
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>

Output of `kubectl logs <ingress controller>`

]
E1210 12:27:27.393881       1 worker.go:49] Error mutating AKS from k8s event. unable to get specified AppGateway (CTRL001)
E1210 12:27:27.529038       1 mutate_app_gateway.go:34] unable to get specified AppGateway [xxx-xx-xx], check AppGateway identifier, error=[azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/xxx-xx-xx/resourceGroups/xxx-xx-xx/providers/Microsoft.Network/applicationGateways/xxx-xx-xx?api-version=2019-06-01: StatusCode=403 -- Original Error: adal: Refresh request failed. Status Code = '403'. Response body: adal: Refresh request failed. Status Code = '400'. Response body: {"error":"invalid_request","error_description":"Identity not found"}
]
E1210 12:27:27.529097       1 worker.go:53] Error mutating App Gateway config from k8s event. unable to get specified AppGateway (CTRL001)
E1210 12:27:32.687335       1 mutate_app_gateway.go:34] unable to get specified AppGateway [xxx-xx-xx], check AppGateway identifier, error=[azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/xxx-xx-xx/resourceGroups/xxx-xx-xx/providers/Microsoft.Network/applicationGateways/xxx-xx-xx?api-version=2019-06-01: StatusCode=403 -- Original Error: adal: Refresh request failed. Status Code = '403'. Response body: adal: Refresh request failed. Status Code = '400'. Response body: {"error":"invalid_request","error_description":"Identity not found"}
]
E1210 12:27:32.687756       1 worker.go:49] Error mutating AKS from k8s event. unable to get specified AppGateway (CTRL001)
E1210 12:27:32.947537       1 mutate_app_gateway.go:34] unable to get specified AppGateway [xxx-xx-xx], check AppGateway identifier, error=[azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/xxx-xx-xx/resourceGroups/xxx-xx-xx/providers/Microsoft.Network/applicationGateways/xxx-xx-xx?api-version=2019-06-01: StatusCode=403 -- Original Error: adal: Refresh request failed. Status Code = '403'. Response body: adal: Refresh request failed. Status Code = '400'. Response body: {"error":"invalid_request","error_description":"Identity not found"}
]
E1210 12:27:32.947736       1 worker.go:53] Error mutating App Gateway config from k8s event. unable to get specified AppGateway (CTRL001)
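The error lines above nest two failures inside each other: an outer `Status Code = '403'` whose response body is itself a refresh failure with `Status Code = '400'` and a JSON body saying `Identity not found`. The sketch below pulls those pieces out of one log line so the innermost cause is easier to see. It is only a minimal illustration: `parse_refresh_error` and `SAMPLE` are hypothetical names, and the sample string is a shortened copy of the tail of the log line above.

```python
import json
import re


def parse_refresh_error(log_line):
    """Extract nested status codes and the JSON error body from one log line."""
    # Codes are listed outermost first; "StatusCode=403" (no quotes) is not matched.
    status_codes = [int(c) for c in re.findall(r"Status Code = '(\d+)'", log_line)]
    # Assumption: the innermost response body is the only JSON object on the line.
    match = re.search(r"\{.*\}", log_line)
    body = json.loads(match.group(0)) if match else {}
    return {
        "status_codes": status_codes,
        "error": body.get("error"),
        "description": body.get("error_description"),
    }


# Shortened tail of the log line shown above.
SAMPLE = (
    "StatusCode=403 -- Original Error: adal: Refresh request failed. "
    "Status Code = '403'. Response body: adal: Refresh request failed. "
    "Status Code = '400'. Response body: "
    '{"error":"invalid_request","error_description":"Identity not found"}'
)

print(parse_refresh_error(SAMPLE))
```

One plausible reading, not confirmed in this thread, is that the 403 was returned to the controller by an intermediate component, while the 400 with `Identity not found` came from the token endpoint behind it.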

Any Azure support tickets associated with this issue

Maybe related: https://github.com/Azure/application-gateway-kubernetes-ingress/issues/117

About this issue

Original URL
State: closed
Created 5 years ago
Comments: 16 (12 by maintainers)

Most upvoted comments

Not a very satisfying resolution of the ticket… I still don’t know what to do…

dirien on Apr 9, 2021

Hello, I’m having the same problem after re-creating my storage accounts. How can I fix the caching issues?

qzhou-hmcts on Mar 23, 2020

@aleksmark this looks like an issue in either AAD Pod Identity or IMDS (Instance Metadata Service), which is responsible for responding to token requests. I am investigating this further with the identity team and will provide an update soon.

akshaysngupta on Dec 12, 2019
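Since the comment above points at the token path through AAD Pod Identity and IMDS, one way to narrow things down might be to issue a managed-identity token request by hand from inside a pod that carries the same `aadpodidbinding` label. The sketch below uses the IMDS identity endpoint and the `Metadata: true` header from Azure's managed-identity documentation. The function names are hypothetical, and the thread does not confirm that the controller builds its request in exactly this way.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Azure IMDS managed-identity token endpoint (link-local address).
IMDS_TOKEN_URL = "http://169.254.169.254/metadata/identity/oauth2/token"


def build_imds_token_request(resource="https://management.azure.com/", client_id=None):
    """Build (but do not send) a token request for the given resource."""
    params = {"api-version": "2018-02-01", "resource": resource}
    if client_id:
        # Optional: select a specific user-assigned identity by its client ID.
        params["client_id"] = client_id
    url = IMDS_TOKEN_URL + "?" + urllib.parse.urlencode(params)
    # IMDS rejects requests that lack the Metadata header.
    return urllib.request.Request(url, headers={"Metadata": "true"})


def fetch_token(client_id=None, timeout=5):
    """Send the request; return (status, parsed JSON) or (status, error text)."""
    req = build_imds_token_request(client_id=client_id)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:
        # An HTTP error status (such as the 403 in the logs) lands here.
        return err.code, err.read().decode("utf-8", "replace")
```

Calling `fetch_token(client_id=...)` from such a pod and comparing the status and body with the log output above could show whether the failure reproduces outside the ingress controller.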