azure-sdk-for-net: [BUG] ManagedIdentityCredential sometimes fails in AKS

Describe the bug

We are using the ManagedIdentityCredential class to get access tokens for managed identities. We deploy the application to AKS, where we use AAD Pod Identity (https://github.com/Azure/aad-pod-identity).

The ManagedIdentityCredential instance is a singleton.

Sometimes, after a new pod starts, every attempt by that pod to get an access token fails with the exception "Azure.Identity.CredentialUnavailableException: ManagedIdentityCredential authentication unavailable. No Managed Identity endpoint found." If a pod gets past startup without this exception, the exception is never observed for the rest of that pod's lifetime.

The problem seems to be that, in AKS, the IMDS endpoint may only become reachable from the pod some time after the pod starts. If the pod requests an access token before the endpoint is available, the credential ends up in a bad state from which it never recovers.

The cause of the issue is probably this code: https://github.com/Azure/azure-sdk-for-net/blob/705f3296529b2e30e16bfe42fdd2245511a5c0b0/sdk/identity/Azure.Identity/src/ManagedIdentityClient.cs#L51-L67

The ManagedIdentityClient tries several strategies to get a ManagedIdentitySource. If all of them fail, it caches null as the value held by _identitySourceAsyncLock, and therefore never tries to resolve the ManagedIdentitySource again.
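To make the failure mode concrete, here is a minimal Python simulation of the caching pattern described above. It is not the Azure.Identity code: `CachedSourceResolver`, the probe functions, and the string source names are hypothetical stand-ins for ManagedIdentityClient, its detection strategies, and ManagedIdentitySource. The key point is that the first lookup's result is cached whether or not it succeeded.

```python
import threading
from typing import Callable, List, Optional

Probe = Callable[[], Optional[str]]  # returns a source name, or None if unavailable


class CachedSourceResolver:
    """Hypothetical model of the pattern described above (not the SDK code).

    The first call tries each probe in order and stops at the first hit.
    Whatever it finds, including None, is cached for every later call.
    """

    def __init__(self, probes: List[Probe]) -> None:
        self._probes = probes
        self._lock = threading.Lock()
        self._resolved = False
        self._source: Optional[str] = None

    def get_source(self) -> Optional[str]:
        with self._lock:
            if self._resolved:
                return self._source  # cached result, even when it is None
            # Lazy generator: probes after the first hit are never called.
            self._source = next(
                (s for s in (p() for p in self._probes) if s is not None), None
            )
            self._resolved = True  # a failure is cached as well
            return self._source


# Simulate a pod that asks for a token before IMDS is reachable.
state = {"imds_ready": False}
resolver = CachedSourceResolver(
    [lambda: None, lambda: "imds" if state["imds_ready"] else None]
)
print(resolver.get_source())  # None
state["imds_ready"] = True
print(resolver.get_source())  # still None: the failed lookup was cached
```

Because the "resolved" flag is set even when no source was found, a pod that loses the startup race keeps failing, while a pod that wins it never sees the error, which matches the behavior reported in the issue.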

Expected behavior

Either the exception should not occur, or it should be possible to recover from the failed state once the IMDS endpoint becomes available.
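One possible way to get the recoverable behavior, sketched under assumptions: cache only a successful resolution, and allow a failed resolution to be retried after a short back-off. The Python class below, its `retry_after` parameter, and the injectable `clock` are hypothetical and are not necessarily how Azure.Identity addressed the issue.

```python
import threading
import time
from typing import Callable, List, Optional

Probe = Callable[[], Optional[str]]  # returns a source name, or None if unavailable


class RecoveringSourceResolver:
    """Hypothetical resolver that caches only a successful lookup.

    After a failed lookup it fails fast for `retry_after` seconds, then runs
    the probes again, so it recovers once the endpoint becomes reachable.
    """

    def __init__(
        self,
        probes: List[Probe],
        retry_after: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._probes = probes
        self._retry_after = retry_after
        self._clock = clock
        self._lock = threading.Lock()
        self._source: Optional[str] = None
        self._last_failure: Optional[float] = None  # clock time of last failed lookup

    def get_source(self) -> Optional[str]:
        with self._lock:
            if self._source is not None:
                return self._source  # success is cached for good
            now = self._clock()
            if self._last_failure is not None and now - self._last_failure < self._retry_after:
                return None  # still inside the back-off window
            self._source = next(
                (s for s in (p() for p in self._probes) if s is not None), None
            )
            if self._source is None:
                self._last_failure = now  # remember when this attempt failed
            return self._source


# Fake clock and endpoint so the example runs instantly.
state = {"imds_ready": False, "now": 0.0}
resolver = RecoveringSourceResolver(
    [lambda: "imds" if state["imds_ready"] else None],
    retry_after=5.0,
    clock=lambda: state["now"],
)
print(resolver.get_source())  # None: IMDS not reachable yet
state["imds_ready"] = True
print(resolver.get_source())  # None: still inside the 5 s back-off window
state["now"] = 6.0
print(resolver.get_source())  # imds: probes re-ran and succeeded
```

The back-off window is a design choice: without it, every token request from a pod where IMDS never appears would re-run all probes, so failures stay cheap while recovery is still possible.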

Actual behavior

The exception occurs, and it is not possible to recover from the failed state.

To Reproduce

  1. Deploy AKS
  2. Deploy the aad-pod-identity services
  3. Deploy a pod using ManagedIdentityCredential
  4. Hope for the exception to occur (it only happens intermittently)
  5. Observe the exception

Environment:

  • AKS 1.17.11
  • AAD Pod Identity 1.7.1
  • Azure.Identity 1.2.3

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 15 (7 by maintainers)

Most upvoted comments

@christothes The scenario I had in mind was using DefaultAzureCredential.

When I request an access token through a DefaultAzureCredential, it tries to get a token from EnvironmentCredential, then ManagedIdentityCredential, then SharedTokenCacheCredential, and so on.

https://github.com/Azure/azure-sdk-for-net/blob/a79bd104ab8c2cf79e0f9b2c60619de85f101d71/sdk/identity/Azure.Identity/src/DefaultAzureCredential.cs#L169-L220

If you now introduce a retry mechanism in ImdsManagedIdentitySource, ManagedIdentityCredential.GetTokenAsync() would block until the ImdsManagedIdentitySource timeouts and retries are exhausted. This would happen every time, even when the IMDS endpoint will never be available, which is quite often the case when using DefaultAzureCredential.
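The cost described in this comment can be modeled with a small simulation of a credential chain. All names here are hypothetical stand-ins (`CredentialUnavailable` for CredentialUnavailableException, `chained_get_token` for DefaultAzureCredential's loop), and the attempt count and per-attempt timeout are made-up numbers, not Azure.Identity defaults. Each unanswered IMDS probe advances a simulated clock instead of sleeping.

```python
from typing import Callable, List, Tuple


class CredentialUnavailable(Exception):
    """Hypothetical stand-in for Azure.Identity's CredentialUnavailableException."""


def unavailable(reason: str) -> Callable[[], str]:
    def get_token() -> str:
        raise CredentialUnavailable(reason)
    return get_token


def imds_credential(
    available: bool, attempts: int, timeout_s: float, clock: List[float]
) -> Callable[[], str]:
    """Simulated managed identity credential that retries the IMDS probe.

    Every unanswered probe advances the simulated clock by timeout_s.
    """
    def get_token() -> str:
        for _ in range(attempts):
            if available:
                return "imds-token"
            clock[0] += timeout_s  # waited for a response that never came
        raise CredentialUnavailable("No Managed Identity endpoint found.")
    return get_token


def chained_get_token(chain: List[Tuple[str, Callable[[], str]]]) -> Tuple[str, str]:
    """Try each credential in order; fall through only on CredentialUnavailable."""
    errors = []
    for name, get_token in chain:
        try:
            return name, get_token()
        except CredentialUnavailable as exc:
            errors.append(f"{name}: {exc}")
    raise CredentialUnavailable("; ".join(errors))


# Running outside Azure: no IMDS, but a later credential in the chain succeeds.
clock = [0.0]
chain = [
    ("EnvironmentCredential", unavailable("environment variables not set")),
    ("ManagedIdentityCredential", imds_credential(False, attempts=4, timeout_s=1.0, clock=clock)),
    ("AzureCliCredential", lambda: "cli-token"),
]
print(chained_get_token(chain))  # ('AzureCliCredential', 'cli-token')
print(clock[0])  # 4.0
```

With `attempts=4` and `timeout_s=1.0`, each token request spends 4 simulated seconds on IMDS before falling through to the next credential, and that delay grows linearly with the retry budget, which is the concern raised in the comment.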