azure-sdk-for-net: [BUG] After a certain number of timeouts ARM ends up in a bad state where all future calls also timeout
Library name and version
Azure.ResourceManager 1.7.0
Describe the bug
After a certain number of timeout exceptions, the resource manager client ends up in a bad state where all subsequent calls time out, even if the far end is available. The application needs to be restarted.
This seems to happen only when multiple ArmClients are created. If we pass a separate HttpClient to each one and dispose of it when we are done, the problem disappears. This seems to be related to all Azure SDK clients sharing a single HttpClient instance by default.
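A minimal sketch of the workaround described above, assuming the HttpClient is supplied through `ArmClientOptions.Transport` wrapped in an `HttpClientTransport`. The class name, method name, and `resourceId` parameter are hypothetical and only for illustration. The credential builds its own pipeline, and whether it also needs a dedicated transport is not established in this report.

```csharp
using System;
using System.Net.Http;
using Azure.Core;
using Azure.Core.Pipeline;
using Azure.Identity;
using Azure.ResourceManager;
using Azure.ResourceManager.Compute;

static class PerClientTransportExample
{
    // resourceId is a placeholder for the full resource ID of a virtual machine.
    public static void QueryInstanceView(string resourceId)
    {
        // A dedicated HttpClient, so this ArmClient does not use the
        // HttpClient instance that SDK clients share by default.
        // It is disposed when the method returns.
        using var httpClient = new HttpClient();

        var options = new ArmClientOptions
        {
            Transport = new HttpClientTransport(httpClient)
        };

        var client = new ArmClient(new DefaultAzureCredential(), string.Empty, options);
        var vm = client.GetVirtualMachineResource(new ResourceIdentifier(resourceId));
        var view = vm.InstanceView();

        // Print only the HTTP status of the response.
        Console.WriteLine(view.GetRawResponse().Status);
    }
}
```

Creating and disposing an HttpClient per call has its own cost (connections are not reused), so this is a way to isolate the problem rather than a recommended long-term pattern.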
Expected behavior
Subsequent calls work and are not impacted by previous timeouts.
Actual behavior
All calls time out until the application is restarted.
Here is the exception we get:
Exception: ClientSecretCredential authentication failed: Request to the endpoint timed out.
at Azure.Identity.CredentialDiagnosticScope.FailWrapAndThrow(Exception ex, String additionalMessage)
at Azure.Identity.ClientSecretCredential.GetToken(TokenRequestContext requestContext, CancellationToken cancellationToken)
at Azure.Core.Pipeline.BearerTokenAuthenticationPolicy.AccessTokenCache.GetHeaderValueFromCredentialAsync(TokenRequestContext context, Boolean async, CancellationToken cancellationToken)
at Azure.Core.Pipeline.BearerTokenAuthenticationPolicy.AccessTokenCache.GetHeaderValueAsync(HttpMessage message, TokenRequestContext context, Boolean async)
at Azure.Core.Pipeline.BearerTokenAuthenticationPolicy.AccessTokenCache.GetHeaderValueAsync(HttpMessage message, TokenRequestContext context, Boolean async)
at Azure.Core.Pipeline.BearerTokenAuthenticationPolicy.AuthorizeRequest(HttpMessage message)
at Azure.Core.Pipeline.BearerTokenAuthenticationPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
at Azure.Core.Pipeline.BearerTokenAuthenticationPolicy.Process(HttpMessage message, ReadOnlyMemory`1 pipeline)
at Azure.Core.Pipeline.HttpPipelinePolicy.ProcessNext(HttpMessage message, ReadOnlyMemory`1 pipeline)
at Azure.Core.Pipeline.RedirectPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
at Azure.Core.Pipeline.RedirectPolicy.Process(HttpMessage message, ReadOnlyMemory`1 pipeline)
at Azure.Core.Pipeline.RetryPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
at Azure.Core.Pipeline.RetryPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
at Azure.Core.Pipeline.RetryPolicy.Process(HttpMessage message, ReadOnlyMemory`1 pipeline)
at Azure.Core.Pipeline.HttpPipelineSynchronousPolicy.Process(HttpMessage message, ReadOnlyMemory`1 pipeline)
at Azure.Core.Pipeline.HttpPipelineSynchronousPolicy.Process(HttpMessage message, ReadOnlyMemory`1 pipeline)
at Azure.Core.Pipeline.HttpPipelineSynchronousPolicy.Process(HttpMessage message, ReadOnlyMemory`1 pipeline)
at Azure.Core.Pipeline.HttpPipeline.Send(HttpMessage message, CancellationToken cancellationToken)
at Azure.ResourceManager.Compute.VirtualMachinesRestOperations.InstanceView(String subscriptionId, String resourceGroupName, String vmName, CancellationToken cancellationToken)
at Azure.ResourceManager.Compute.VirtualMachineResource.InstanceView(CancellationToken cancellationToken)
Reproduction Steps
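// ResourceId is assumed to hold the full resource ID of an existing virtual machine.
int count = 20;
ArmClient client = null;
while (true)
{
    try
    {
        Console.WriteLine(count);
        if (count > 0)
        {
            // Create a new ArmClient on each of the first 20 iterations.
            count--;
            ArmClientOptions armClientOptions = new ArmClientOptions();
            if (count > 0)
            {
                // Force a timeout on every client except the last one created.
                armClientOptions.Retry.NetworkTimeout = TimeSpan.FromMilliseconds(1);
            }
            client = new ArmClient(new DefaultAzureCredential(), string.Empty, armClientOptions);
        }
        // The last client uses the default timeout, yet its calls still time out.
        var vm = client.GetVirtualMachineResource(new ResourceIdentifier(ResourceId));
        var view = vm.InstanceView();
        Thread.Sleep(TimeSpan.FromSeconds(3));
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.ToString());
    }
}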
Environment
This was run on Windows 10 using .NET 6 and Visual Studio 17.1.1.
About this issue
- State: closed
- Created a year ago
- Comments: 32 (10 by maintainers)
It seems that using a singleton for ArmClient, ArmClientOptions and HttpClient solved our problems. At least we haven’t had any exceptions or lockups in the last three weeks.
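The singleton approach from this comment could look roughly like the sketch below. It assumes one process-wide HttpClient passed to the client through `ArmClientOptions.Transport`; the `ArmClientProvider` class and its member names are hypothetical, and the commenter's actual code is not shown in the thread.

```csharp
using System;
using System.Net.Http;
using Azure.Core.Pipeline;
using Azure.Identity;
using Azure.ResourceManager;

static class ArmClientProvider
{
    // One HttpClient for the lifetime of the process.
    private static readonly HttpClient SharedHttpClient = new HttpClient();

    // One options instance that routes requests through the shared HttpClient.
    private static readonly ArmClientOptions SharedOptions = new ArmClientOptions
    {
        Transport = new HttpClientTransport(SharedHttpClient)
    };

    // The ArmClient is created once, on first use, in a thread-safe way.
    private static readonly Lazy<ArmClient> LazyClient = new Lazy<ArmClient>(
        () => new ArmClient(new DefaultAzureCredential(), string.Empty, SharedOptions));

    public static ArmClient Instance => LazyClient.Value;
}
```

Callers would then use `ArmClientProvider.Instance` everywhere instead of constructing a new `ArmClient` per operation, which avoids the multiple-client scenario that the report associates with the lockups.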