azure-webjobs-sdk: Improve Queue Trigger\Storage binding to be more resilient when transient issues occur.

The Queue Trigger may time out after two minutes, which causes the Host to restart.

For example:

System.TimeoutException : The operation ‘GetMessages’ with id ‘60c56e17-66e8-49a3-b1dc-9ac5b393eabd’ did not complete in ‘00:02:00’.
   at async Microsoft.Azure.WebJobs.Extensions.Storage.TimeoutHandler.ExecuteWithTimeout[T](String operationName,String clientRequestId,IWebJobsExceptionHandler exceptionHandler,Func`1 operation) at C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Extensions.Storage\TimeoutHandler.cs : 30

Repro steps

  1. Create a Timer Trigger that puts 5000 messages into a Queue. For example:

    using System;

    public static void Run(TimerInfo myTimer, ICollector<CustomQueueMessage> outputQueueItem, ILogger log)
    {
        log.LogInformation($"C# Timer trigger function executed at: {DateTime.Now}");

        // Add 5000 messages to the output collector.
        for (int i = 0; i < 5000; i++)
        {
            var queueMessage = new CustomQueueMessage { PersonName = i.ToString(), Title = "Mr " + i.ToString() };
            outputQueueItem.Add(queueMessage);
        }
    }

    public class CustomQueueMessage
    {
        public string PersonName { get; set; }
        public string Title { get; set; }
    }

  2. Write a Console App or Function that polls this Queue. It must run in another region, or otherwise be geographically distant from the storage account.

Sample Code attached.

timeouthandler.zip

Expected behavior

The Queue Trigger should handle transient issues such as the following without restarting the Host:

System.IO.IOException
  HResult=0x80131620
  Message=Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond…
  Source=System.Net.Sockets
  StackTrace:
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken) in /_/src/System.Net.Sockets/src/System/Net/Sockets/Socket.Tasks.cs:line 1107

Inner Exception 1: SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

Actual behavior

The application restarts because of the timeout handler. A number of customers are seeing this.

Known workarounds

No known workarounds. I am testing a new approach that uses the following code to see whether it helps:

            // Tag the request with a fresh client request ID.
            string clientRequestId = Guid.NewGuid().ToString();
            OperationContext context = new OperationContext { ClientRequestID = clientRequestId };
            Console.WriteLine("Entering Loop {0} {1}", i.ToString(), DateTime.Now.ToString());
            // Cancel the request if it has not completed within _cancelTimeout.
            using (CancellationTokenSource cts = new CancellationTokenSource())
            {
                cts.CancelAfter(_cancelTimeout);
                await queue.GetMessagesAsync(32,
                    _visibilityTimeout,
                    options: null,
                    operationContext: context,
                    cancellationToken: cts.Token);
            }
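
The snippet above bounds a single call. A minimal sketch of how that per-attempt timeout could be combined with retries follows. The helper name `TransientRetry.RunAsync`, the attempt count, and the backoff delays are hypothetical and not part of the WebJobs SDK or the Storage client. Which exceptions count as transient is also an assumption: the sketch retries on `IOException`, `TimeoutException`, and a cancellation caused by the per-attempt timeout, and a real caller would probably need to inspect the Storage client's own exception type as well.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of the WebJobs SDK.
public static class TransientRetry
{
    public static async Task<T> RunAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        TimeSpan perAttemptTimeout,
        int maxAttempts = 3,
        TimeSpan? baseDelay = null)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        TimeSpan delay = baseDelay ?? TimeSpan.FromSeconds(1);

        for (int attempt = 1; ; attempt++)
        {
            using (var cts = new CancellationTokenSource())
            {
                // Each attempt gets its own timeout, as in the snippet above.
                cts.CancelAfter(perAttemptTimeout);
                try
                {
                    return await operation(cts.Token);
                }
                catch (Exception ex) when (attempt < maxAttempts && IsTransient(ex, cts.Token))
                {
                    // Swallow the transient failure and retry after the backoff below.
                    // On the last attempt the filter is false, so the exception propagates.
                }
            }

            // Exponential backoff: baseDelay, 2x, 4x, ...
            await Task.Delay(TimeSpan.FromMilliseconds(delay.TotalMilliseconds * Math.Pow(2, attempt - 1)));
        }
    }

    // Assumption: these exception types are treated as transient.
    private static bool IsTransient(Exception ex, CancellationToken attemptToken)
    {
        return ex is IOException
            || ex is TimeoutException
            || (ex is OperationCanceledException && attemptToken.IsCancellationRequested);
    }
}
```

With this helper, the call above could be wrapped as `await TransientRetry.RunAsync(token => queue.GetMessagesAsync(32, _visibilityTimeout, options: null, operationContext: context, cancellationToken: token), _cancelTimeout);`.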

Related information

  • Package versions: 3.0.10 and 4.0.0.0-Preview1
  • Links to source

https://github.com/Azure/azure-webjobs-sdk/blob/4130350327c6d637d48456222de7e658c6cf729a/src/Microsoft.Azure.WebJobs.Extensions.Storage/Queues/Listeners/QueueListener.cs#L201

                batch = await TimeoutHandler.ExecuteWithTimeout("GetMessages", context.ClientRequestID, _exceptionHandler, () =>
                {
                    return _queue.GetMessagesAsync(_queueProcessor.BatchSize,
                        _visibilityTimeout,
                        options: null,
                        operationContext: context,
                        cancellationToken: cancellationToken);
                });
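
For context, a timeout wrapper around an async call is commonly built by racing the operation against a delay. The sketch below illustrates that general pattern only. It is not the SDK's `TimeoutHandler` source, and `TimeoutRace.WithTimeout` is a hypothetical name. It shows that a wrapper of this shape reports the timeout without stopping the underlying request unless a cancellation token is also passed into the operation.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical illustration of the "race against a delay" pattern.
public static class TimeoutRace
{
    public static async Task<T> WithTimeout<T>(
        string operationName,
        TimeSpan timeout,
        Func<Task<T>> operation)
    {
        using (var cts = new CancellationTokenSource())
        {
            Task delay = Task.Delay(timeout, cts.Token);
            Task<T> work = operation();

            // Whichever task finishes first decides the outcome.
            Task first = await Task.WhenAny(work, delay);
            if (first == delay)
            {
                // The operation itself is still running here; only the wait is abandoned.
                throw new TimeoutException(
                    $"The operation '{operationName}' did not complete in '{timeout}'.");
            }

            // The operation finished first, so cancel the pending delay.
            cts.Cancel();
            return await work;
        }
    }
}
```

Because the caller only stops waiting, a slow request to a distant storage account keeps running in the background after the timeout is reported.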

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 23 (12 by maintainers)

Most upvoted comments

@v-anvari The improved error handling was added in version 3.0.11 of the Storage Extension and is included in later versions: https://www.nuget.org/packages/Microsoft.Azure.WebJobs.Extensions.Storage/3.0.11. Customers seeing this issue first need to upgrade to that version or higher and then check whether the issue is resolved. Depending on the geographic location of the Storage account relative to the Azure Functions App, there may still be increased latency.

I removed it.