azure-functions-durable-extension: Durable function fails with `TaskCanceledException` and never gets retried
We have recently upgraded from Net6 In-Process to Net7 Isolated functions and since this upgrade we are seeing intermittent TaskCanceledException across multiple functions doing a variety of different work so no common/consistent reason we can find. It does not appear to ever hit our code and the exception is happening before the function is invoked within the WorkerFunctionInvoker.
One example below but we can provide more:
- Timestamp: 2023-03-28T09:20:19.6220000Z
- Function App version: v4
- Function App name: -
- Function name(s) (as appropriate): -
- Invocation ID: c7ea2e1e-1e12-4d0b-ab60-fe3b75a92467
- Region: Central US
Additional Invocation Ids from last 24 hours on a sandbox we are running: a0a7c75d-e997-4361-aa3b-a4c5501d0ec7 c6b083bf-60ab-4cb1-9ab6-8ca5f883c388
Stack
System.Threading.Tasks.TaskCanceledException:
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.WebJobs.Script.Description.WorkerFunctionInvoker+<InvokeCore>d__9.MoveNext (Microsoft.Azure.WebJobs.Script, Version=4.16.0.0, Culture=neutral, PublicKeyToken=null: /_/src/WebJobs.Script/Description/Workers/WorkerFunctionInvoker.cs:96)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.WebJobs.Script.Description.FunctionInvokerBase+<Invoke>d__24.MoveNext (Microsoft.Azure.WebJobs.Script, Version=4.16.0.0, Culture=neutral, PublicKeyToken=null: /_/src/WebJobs.Script/Description/FunctionInvokerBase.cs:82)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.WebJobs.Script.Description.FunctionGenerator+<Coerce>d__3`1.MoveNext (Microsoft.Azure.WebJobs.Script, Version=4.16.0.0, Culture=neutral, PublicKeyToken=null: /_/src/WebJobs.Script/Description/FunctionGenerator.cs:225)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.WebJobs.Host.Executors.FunctionInvoker`2+<InvokeAsync>d__10.MoveNext (Microsoft.Azure.WebJobs.Host, Version=3.0.36.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35: C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionInvoker.cs:52)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.WebJobs.Extensions.DurableTask.OutOfProcMiddleware+<>c__DisplayClass10_0+<<CallOrchestratorAsync>b__0>d.MoveNext (Microsoft.Azure.WebJobs.Extensions.DurableTask, Version=2.0.0.0, Culture=neutral, PublicKeyToken=014045d636e89289: D:\a\_work\1\s\src\WebJobs.Extensions.DurableTask\OutOfProcMiddleware.cs:124)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.GetResult (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.WebJobs.Host.Executors.TriggeredFunctionExecutor`1+<>c__DisplayClass7_0+<<TryExecuteAsync>b__0>d.MoveNext (Microsoft.Azure.WebJobs.Host, Version=3.0.36.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35: C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\TriggeredFunctionExecutor.cs:50)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor+<InvokeWithTimeoutAsync>d__33.MoveNext (Microsoft.Azure.WebJobs.Host, Version=3.0.36.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35: C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:581)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor+<ExecuteWithWatchersAsync>d__32.MoveNext (Microsoft.Azure.WebJobs.Host, Version=3.0.36.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35: C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:527)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.GetResult (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor+<ExecuteWithLoggingAsync>d__26.MoveNext (Microsoft.Azure.WebJobs.Host, Version=3.0.36.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35: C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:306)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor+<ExecuteWithLoggingAsync>d__26.MoveNext (Microsoft.Azure.WebJobs.Host, Version=3.0.36.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35: C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:352)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor+<TryExecuteAsync>d__18.MoveNext (Microsoft.Azure.WebJobs.Host, Version=3.0.36.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35: C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:108)
Repro steps
We cannot reproduce this consistently, the same function will work without issue 100 times then for no reason we will get a TaskCancelledException.
About this issue
- Original URL
- State: closed
- Created a year ago
- Reactions: 1
- Comments: 21 (10 by maintainers)
Hi, we are upgrading two large durable function projects (5k+ orchestrations,150k+ function executions total daily) from in-process 3.1 to isolated 7.
We believe we are hitting this issue in our dev environments while testing and are investigating further. Happy to provide further info if needed. Thanks
Thanks for the detailed response!
For most of our durable functions this isn’t a problem as we are primarily using them to work down a persisted queue or generate denormalized data so we run them again without issue, there are a few exceptions where this isn’t true and causing us a problem but we can work around in the short term.
Regarding the TaskCancelledException my only concern is that these errors are appearing in AppInsights which is what led to us investigating these issues.
The two scenarios that are affecting us are:
Ideally, if the TaskCancelledException is happening internally in the host code we’d prefer never to see it or have to handle it as when it occurs and is logged in AppInsights we investigate, triage, etc. I’d be interested to know if you think we should be factoring this exception scenario in and specifically coding for it if it happens on an Activity or do you think this is an error we should never be seeing in AppInsights or handling ourselves.
Many thanks
Richard
I believe I understand what is going on. This appears to be a bug with the durable extension. Well, a misunderstanding of how function tirggering works. We were relying on the invocation cancellation exception to bubble out and we catch then abort this invocation. However, it does not appear to bubble out and instead surfaces as a function invocation result with a failed status - so we assume this invocation did finish.
@liliankasem something that we can consider is to review this behavior. We can sync offline and I’ll go over what I have found.
For the meantime, I am going to transfer this issue to the durable extension repo and address it there.
Thanks @jviau
I might need a little help understanding that. We are not referencing Microsoft.Azure.WebJobs.Extensions.DurableTask we are only referencing Microsoft.Azure.Functions.Worker.Extensions.DurableTask in isolated, this hasn’t had a release since 1.0.2 (April 7th 2023). I’m not seeing any dependencies to Microsoft.Azure.WebJobs.Extensions.DurableTask.
As I understand it Microsoft.Azure.WebJobs.Extensions.DurableTask is for the legacy and soon to be deprecated in-process with .net 8 I believe.
We’ve moved all our functions to isolated and removed all legacy packages for in-process and referencing the new ones such as Microsoft.Azure.Functions.Worker.Extensions.DurableTask. This is where we are seeing the problem, we never saw it in in-process on core/net5/6.
Many thanks.
Hello
Apologies for the slow reply!
We having been keeping pace with latest version as released. We are currently on:
Microsoft.Azure.Functions.Worker - 1.13.0 Microsoft.Azure.Functions.Worker.Extensions.DurableTask - 1.0.2
***ShippingOrchestrationFunction is an orchestration instance which is started by a Timer trigger.
These orchestration functions run a queue down so we aren’t too concerned about these failing as the next timer trigger will get it back up and running again and continue to work the queue down. Because of this, we try catch at the orchestration function in this instance and even in the event of an exception we handle and let the orchestration function complete successfully as we know it’ll start a new instance from our timer trigger if one isn’t running.
In the example details I have below the orchestration instance has a status of Failed which can’t happen from our code as we are handling and logging exceptions but not letting them bubble up and fail the orchestration instance.
Hopefully this helps to track down the problem as something before our Orchestration function is invoked has failed the durable task.
App Insights: InvocationId: 1e02dcd1-36e3-4bf3-b154-41acec35ed95 Exception type: System.Threading.Tasks.TaskCanceledException FormattedMessage: Executed ‘Functions.ReconcileTransactionLocationsShippingOrchestrationFunction’ (Failed, Id=1e02dcd1-36e3-4bf3-b154-41acec35ed95, Duration=24ms)
Associated Table Storage Durable Function: Partition Key: 74e3409e5dec5610b457dbfd1a25691d ExecutionId: 16cb566b7a164ca3b656c26c393bf1e3 Created Time: 2023-04-23T02:00:00.4598939Z Runtime Status: Failed Output:
Support Request Id: 2304190050002817
We can provide as many invocation ids as required across multiple environments.
Thanks again for all the responses, it’s appreciated.
Yup!