azure-functions-durable-extension: Orchestration function hangs on Pending state
Description
Sometimes an orchestration function can get stuck in the 'Pending' status. I suspect this happens because starting the function and setting its status is not a transactional operation (#520). Issue #520 was closed on Nov 22, 2018, but I was able to reproduce the problem a month ago.
The situation becomes worse with code like the snippet below, where a new instance should be started only after the previous instance has completed:
[FunctionName(nameof(CheckAvailableSqlBatchesTimer))]
public async Task Run(
    [TimerTrigger("%DataLakeMigrationSchedule%")] TimerInfo timer,
    [OrchestrationClient] DurableOrchestrationClientBase client)
{
    const string instanceId = nameof(MigrateToDataLakeOrchestrator);
    var instance = await client.GetStatusAsync(instanceId);
    if (instance == null || !(instance.RuntimeStatus == OrchestrationRuntimeStatus.Pending ||
                              instance.RuntimeStatus == OrchestrationRuntimeStatus.Running ||
                              instance.RuntimeStatus == OrchestrationRuntimeStatus.ContinuedAsNew))
    {
        await client.StartNewAsync(
            nameof(MigrateToDataLakeOrchestrator),
            instanceId,
            null);
    }
    else
    {
        _telemetry.TrackTrace(
            $"{nameof(MigrateToDataLakeOrchestrator)} function execution was skipped, " +
            "as there is another instance running.");
    }
}
In that case, a new orchestration instance will never be started because the previous one is stuck in the Pending state.
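One possible mitigation for this pattern, sketched below under my own assumptions (the helper name, the five-minute threshold, and the decision to treat a long-Pending instance as dead are not from the issue), is to consider a Pending instance stuck once its CreatedTime is older than some timeout, and let the timer start a fresh instance in that case:

```csharp
// Sketch only, assuming the Durable Functions 1.x types used above
// (DurableOrchestrationStatus, OrchestrationRuntimeStatus). The
// PendingTimeout value is an arbitrary assumption, not maintainer guidance.
private static readonly TimeSpan PendingTimeout = TimeSpan.FromMinutes(5);

private static bool IsEffectivelyRunning(DurableOrchestrationStatus instance)
{
    if (instance == null)
    {
        return false;
    }

    // A Pending instance that has not progressed within the timeout is
    // treated as stuck rather than as running.
    if (instance.RuntimeStatus == OrchestrationRuntimeStatus.Pending)
    {
        return DateTime.UtcNow - instance.CreatedTime < PendingTimeout;
    }

    return instance.RuntimeStatus == OrchestrationRuntimeStatus.Running
        || instance.RuntimeStatus == OrchestrationRuntimeStatus.ContinuedAsNew;
}
```

The timer function would then call StartNewAsync whenever IsEffectivelyRunning returns false, replacing an instance that has sat in Pending past the threshold with a fresh one. Note this only works around the symptom; the underlying race condition discussed below still exists.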
Support information:
- Durable Functions extension version: 1.8.3
- Function App version: 2.0
- Programming language used: C#
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 18
Yeah, I think a small delay will mitigate this. Something like Thread.Sleep(TimeSpan.FromSeconds(1)) at the end of your orchestrator function should be plenty. Note that we normally tell people not to use Thread.Sleep in orchestrator functions, but there is no negative side effect if you do it right before the orchestrator function completes.

Ah, I looked at this again and I think I figured out what's happening.
This is indeed a race condition. We handle the case where a new orchestration is created from scratch. However, we don’t correctly handle the race condition where an existing orchestration instance is being overwritten by a new one (which appears to be the case for you)! Unfortunately my previous suggestion will not help you - the issue exists in the most up-to-date versions of Microsoft.Azure.DurableTask.AzureStorage.
I will look into fixing this.
I've found more occurrences in the dev environment:
InstanceId: MigrateToDataLakeOrchestrator
Region: NorthEurope
Function app name: faneurfalsbx13rlixqvwtmo
Timeframe issue observed (UTC): 2019-09-28T02:26:00.010Z - 2019-09-28T02:28:00.010Z
Records from DurableFunctionsHubInstances table:
===
InstanceId: MigrateToSdmOrchestrator
Region: NorthEurope
Function app name: faneurfalsbx13rlixqvwtmo
Timeframe issue observed (UTC): 2019-09-07T06:09:00.183Z - 2019-09-07T06:11:00.183Z
Records from DurableFunctionsHubInstances table: