azure-functions-durable-extension: Orchestration function hangs on Pending state

Description

Sometimes an orchestration function can get stuck in the ‘Pending’ status. I suspect this happens because starting the function and setting its status is not a transactional operation (#520). That issue (#520) was closed on Nov 22, 2018, but I was able to reproduce it a month ago.

The situation becomes worse with code like the one below, where a new instance should be started only after the previous instance has completed:

        [FunctionName(nameof(CheckAvailableSqlBatchesTimer))]
        public async Task Run(
            [TimerTrigger("%DataLakeMigrationSchedule%")] TimerInfo timer,
            [OrchestrationClient] DurableOrchestrationClientBase client)
        {
            const string instanceId = nameof(MigrateToDataLakeOrchestrator);
            var instance = await client.GetStatusAsync(instanceId);

            if (instance == null || !(instance.RuntimeStatus == OrchestrationRuntimeStatus.Pending ||
                instance.RuntimeStatus == OrchestrationRuntimeStatus.Running ||
                instance.RuntimeStatus == OrchestrationRuntimeStatus.ContinuedAsNew))
            {
                await client.StartNewAsync(
                    nameof(MigrateToDataLakeOrchestrator),
                    instanceId,
                    null);
            }
            else
            {
                _telemetry.TrackTrace(
                    $"{nameof(MigrateToDataLakeOrchestrator)} function execution was skipped, " +
                    "as there is another instance running.");
            }
        }

In that case a new orchestration instance will never be started, because the previous one is stuck in the Pending state.
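
One workaround I can think of (my own assumption, not something proposed in this thread) is to treat an instance that has been Pending for longer than some threshold as stuck, so that the timer trigger is allowed to start a replacement under the same instance ID. A minimal sketch of that idea is below; the IsStuckInPending helper and the 5-minute threshold are placeholders, and since restarting overwrites the existing record, this only shrinks the window rather than removing the underlying race.

        // Possible mitigation (not a confirmed fix): treat an instance that has been Pending
        // for longer than some threshold as stuck. The 5-minute threshold is an arbitrary assumption.
        private static readonly TimeSpan StalePendingThreshold = TimeSpan.FromMinutes(5);

        private static bool IsStuckInPending(DurableOrchestrationStatus instance) =>
            instance != null &&
            instance.RuntimeStatus == OrchestrationRuntimeStatus.Pending &&
            DateTime.UtcNow - instance.CreatedTime > StalePendingThreshold;

        // In the timer function above, the start condition would then become:
        // if (instance == null || IsStuckInPending(instance) || !(instance.RuntimeStatus == ...))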

Support information:

  • Durable Functions extension version: 1.8.3
  • Function App version: 2.0
  • Programming language used: C#

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 18

Most upvoted comments

Yeah, I think a small delay will mitigate this. Something like Thread.Sleep(TimeSpan.FromSeconds(1)) at the end of your orchestrator function should be plenty.

Note that we normally tell people not to use Thread.Sleep in orchestrator functions, but there is no negative side effect if you do it right before the orchestrator function completes.
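
For illustration, an orchestrator that applies that suggestion might look roughly like the following; the activity name is a placeholder, since the real orchestrator body is not shown in this issue.

        [FunctionName(nameof(MigrateToDataLakeOrchestrator))]
        public async Task MigrateToDataLakeOrchestrator(
            [OrchestrationTrigger] DurableOrchestrationContextBase context)
        {
            // Placeholder for the real orchestration body, which is not shown in the issue.
            await context.CallActivityAsync("MigrateBatchActivity", null);

            // The small delay suggested above, placed right before the orchestrator completes.
            // Thread.Sleep is normally discouraged inside orchestrators, but at this point there
            // is no further awaited work, so replay behavior is not affected.
            Thread.Sleep(TimeSpan.FromSeconds(1));
        }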

Ah, I looked at this again and I think I figured out what’s happening.

This is indeed a race condition. We handle the case where a new orchestration is created from scratch. However, we don’t correctly handle the race condition where an existing orchestration instance is being overwritten by a new one (which appears to be the case for you)! Unfortunately my previous suggestion will not help you - the issue exists in the most up-to-date versions of Microsoft.Azure.DurableTask.AzureStorage.

I will look into fixing this.

I’ve found more on dev environment:

  • InstanceId: MigrateToDataLakeOrchestrator
  • Region: NorthEurope
  • Function app name: faneurfalsbx13rlixqvwtmo
  • Timeframe issue observed (UTC): 2019-09-28T02:26:00.010Z - 2019-09-28T02:28:00.010Z

Records from DurableFunctionsHubInstances table:

  • PartitionKey: MigrateToDataLakeOrchestrator
  • CreatedTime: 2019-09-28T02:27:00.251Z
  • Execution Id: null
  • Input: null
  • LastUpdateTime: 2019-09-28T02:27:00.035Z
  • Name: MigrateToDataLakeOrchestrator
  • RuntimeStatus: Pending
  • TimeStamp: 2019-09-28T02:27:00.010Z
  • TaskHubName: DurableFunctionsHub

===

  • InstanceId: MigrateToSdmOrchestrator
  • Region: NorthEurope
  • Function app name: faneurfalsbx13rlixqvwtmo
  • Timeframe issue observed (UTC): 2019-09-07T06:09:00.183Z - 2019-09-07T06:11:00.183Z

Records from DurableFunctionsHubInstances table:

  • PartitionKey: MigrateToSdmOrchestrator
  • CreatedTime: 2019-09-07T06:10:00.481Z
  • Execution Id: null
  • Input: null
  • LastUpdateTime: 2019-09-07T06:10:00.265Z
  • Name: MigrateToSdmOrchestrator
  • RuntimeStatus: Pending
  • TimeStamp: 2019-09-07T06:10:00.481Z
  • TaskHubName: DurableFunctionsHub
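
For reference, records like the ones above can be pulled straight from the instances table. A small sketch of that, assuming the classic Microsoft.WindowsAzure.Storage table SDK and that the instances table uses the instance ID as PartitionKey with an empty RowKey (which matches the rows shown above), might look like this:

        // Requires the Microsoft.WindowsAzure.Storage and Microsoft.WindowsAzure.Storage.Table
        // namespaces from the classic Azure Storage SDK.
        private static async Task DumpInstanceRecordAsync(string connectionString, string instanceId)
        {
            var table = CloudStorageAccount.Parse(connectionString)
                .CreateCloudTableClient()
                .GetTableReference("DurableFunctionsHubInstances");

            // Assumption: PartitionKey = instance ID, RowKey = empty string.
            var result = await table.ExecuteAsync(
                TableOperation.Retrieve<DynamicTableEntity>(instanceId, string.Empty));

            if (result.Result is DynamicTableEntity entity)
            {
                // Dump every column (CreatedTime, LastUpdateTime, RuntimeStatus, ...) as-is.
                foreach (var property in entity.Properties)
                {
                    Console.WriteLine($"{property.Key}: {property.Value.PropertyAsObject}");
                }
            }
        }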