azure-functions-durable-extension: Orchestration function hangs on Pending state

Description

Sometimes an orchestration function can get stuck in the ‘Pending’ status. I suspect this happens because starting the function and setting its status is not a transactional operation (#520). That issue (#520) was closed on Nov 22, 2018, but I was able to reproduce it a month ago.

The situation becomes worse with code like the one below, where a new instance should be started only after the previous instance has completed:

        [FunctionName(nameof(CheckAvailableSqlBatchesTimer))]
        public async Task Run(
            [TimerTrigger("%DataLakeMigrationSchedule%")] TimerInfo timer,
            [OrchestrationClient] DurableOrchestrationClientBase client)
        {
            const string instanceId = nameof(MigrateToDataLakeOrchestrator);
            var instance = await client.GetStatusAsync(instanceId);

            if (instance == null || !(instance.RuntimeStatus == OrchestrationRuntimeStatus.Pending ||
                instance.RuntimeStatus == OrchestrationRuntimeStatus.Running ||
                instance.RuntimeStatus == OrchestrationRuntimeStatus.ContinuedAsNew))
            {
                await client.StartNewAsync(
                    nameof(MigrateToDataLakeOrchestrator),
                    instanceId,
                    null);
            }
            else
            {
                _telemetry.TrackTrace(
                    $"{nameof(MigrateToDataLakeOrchestrator)} function execution was skipped, " +
                    "as there is another instance running.");
            }
        }

In that case a new orchestration instance will never be started, because the previous one is stuck in the Pending state.
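
One workaround I can think of (my own assumption, not something proposed in this thread) is to treat an instance that has been Pending for longer than some threshold as stuck, so that the timer trigger is allowed to start a replacement under the same instance ID. A minimal sketch of that idea is below; the IsStuckInPending helper and the 5-minute threshold are placeholders, and since restarting overwrites the existing record, this only shrinks the window rather than removing the underlying race.

        // Possible mitigation (not a confirmed fix): treat an instance that has been Pending
        // for longer than some threshold as stuck. The 5-minute threshold is an arbitrary assumption.
        private static readonly TimeSpan StalePendingThreshold = TimeSpan.FromMinutes(5);

        private static bool IsStuckInPending(DurableOrchestrationStatus instance) =>
            instance != null &&
            instance.RuntimeStatus == OrchestrationRuntimeStatus.Pending &&
            DateTime.UtcNow - instance.CreatedTime > StalePendingThreshold;

        // In the timer function above, the start condition would then become:
        // if (instance == null || IsStuckInPending(instance) || !(instance.RuntimeStatus == ...))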

Support information:

  • Durable Functions extension version: 1.8.3
  • Function App version: 2.0
  • Programming language used: C#

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 18

Most upvoted comments

Yeah, I think a small delay will mitigate this. Something like Thread.Sleep(TimeSpan.FromSeconds(1)) at the end of your orchestrator function should be plenty.

Note that we normally tell people not to use Thread.Sleep in orchestrator functions, but there is no negative side effect if you do it right before the orchestrator function completes.
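
For illustration, an orchestrator that applies that suggestion might look roughly like the following; the activity name is a placeholder, since the real orchestrator body is not shown in this issue.

        [FunctionName(nameof(MigrateToDataLakeOrchestrator))]
        public async Task MigrateToDataLakeOrchestrator(
            [OrchestrationTrigger] DurableOrchestrationContextBase context)
        {
            // Placeholder for the real orchestration body, which is not shown in the issue.
            await context.CallActivityAsync("MigrateBatchActivity", null);

            // The small delay suggested above, placed right before the orchestrator completes.
            // Thread.Sleep is normally discouraged inside orchestrators, but at this point there
            // is no further awaited work, so replay behavior is not affected.
            Thread.Sleep(TimeSpan.FromSeconds(1));
        }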

Ah, I looked at this again and I think I figured out what’s happening.

This is indeed a race condition. We handle the case where a new orchestration is created from scratch. However, we don’t correctly handle the race condition where an existing orchestration instance is being overwritten by a new one (which appears to be the case for you)! Unfortunately my previous suggestion will not help you - the issue exists in the most up-to-date versions of Microsoft.Azure.DurableTask.AzureStorage.

I will look into fixing this.

I’ve found more on dev environment:

  • InstanceId: MigrateToDataLakeOrchestrator
  • Region: NorthEurope
  • Function app name: faneurfalsbx13rlixqvwtmo
  • Timeframe issue observed (UTC): 2019-09-28T02:26:00.010Z - 2019-09-28T02:28:00.010Z

Records from DurableFunctionsHubInstances table:

  • PartitionKey: MigrateToDataLakeOrchestrator
  • CreatedTime: 2019-09-28T02:27:00.251Z
  • Execution Id: null
  • Input: null
  • LastUpdateTime: 2019-09-28T02:27:00.035Z
  • Name: MigrateToDataLakeOrchestrator
  • RuntimeStatus: Pending
  • TimeStamp: 2019-09-28T02:27:00.010Z
  • TaskHubName: DurableFunctionsHub

===

  • InstanceId: MigrateToSdmOrchestrator
  • Region: NorthEurope
  • Function app name: faneurfalsbx13rlixqvwtmo
  • Timeframe issue observed (UTC): 2019-09-07T06:09:00.183Z - 2019-09-07T06:11:00.183Z

Records from DurableFunctionsHubInstances table:

  • PartitionKey: MigrateToSdmOrchestrator
  • CreatedTime: 2019-09-07T06:10:00.481Z
  • Execution Id: null
  • Input: null
  • LastUpdateTime: 2019-09-07T06:10:00.265Z
  • Name: MigrateToSdmOrchestrator
  • RuntimeStatus: Pending
  • TimeStamp: 2019-09-07T06:10:00.481Z
  • TaskHubName: DurableFunctionsHub
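
For reference, records like the ones above can be pulled straight from the instances table. A small sketch of that, assuming the classic Microsoft.WindowsAzure.Storage table SDK and that the instances table uses the instance ID as PartitionKey with an empty RowKey (which matches the rows shown above), might look like this:

        // Requires the Microsoft.WindowsAzure.Storage and Microsoft.WindowsAzure.Storage.Table
        // namespaces from the classic Azure Storage SDK.
        private static async Task DumpInstanceRecordAsync(string connectionString, string instanceId)
        {
            var table = CloudStorageAccount.Parse(connectionString)
                .CreateCloudTableClient()
                .GetTableReference("DurableFunctionsHubInstances");

            // Assumption: PartitionKey = instance ID, RowKey = empty string.
            var result = await table.ExecuteAsync(
                TableOperation.Retrieve<DynamicTableEntity>(instanceId, string.Empty));

            if (result.Result is DynamicTableEntity entity)
            {
                // Dump every column (CreatedTime, LastUpdateTime, RuntimeStatus, ...) as-is.
                foreach (var property in entity.Properties)
                {
                    Console.WriteLine($"{property.Key}: {property.Value.PropertyAsObject}");
                }
            }
        }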