azure-functions-durable-extension: Taking minutes for OrchestrationTrigger to start running

Description

I’m seeing rather slow startup of orchestrations: there is a long delay between the call to StartNewAsync and the point where the orchestration trigger actually begins executing.

Expected behavior

More reasonable delays (on the order of seconds, not minutes) to start orchestrations.

Actual behavior

This call was made at 6/5/2022, 7:02:14.864 PM (from App Insights):

    // Signal the durable entity, then start the orchestration by name.
    await client.SignalEntityAsync<S>(
        state.Identifier,
        entity => entity.Create(t));

    string orchestrationName = $"Create{typeof(T).Name}Orchestration";

    await client.StartNewAsync(orchestrationName, t);
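
For anyone trying to reproduce the measurement, here is a minimal client-side sketch that polls GetStatusAsync until the instance leaves Pending; the helper name and the 5-second polling interval are illustrative, not code from the app above:

    using System;
    using System.Threading.Tasks;
    using Microsoft.Azure.WebJobs.Extensions.DurableTask;

    // Sketch only: poll the instance status until it leaves "Pending" to see
    // how long an orchestration sits queued before it actually starts running.
    public static class StartupDelayProbe
    {
        public static async Task<TimeSpan> MeasureStartupDelayAsync(
            IDurableOrchestrationClient client, string instanceId)
        {
            DateTime queuedAt = DateTime.UtcNow;
            while (true)
            {
                DurableOrchestrationStatus status = await client.GetStatusAsync(instanceId);
                if (status != null && status.RuntimeStatus != OrchestrationRuntimeStatus.Pending)
                {
                    return DateTime.UtcNow - queuedAt;
                }

                await Task.Delay(TimeSpan.FromSeconds(5)); // illustrative polling interval
            }
        }
    }

Called right after StartNewAsync, this would surface the same Pending-to-Running gap that the “Fetched [ExecutionStarted]” log line below reports as a delay.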

In the logs I see this, which makes sense:

6/5/2022, 7:02:14.926 PM 2bcc9063a3404c2ca38015ed7eed6a34: Updated Instances table and set the runtime status to ‘Pending’

[screenshot: durable-functions-3]

But it takes over 7 minutes to get to this next point, where the orchestration actually starts (the logged delay of 453979 ms is roughly 7.5 minutes):

6/5/2022, 7:09:47.981 PM 2bcc9063a3404c2ca38015ed7eed6a34: Fetched [ExecutionStarted] message from knowledge-control-07 (delay = 453979ms)

I see execution enter my OrchestrationTrigger at 6/5/2022, 7:09:48.244 PM:

[screenshot: durable-functions-2]

The system isn’t heavily loaded, since it’s just a perf testing environment, and I’m the only one on it. I had one other Durable Entity instance running in the same App Service Plan.

NOTE: I’m using a long workItemQueueVisibilityTimeout, since we have some long-running image-processing activities; without setting it longer, I was seeing repeated calls to my activity trigger every 5 minutes.

Relevant source code snippets

        "durableTask": {
            "hubName": "knowledge",
            "extendedSessionsEnabled": true,
            "extendedSessionIdleTimeoutInSeconds": 300,
            "storageProvider": {
                "partitionCount": 8,
                "maxQueuePollingInterval": "00:10:00",
                "useLegacyPartitionManagement": false,
                "workItemQueueVisibilityTimeout": "01:00:00"
            }
        },

NOTE: Running the same operation again, it only took 2 minutes to go from Pending to Running, with orchestration d6a86cc865694109aa09ad02beca564c. That’s still longer than I’d expect, but better than 7 minutes.

Known workarounds

Tried using a partitionCount of 8 and lowering maxQueuePollingInterval to 10 seconds to speed up the time to invoke our orchestrations (sketched below).
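
For clarity, that works out to a storageProvider section along these lines; the only change from the snippet above is maxQueuePollingInterval, where "00:00:10" is the TimeSpan form of 10 seconds, and the long workItemQueueVisibilityTimeout stays in place for the long-running activities:

        "durableTask": {
            "hubName": "knowledge",
            "extendedSessionsEnabled": true,
            "extendedSessionIdleTimeoutInSeconds": 300,
            "storageProvider": {
                "partitionCount": 8,
                "maxQueuePollingInterval": "00:00:10",
                "useLegacyPartitionManagement": false,
                "workItemQueueVisibilityTimeout": "01:00:00"
            }
        },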

App Details

  • Durable Functions extension version (e.g. v1.8.3): Microsoft.Azure.WebJobs.Extensions.DurableTask 2.6.0
  • Azure Functions runtime version (1.0 or 2.0): Microsoft.NET.Sdk.Functions 4.1.1
  • Programming language used: C#, .NET 6.0

Screenshots

If applicable, add screenshots to help explain your problem.

[screenshot: durable-functions]

If deployed to Azure

We have access to a lot of telemetry that can help with investigations. Please provide as much of the following information as you can to help us investigate!

  • Timeframe issue observed: 6/5/2022, 7:02:14.926 PM PDT
  • Function App name: unstruk-knowledge-perf-functions-feed
  • Function name(s): CreateFeedOrchestration
  • Azure region: South Central US
  • Orchestration instance ID(s): 2bcc9063a3404c2ca38015ed7eed6a34, d6a86cc865694109aa09ad02beca564c
  • Azure storage account name: unstrukknowledgfeedsperf

If you don’t want to share your Function App or storage account name on GitHub, please at least share the orchestration instance ID. Otherwise it’s extremely difficult to look up information.

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 19 (4 by maintainers)

Most upvoted comments

I see, @fiddlerpianist.

Yes, I’m investigating your other thread as well and trying to see if it’s correlated with what we’re seeing here. Let me get back to you on that thread about what we’re seeing telemetry-wise; I’m looking at your logs now.

I’m deploying with azure-functions-core-tools and using azure-functions-python (Python Durable Functions), and I’m seeing similar behavior. The timer goes off on time, but the orchestrator function may not start for 10-15 minutes afterward. The activity trigger seems to work fine. This happens about 15% of the time.