azure-functions-durable-extension: Orchestrator stuck waiting for activity task completion
- Compiled C#
- Runtime: v1
- Microsoft.Azure.WebJobs.Extensions.DurableTask: 1.7.0
To help locate the instance (it is deployed in West US):

```
[2018-12-21T19:23:45.567Z] Function started (Id=81976288-5162-412b-98bb-87d2a41019a0)
```

Example instance ID: 9f6a5c9b-d0cd-4d80-9a57-f66a33a64542
I have tried running the orchestrator a couple of times and it keeps getting hung up in the middle. Based on my investigation, it appears to be waiting for an activity that was started to complete. While debugging locally, I could see a task-completion message sitting in one of the control queues that had been dequeued 97 times at that point. Eventually the message became invisible (though Storage Explorer showed there was one hidden message), which led me to believe it was being hidden with some delay.
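In case it helps reproduce the inspection, the dequeue counts can be peeked with a small console app along these lines. This is only a sketch: it assumes the default task hub name (DurableFunctionsHub, which gives control queues named durablefunctionshub-control-00 through -03), the development storage connection string, and the classic WindowsAzure.Storage SDK.

```csharp
// Sketch: peek the Durable Task control queues and print each message's dequeue count.
// Assumes the default task hub name and the local storage emulator; adjust as needed.
using System;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;

class ControlQueueInspector
{
    static async Task Main()
    {
        var account = CloudStorageAccount.Parse("UseDevelopmentStorage=true");
        var queueClient = account.CreateCloudQueueClient();

        // The default task hub uses four control queue partitions.
        for (int i = 0; i < 4; i++)
        {
            CloudQueue queue = queueClient.GetQueueReference($"durablefunctionshub-control-{i:00}");
            if (!await queue.ExistsAsync())
            {
                continue;
            }

            // Peek (without dequeuing) up to 32 visible messages.
            foreach (CloudQueueMessage message in await queue.PeekMessagesAsync(32))
            {
                Console.WriteLine($"{queue.Name}: DequeueCount={message.DequeueCount}, Id={message.Id}");
            }
        }
    }
}
```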
The orchestrator's structure is fairly simple; in pseudo-code (a rough C# sketch follows below):
```
fan out for each image:
    call UploadImageToStorage activity
    call FixImageRotationAndColor activity
fan in via await Task.WhenAll(tasks) on the return values from FixImageRotationAndColor
...one other step down here that hasn't been reached
```
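To make that shape concrete, here is a rough C# sketch of an orchestrator with the same structure (Durable Functions 1.x API). It is not the exact code: only the two activity names match the real app, and the function name, input type, and return type are placeholders.

```csharp
// Rough sketch of the orchestrator shape described above, not the exact code.
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;

public static class ImageOrchestration
{
    [FunctionName("ProcessImagesOrchestrator")]
    public static async Task<string[]> Run(
        [OrchestrationTrigger] DurableOrchestrationContext context)
    {
        string[] images = context.GetInput<string[]>();

        // Fan out: each image gets an UploadImageToStorage call chained into
        // a FixImageRotationAndColor call.
        var tasks = new List<Task<string>>();
        foreach (string image in images)
        {
            tasks.Add(ProcessOneImage(context, image));
        }

        // Fan in: this is the await that never completes when the hang occurs.
        string[] results = await Task.WhenAll(tasks);

        // ...the one remaining step down here is never reached.
        return results;
    }

    private static async Task<string> ProcessOneImage(
        DurableOrchestrationContext context, string image)
    {
        string uploaded = await context.CallActivityAsync<string>("UploadImageToStorage", image);
        return await context.CallActivityAsync<string>("FixImageRotationAndColor", uploaded);
    }
}
```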
In the case I have been testing, the number of images is 2. From the DurableFunctionsHubHistory table I can see that both executions of UploadImageToStorage are scheduled and then complete. Then I can see both executions of FixImageRotationAndColor are scheduled, but only one completion is listed.
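For reference, the history rows for a single instance can be listed with a query like the sketch below, instead of eyeballing them in Storage Explorer. The connection string points at the local emulator, the table name is the default for the DurableFunctionsHub task hub, and the EventType/Name column names are assumptions about what the history table exposes.

```csharp
// Sketch: dump the history events for one orchestration instance from the
// DurableFunctionsHubHistory table. The partition key is the instance ID.
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

class HistoryDump
{
    static void Main()
    {
        var account = CloudStorageAccount.Parse("UseDevelopmentStorage=true");
        CloudTable table = account.CreateCloudTableClient()
            .GetTableReference("DurableFunctionsHubHistory");

        // Instance ID of the stuck orchestration (the one from the queue message below).
        string instanceId = "b6748c6b-3749-4160-9463-86db1509ec28";
        var query = new TableQuery<DynamicTableEntity>().Where(
            TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, instanceId));

        foreach (DynamicTableEntity row in table.ExecuteQuery(query))
        {
            row.Properties.TryGetValue("EventType", out EntityProperty eventType);
            row.Properties.TryGetValue("Name", out EntityProperty name);
            Console.WriteLine($"{row.Timestamp:o} {eventType?.StringValue} {name?.StringValue}");
        }
    }
}
```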
I believe the other execution is actually completing as well, because the output file it creates exists and the aforementioned control-queue messages contain the result information. Based on the high dequeue count on those messages before they are hidden, I am guessing that whatever is handling that queue is failing for some reason. I have looked at the message payload to see if anything jumped out that would break parsing, but I didn't see anything obvious.
Here is a sample of one of the queue items that seemed to be failing to process (note: this is from my locally running instance, where I was seeing the same behavior):
{"$type":"DurableTask.AzureStorage.MessageData","ActivityId":"0fa59e67-34ad-4b50-987a-adaa95bf76bb","TaskMessage":{"$type":"DurableTask.Core.TaskMessage","Event":{"$type":"DurableTask.Core.History.TaskCompletedEvent","EventType":5,"TaskScheduledId":3,"Result":"{\"FullPath\":\"http://127.0.0.1:10000/devstoreaccount1/docs/images/decoded/b6748c6b-3749-4160-9463-86db1509ec28/2-grayscale.jpg\",\"RelativePath\":\"docs/images/decoded/b6748c6b-3749-4160-9463-86db1509ec28/2-grayscale.jpg\"}","EventId":-1,"IsPlayed":false,"Timestamp":"2018-12-21T19:02:03.7511653Z"},"SequenceNumber":0,"OrchestrationInstance":{"$type":"DurableTask.Core.OrchestrationInstance","InstanceId":"b6748c6b-3749-4160-9463-86db1509ec28","ExecutionId":"abcd6c77584343898cf9d302e3f77eae"}},"CompressedBlobName":null,"SequenceNumber":17,"Episode":2}
Any help would be greatly appreciated and I am happy to provide any additional information that would be useful.
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Comments: 23 (8 by maintainers)
@cgillum v1.7.1 fixed a similar issue for me. Thanks.
Thanks @ghills for confirming. I’ve pushed the final version of the release to nuget.org: https://github.com/Azure/azure-functions-durable-extension/releases/tag/v1.7.1
Thanks for that info @ghills. Yes, the out-of-order logic I mentioned is new in 1.7.0, because it fixes an issue where orchestrator functions can hang if certain race conditions are met (see https://github.com/Azure/azure-functions-durable-extension/issues/460). The fact that this was working fine for you when using 1.5.0 further confirms that you have discovered a bug in this newly added logic, which we will need to fix soon.