azure-functions-durable-extension: Durable entity gets stuck in scheduled state

Description

Sometimes when a durable entity is signaled from an orchestrator client (EG trigger FWIW) the entity gets stuck in the scheduled status.

Expected behavior

I’d expect that the durable entity status is complete after it was signaled.

Actual behavior

It seems that it is getting stuck at some point when the entity is signaled, not sure if there’s something wrong with my implementation (maybe the async calls or it is swallowing an error?)

Relevant source code snippets

// Durable entity
public interface IOrder
{
    void Identify(OrderIdentifyInput input);

    Task Place(string orderId);

    Task Patch(PatchOrderInput input);

    Task TransitionStatus(TransitionOrderInput input);
}

[JsonObject(MemberSerialization.OptIn)]
public class Order : IOrder
{
    [JsonIgnore]
    private readonly ILogger<Order> _logger;

    [JsonIgnore]
    private readonly ICommandBus _commandBus;

    [JsonIgnore]
    private readonly IOptions<Settings> _settings;

    [JsonProperty("Id")]
    public string Id { get; set; }

    [JsonProperty("status")]
    public OrderStatus Status { get; set; } = OrderStatus.UNDETERMINED;


    [JsonProperty("pendingStatuses")]
    public List<(OrderStatus Status, string Traceparent)> PendingStatuses { get; set; } = new List<(OrderStatus, string)>();


    public Order(ILogger<Order> logger, IOptions<Settings> settings, ICommandBus commandBus)
    {
        _logger = logger;
        _settings = settings;
        _commandBus = commandBus;
    }

    public async Task TransitionStatus(TransitionOrderInput input)
    {
        // some logic to aggregate status in ProcessedStatuses
        ...

        foreach (var status in ProcessedStatuses)
        {
            Status = status.Status; 
            Source = status.Source;
            await SendTransitionOrderStatusCommand(); 
        }
    }

    private async Task SendTransitionOrderStatusCommand()
    {
        ...
        await _commandBus.Send("some-queue", command, options);
    }

    [FunctionName(nameof(Order))]
    public static Task Run([EntityTrigger] IDurableEntityContext ctx)
        => ctx.DispatchAsync<Order>();
}

// client
public static async Task OrderStatusTransitionedHandler([EventGridTrigger] EventGridEvent eventGridEvent, [DurableClient] IDurableEntityClient starter, ILogger log)
{
    var entityId = new EntityId(nameof(Order), "some-id");
    await starter.SignalEntityAsync<IOrder>(entityId, async order =>
    {
// maybe the issue could be here? I haven't seen this syntax (awaiting the method) in the samples, however, I haven't seen an async method in a durable entity in the examples.
        await order.TransitionStatus(new TransitionOrderInput
        {
	    Status = @event.Status
        });
    });
}

Known workarounds

None

App Details

  • Durable Functions extension version (e.g. v1.8.3): Microsoft.Azure.WebJobs.Extensions.DurableTask 2.5.1
  • Azure Functions runtime version (1.0 or 2.0): v3 and netcore3.1
  • Programming language used: C#

Screenshots

Notice how it got stuck in Scheduled status, the previous signal was processed correctly tho. image

If deployed to Azure

We have access to a lot of telemetry that can help with investigations. Please provide as much of the following information as you can to help us investigate!

  • Timeframe issue observed: within 2022-06-24 22:10:25.2738203 and 2022-06-25 04:53:30.5071382
  • Function App name: infra-integrations-order-saga
  • Function name(s):
  • Azure region: westus
  • Orchestration instance ID(s): @order@{"siteUID":"08bacd63-da28-47fb-881e-2c067998a744","checkNumber":"121638997","openedAt":"2022-06-25"}_qsr
  • Azure storage account name: fnappoqvyas2ldjjxc

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 19 (5 by maintainers)

Most upvoted comments

I think Application Insights may have dropped the trace events for some reason. One possible explanation is that it uses sampling by default.

Adding this section to the host.json should disable sampling:

{
  "logging": {
    "applicationInsights": {
      "sampling": {
        "isEnabled": false
      }
    }
  }
}

Let me know if you keep experiencing this problem even with sampling disabled.

BTW can you share the additional logs that you spotted

In our internal traces these are the last few operations executed on this entity: image

I think he’ll be back on Monday (but may need some time to get caught up on all the things he missed). 😃