azure-signalr: Timeout when adding to group

Hi, we have a Web API (hosted in Azure on an App Service plan) configured with ASRS. At times, we experience timeouts when adding a connection to a group. Here is our hub method that adds to a group:

    public async Task AddToGroup(string groupName)
    {
        await Groups.AddToGroupAsync(Context.ConnectionId, groupName);
    }

Here is the exception message: “Ack-able message Microsoft.Azure.SignalR.Protocol.JoinGroupWithAckMessage waiting for ack timed out.”

Stack trace:

System.TimeoutException:
at Microsoft.Azure.SignalR.ServiceConnectionContainerBase+<WriteAckableMessageAsync>d__49.MoveNext (Microsoft.Azure.SignalR.Common, Version=1.0.14.0, Culture=neutral, PublicKeyToken=adb9793829ddae60)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Azure.SignalR.MultiEndpointServiceConnectionContainer+<WriteAckableMessageAsync>d__18.MoveNext (Microsoft.Azure.SignalR.Common, Version=1.0.14.0, Culture=neutral, PublicKeyToken=adb9793829ddae60)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at BDL.TwilioAPI.Hub.CallStatusHub+<AddToGroup>d__0.MoveNext (TwilioAPI, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null: D:\a\1\s\TwilioAPI\Hub\CallStatusHub.cs: 9)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.AspNetCore.SignalR.Internal.DefaultHubDispatcher`1+<ExecuteHubMethod>d__15.MoveNext (Microsoft.AspNetCore.SignalR.Core, Version=1.0.4.0, Culture=neutral, PublicKeyToken=adb9793829ddae60)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.AspNetCore.SignalR.Internal.DefaultHubDispatcher`1+<Invoke>d__13.MoveNext (Microsoft.AspNetCore.SignalR.Core, Version=1.0.4.0, Culture=neutral, PublicKeyToken=adb9793829ddae60)

What could be the cause, and how can we fix this? Our clients are considerably inconvenienced by having to retry. The ASRS is configured in “Classic” mode on the Standard pricing tier. The ASRS metrics looked normal in terms of server and client connections when these timeouts occurred.

We are using azure-signalr version 1.0.14.

What could be causing this issue, and is there a workaround that can help us?

Regards, Tresa
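As a stopgap for the workaround question above, a server-side retry around the group join might spare clients from retrying themselves. The following is only a minimal sketch under stated assumptions: `AckRetry` and `RunAsync` are hypothetical names, not part of Microsoft.Azure.SignalR, and the sketch assumes that repeating a group join for the same connection is harmless. It does not address the underlying cause.

```csharp
using System;
using System.Threading.Tasks;

public static class AckRetry
{
    // Hypothetical helper, not part of Microsoft.Azure.SignalR.
    // Runs `operation` and retries only when it throws TimeoutException.
    // Returns the number of attempts that were needed.
    public static async Task<int> RunAsync(
        Func<Task> operation, int maxAttempts, TimeSpan delayBetweenAttempts)
    {
        if (maxAttempts < 1)
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                await operation();
                return attempt;
            }
            catch (TimeoutException) when (attempt < maxAttempts)
            {
                // Wait before the next attempt. On the last attempt the
                // filter is false, so the exception reaches the caller.
                await Task.Delay(delayBetweenAttempts);
            }
        }
    }
}
```

Inside a hub method this could be called as `await AckRetry.RunAsync(() => Groups.AddToGroupAsync(Context.ConnectionId, groupName), 3, TimeSpan.FromMilliseconds(200));`, where the attempt count and delay are arbitrary illustrative values. Each attempt can still wait for the full ack timeout, so total latency grows with the number of attempts.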

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 41 (17 by maintainers)

Most upvoted comments

We released 1.13.0 containing the fix.

I was never able to reliably reproduce this in my environment, and it was pretty rare (I last saw it 1 month ago). This seems to support the idea that it is caused by a race condition. Yes, we do have multiple service endpoints.

Fix tested & confirmed. Thank you!

@vicancy I just gave it a try, and it seems like the issue is gone in 1.13.0-preview1-10900. I will give it more testing tomorrow. Thanks for looking into this, great work 👍

Here’s an isolated test case that causes the error on 1.6.1 and not on 1.8.1

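    using System;
    using System.Threading.Tasks;
    using Microsoft.AspNetCore.Authorization;
    using Microsoft.AspNetCore.SignalR;
    using Microsoft.Extensions.Logging;

    [Authorize]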
    public class NominationsHub : Hub
    {
        private readonly ILogger<NominationsHub> _logger;
        public NominationsHub(ILogger<NominationsHub> logger)
        {
            _logger = logger;
        }
        public override async Task OnConnectedAsync()
        {

            await AddToSignalRGroups();
            await base.OnConnectedAsync();
        }
        public override async Task OnDisconnectedAsync(Exception exception)
        {
            await RemoveFromSignalRGroups();
            await base.OnDisconnectedAsync(exception);
        }


        private async Task AddToSignalRGroups()
        {
            await Groups.AddToGroupAsync(Context.ConnectionId, "TERMINAL1");
            await Groups.AddToGroupAsync(Context.ConnectionId, "TERMINAL2");
            await Groups.AddToGroupAsync(Context.ConnectionId, "TERMINAL3");
            await Groups.AddToGroupAsync(Context.ConnectionId, "TERMINAL4");
            await Groups.AddToGroupAsync(Context.ConnectionId, "TERMINAL5");
            await Groups.AddToGroupAsync(Context.ConnectionId, "TERMINAL6");
            await Groups.AddToGroupAsync(Context.ConnectionId, "TERMINAL7");

            _logger.LogInformation($"User {Context.ConnectionId} was added to SignalR terminals");
        }

        private async Task RemoveFromSignalRGroups()
        {
            await Groups.RemoveFromGroupAsync(Context.ConnectionId, "TERMINAL1");
            await Groups.RemoveFromGroupAsync(Context.ConnectionId, "TERMINAL2");
            await Groups.RemoveFromGroupAsync(Context.ConnectionId, "TERMINAL3");
            await Groups.RemoveFromGroupAsync(Context.ConnectionId, "TERMINAL4");
            await Groups.RemoveFromGroupAsync(Context.ConnectionId, "TERMINAL5");
            await Groups.RemoveFromGroupAsync(Context.ConnectionId, "TERMINAL6");
            await Groups.RemoveFromGroupAsync(Context.ConnectionId, "TERMINAL7");

            _logger.LogInformation($"User {Context.ConnectionId} was removed from SignalR terminals.");
        }
    }

Error:

[2021-05-07 10:18:35Z WRN] Failed to send message null.
System.TimeoutException: Ack-able message Microsoft.Azure.SignalR.Protocol.JoinGroupWithAckMessage waiting for ack timed out.
at Microsoft.Azure.SignalR.ServiceConnectionContainerBase.WriteAckableMessageAsync(ServiceMessage serviceMessage, CancellationToken cancellationToken)
at Microsoft.Azure.SignalR.MultiEndpointMessageWriter.<>c__DisplayClass8_0.<<WriteAckableMessageAsync>b__0>d.MoveNext()
--- End of stack trace from previous location ---
at Microsoft.Azure.SignalR.MultiEndpointMessageWriter.<>c__DisplayClass9_0.<<WriteMultiEndpointMessageAsync>b__2>d.MoveNext()
--- End of stack trace from previous location ---
at Microsoft.Azure.SignalR.MultiEndpointMessageWriter.WriteAckableMessageAsync(ServiceMessage serviceMessage, CancellationToken cancellationToken)
at Microsoft.Azure.SignalR.ServiceLifetimeManagerBase`1.WriteAckableCoreAsync[T](T message, Func`2 task)
[2021-05-07 10:18:35Z ERR] Error when dispatching 'OnConnectedAsync' on hub.
System.TimeoutException: Ack-able message Microsoft.Azure.SignalR.Protocol.JoinGroupWithAckMessage waiting for ack timed out.
at Microsoft.Azure.SignalR.ServiceConnectionContainerBase.WriteAckableMessageAsync(ServiceMessage serviceMessage, CancellationToken cancellationToken)
at Microsoft.Azure.SignalR.MultiEndpointMessageWriter.<>c__DisplayClass8_0.<<WriteAckableMessageAsync>b__0>d.MoveNext()
--- End of stack trace from previous location ---
at Microsoft.Azure.SignalR.MultiEndpointMessageWriter.<>c__DisplayClass9_0.<<WriteMultiEndpointMessageAsync>b__2>d.MoveNext()
--- End of stack trace from previous location ---
at Microsoft.Azure.SignalR.MultiEndpointMessageWriter.WriteAckableMessageAsync(ServiceMessage serviceMessage, CancellationToken cancellationToken)
at Microsoft.Azure.SignalR.ServiceLifetimeManagerBase`1.WriteAckableCoreAsync[T](T message, Func`2 task)
at SI.Nominations.Web.Api.Hubs.NominationsHub.AddToSignalRGroups(User user) in /src/SI.Nominations.Web.Api/Hubs/NominationsHub.cs:line 33
at SI.Nominations.Web.Api.Hubs.NominationsHub.OnConnectedAsync() in /src/SI.Nominations.Web.Api/Hubs/NominationsHub.cs:line 21
at Microsoft.AspNetCore.SignalR.Internal.DefaultHubDispatcher`1.OnConnectedAsync(HubConnectionContext connection)
at Microsoft.AspNetCore.SignalR.Internal.DefaultHubDispatcher`1.OnConnectedAsync(HubConnectionContext connection)
at Microsoft.AspNetCore.SignalR.HubConnectionHandler`1.RunHubAsync(HubConnectionContext connection)

The base Docker image is mcr.microsoft.com/dotnet/aspnet:5.0-alpine3.12, running on AKS in EU-West.
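In the test case above, the seven joins are awaited one after another, so the first timeout aborts `OnConnectedAsync` and the remaining groups are never attempted. A minimal sketch of one way to isolate each join follows. `GroupJoiner` and `JoinAllAsync` are hypothetical names, and the `joinGroup` delegate stands in for a call such as `Groups.AddToGroupAsync(Context.ConnectionId, name)`. Whether a connection should stay open with only some of its groups joined is an application decision this sketch does not make.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class GroupJoiner
{
    // Hypothetical helper: tries to join every group in `groupNames` and
    // returns the names of the groups whose join timed out.
    public static async Task<IReadOnlyList<string>> JoinAllAsync(
        IEnumerable<string> groupNames, Func<string, Task> joinGroup)
    {
        var timedOut = new List<string>();
        foreach (var name in groupNames)
        {
            try
            {
                await joinGroup(name);
            }
            catch (TimeoutException)
            {
                // Record the group and continue with the remaining ones.
                timedOut.Add(name);
            }
        }
        return timedOut;
    }
}
```

The caller can log the returned names or close the connection if the list is not empty. Exceptions other than `TimeoutException` still propagate unchanged.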

I started seeing these errors right after I switched my solution from .NET Core 3.1 to .NET 5. I also upgraded Microsoft.Azure.SignalR from 1.6.1 to 1.8.0. It turned out that when I downgraded the package back to 1.6.1, it started working again. I did not try any version between the two, though. So my solution was to stay on 1.6.1, as it works fine.

I’m also seeing this same timeout when JoinGroupWithAckMessage is called. It seems to happen when there is a lot of traffic on our hub server. I’ve tried adding more app server instances and scaling up ASRS, but no luck. We’re using Microsoft.Azure.SignalR 1.5.1 and .NET Core 3.1. Any help or guidance on debugging would be greatly appreciated.

@ajbeaven The fix for the original issue is done. This kind of exception can have various causes. Could you share further information about your case, such as the time and resourceId, with us (jixin[at]microsoft.com)?