Discord.Net: Discord.Net doesn't automatically reconnect after network disconnect

2018-02-22 14:06:21|INFO|ArchiBoT|OnLog() Discord | Discord.Net v2.0.0-beta (API v6)
(...)
2018-02-22 15:26:11|WARN|ArchiBoT|OnLog() Gateway |
2018-02-22 15:26:11|WARN|ArchiBoT|OnLog() System.Exception: Server missed last heartbeat
   at Discord.ConnectionManager.<>c__DisplayClass28_0.<<StartAsync>b__0>d.MoveNext()
2018-02-22 15:26:11|INFO|ArchiBoT|OnLog() Gateway | Disconnecting
2018-02-22 15:26:12|INFO|ArchiBoT|OnLog() Gateway | Disconnected
2018-02-22 15:26:13|INFO|ArchiBoT|OnLog() Gateway | Connecting
2018-02-22 15:27:58|INFO|ArchiBoT|OnLog() Gateway | Disconnecting
2018-02-22 15:27:58|INFO|ArchiBoT|OnLog() Gateway | Disconnected

The library properly attempted to reconnect after the missed heartbeat, but it seems the attempt timed out after those ~45 seconds and the library then stopped trying. According to https://github.com/RogueException/Discord.Net/commit/73ac9d7886aa48b9d809c56e51945056f3b67232 a similar issue should already be solved in 2.0.0-beta, but it seems the problem is still present in one form or another.

It should definitely keep trying until it succeeds, unless this is somehow intended (in which case, how should we handle it ourselves?).

Thank you in advance.
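
One way to handle a stuck disconnect from application code, as asked above, is a small watchdog: after a disconnect notification, wait a grace period so the library's own reconnect logic can run, then restart only if the client is still offline. The sketch below is a minimal illustration, not part of Discord.Net. `ReconnectWatchdog`, `isConnected`, `restart` and `gracePeriod` are hypothetical names, and the two delegates are placeholders for whatever connection check and restart routine the bot uses.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not a Discord.Net type): restarts a client that stays offline.
public sealed class ReconnectWatchdog
{
    private readonly Func<bool> _isConnected; // assumption: reports the current connection state
    private readonly Func<Task> _restart;     // assumption: stops and starts the client again
    private readonly TimeSpan _gracePeriod;   // time granted to the library's own reconnect
    private int _checkPending;                // 0 = idle, 1 = a check is already scheduled

    public ReconnectWatchdog(Func<bool> isConnected, Func<Task> restart, TimeSpan gracePeriod)
    {
        _isConnected = isConnected ?? throw new ArgumentNullException(nameof(isConnected));
        _restart = restart ?? throw new ArgumentNullException(nameof(restart));
        _gracePeriod = gracePeriod;
    }

    // Call from the disconnect notification. Returns true if a restart was performed.
    public async Task<bool> OnDisconnectedAsync(CancellationToken token = default)
    {
        // Collapse a burst of disconnect events into a single pending check.
        if (Interlocked.Exchange(ref _checkPending, 1) == 1)
            return false;

        try
        {
            // Give the built-in reconnect a chance before intervening.
            await Task.Delay(_gracePeriod, token).ConfigureAwait(false);

            if (_isConnected())
                return false; // the library recovered on its own

            await _restart().ConfigureAwait(false);
            return true;
        }
        finally
        {
            Interlocked.Exchange(ref _checkPending, 0);
        }
    }
}
```

With Discord.Net, the delegates would presumably wrap the client's `ConnectionState` property and a `StopAsync()` followed by `StartAsync()`, and the method would be triggered from the `Disconnected` event. It is probably safer to start it without awaiting it inside the handler (for example `_ = watchdog.OnDisconnectedAsync();` followed by `return Task.CompletedTask;`), so the grace period does not block the gateway task. Treat this wiring as an assumption to verify against the version in use.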

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 59 (24 by maintainers)

Most upvoted comments

I’ve just started using Discord.Net in a project and have immediately noticed this issue. I have a bot that starts up and runs LoginAsync() and then StartAsync(). It connects and works fine. However, after some number of hours, I notice the bot has gone offline on Discord, and my console log displays Gateway: Disconnected. That’s it. There are no other messages about a heartbeat, and it literally never, ever, ever attempts to reconnect. It always disconnects after a while, and it never reconnects.

When this happens, the bot stays offline on Discord, and no longer responds to PMs. However, curiously, it is still able to send automated messages to a channel via SendMessageAsync() despite being offline.

My only solution to get the bot fully online again is to restart it completely. This is quite annoying to have to do every single day. Sometimes I have to do it multiple times per day.

I just merged two fixes for possible deadlocks, #1872 and #1873. They should address the valid reports that I read here.

These changes are available in 2.4.1-dev and the latest 3.0.0-dev (3.0.0-dev-20210615.5), both on MyGet. Please use these before reporting any other deadlock.

As a note, the websocket connection being closed and reconnecting isn’t a deadlock or an issue at all. If it reconnected, it’s working.

Yeah this should really be bumped up and prioritized with these stupid changes.

I can reproduce too:

01:56:09,735 [49] INFO - 01:56:09 Gateway     Failed to resume previous session
01:56:09,735 [49] INFO - 01:56:09 Gateway     Disconnecting
01:56:09,750 [87] INFO - 01:56:09 Gateway     Disconnected

Nothing happened after Disconnected. I will try to get more information about the disconnect.

Server missed last heartbeat

The last time I didn’t get this one a few times a day was 0.9x 😉

One idea I had for fixing this would be to go the route of Wumpus.Net, which drops the StartAsync/StopAsync system and instead uses a RunAsync method to run the client. RunAsync would accept a cancellation token and run the client until an unhandled exception occurs, reconnecting and resuming where possible. StopAsync would then cancel this RunAsync call, or you could pass in your own CancellationToken.

To retain backwards compatibility, StartAsync would likely be rewritten into:

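async Task StartAsync()
{
    // Restart RunAsync whenever it returns without being cancelled.
    while (true)
    {
        try
        {
            // CancellationToken.None is a placeholder; StopAsync would need a real token to cancel this call.
            await RunAsync(CancellationToken.None);
        }
        catch (OperationCanceledException)
        {
            // Cancellation means a stop was requested, so leave the loop.
            break;
        }
        catch (Exception)
        {
            // potentially log here instead of throw
            // (rethrowing exits the loop, so this failure is not retried)
            throw;
        }
    }
}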

Going this route gives us more flexibility, and also lets us drop that annoying ConnectionManager class which is frankly impossible to debug IMHO.
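
To make the RunAsync idea concrete, here is a minimal sketch of a supervisor loop that keeps restarting a run delegate until the caller cancels, with a capped exponential delay between consecutive failures. It assumes a `RunAsync(CancellationToken)`-shaped method as proposed above, which is not an existing Discord.Net API. `ClientSupervisor`, `RunUntilCancelledAsync`, `initialDelay` and `maxDelay` are hypothetical names chosen for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical supervisor around a RunAsync-style entry point.
public static class ClientSupervisor
{
    public static async Task RunUntilCancelledAsync(
        Func<CancellationToken, Task> runAsync, // stands in for the proposed RunAsync
        TimeSpan initialDelay,
        TimeSpan maxDelay,
        CancellationToken token)
    {
        var failures = 0; // consecutive failed runs

        while (true)
        {
            try
            {
                await runAsync(token).ConfigureAwait(false);
                failures = 0; // the run ended without an exception
            }
            catch (OperationCanceledException) when (token.IsCancellationRequested)
            {
                return; // the caller asked us to stop
            }
            catch (Exception ex)
            {
                // Any other failure (including an unrelated cancellation) is logged and retried.
                failures++;
                Console.Error.WriteLine($"Run failed ({ex.GetType().Name}): {ex.Message}");
            }

            // Delay is initial, 2x, 4x, ... for consecutive failures, capped at maxDelay.
            var factor = Math.Pow(2, Math.Min(Math.Max(failures - 1, 0), 30));
            var delayMs = Math.Min(initialDelay.TotalMilliseconds * factor, maxDelay.TotalMilliseconds);

            try
            {
                await Task.Delay(TimeSpan.FromMilliseconds(delayMs), token).ConfigureAwait(false);
            }
            catch (OperationCanceledException)
            {
                return; // cancelled while waiting to retry
            }
        }
    }
}
```

Under this design a StopAsync equivalent would only need to cancel the `CancellationTokenSource` whose token was passed in. Filtering on `token.IsCancellationRequested` is a deliberate choice: an `OperationCanceledException` raised by an internal timeout is treated as a failure to retry rather than as a stop request.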