lnd: Health check: chain backend failed after 3 calls. How to avoid the shutdown?
Background
My LND node keeps shutting down. It seems to be triggered by a short network failure that prevents the node from reaching bitcoind.
Your environment
- lnd version 0.11.99-beta commit=v0.11.0-beta-255-g10a84f2c75c9de958c9dd07de830f0546a43d05a
- which operating system: Debian GNU/Linux 10 4.19.0-8-amd64
- version of bitcoind: Bitcoin Core version v0.20.1
- lnd and bitcoind are on different servers
Steps to reproduce
Restart LND. The shutdown reproduces every 1 to 2 days.
Expected behaviour
Just keep running.
Actual behaviour
2020-10-03 02:18:26.258 [INF] NTFN: New block: height=651006, sha=000000000000000000059e4ba6945311b11cf30cea4389ca932ebd0c89f5186c
2020-10-03 02:18:26.258 [INF] UTXN: Attempting to graduate height=651006: num_kids=0, num_babies=0
2020-10-03 02:18:58.221 [INF] CRTR: Processed channels=0 updates=554 nodes=1 in last 59.999815896s
2020-10-03 02:19:24.910 [CRT] SRVR: Health check: chain backend failed after 3 calls  <------------
2020-10-03 02:19:24.910 [INF] SRVR: Sending request for shutdown
Question
How to avoid the shutdown?
About this issue
 - State: closed
 - Created 4 years ago
 - Comments: 30 (12 by maintainers)
 
There seems to be a problem with the health check: we are investigating some false positives. For now, you can disable the health check with --healthcheck.chainbackend.attempts=0.

Thanks for the info @C-Otto! We suspected something like this would be the case, but didn’t actually go look in core itself. I’m pretty surprised that it blocks for minutes though! I’ll switch over to a call that doesn’t need the lock (we just need one that exists for neutrino/btcd as well, but it’s not critical that it’s the same endpoint imo). Will get this in for 0.12.

I forgot make install, and just verified that “uptime” is used with the current version. Thanks for the quick help 😃

Sure, but being unreachable for 2 minutes? The code here is also pretty simple: send the request, then wait for it to come back. I don’t think the healthchecks themselves are causing high I/O as it’s just an RPC call, and lnd makes several of these to bitcoind on a normal basis for routing operation.

@alevchuk no reason not to bump the default. It does seem bizarre to me that the call is taking 10 seconds to complete, so we’re still investigating to make sure nothing is wrong on our side. If we can’t figure anything out, will likely bump the default and see how that goes.

Thanks for the report @sendbitcoin! I’d strongly recommend running lnd with debuglevel=HLCK=debug so that we can see the reason your check is failing. The logging was only bumped to info level in 35a2dbc so you may not see the reason otherwise.

I’m getting this issue. lnd has shut down twice already because of this. The backend is bitcoin core 0.18.1 on the same computer.
2020-10-08 08:09:51.192 [CRT] SRVR: Health check: chain backend failed after 3 calls
2020-10-08 08:09:51.192 [INF] SRVR: Sending request for shutdown
2020-10-08 08:09:51.193 [INF] LTND: Received shutdown request.
2020-10-08 08:09:51.193 [INF] LTND: Shutting down…
2020-10-08 08:09:51.193 [INF] LTND: Gracefully shutting down.
I increased healthcheck.chainbackend.attempts to 10 and healthcheck.chainbackend.timeout to 30s, and will keep an eye on it to see if it happens again.
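
For reference, the settings mentioned in this thread could be collected in lnd.conf roughly as follows. This is a sketch, not a recommendation from the maintainers: the section headers are an assumption based on the usual lnd.conf layout, and the values are simply the ones reported above.

```ini
[Application Options]
; Log the reason a health check fails (suggested above for debugging).
debuglevel=HLCK=debug

[healthcheck]
; Values one commenter switched to. Setting attempts=0 instead disables
; the chain backend check entirely, as suggested earlier in the thread.
healthcheck.chainbackend.attempts=10
healthcheck.chainbackend.timeout=30s
```

The same options can be passed as command-line flags, for example --healthcheck.chainbackend.attempts=0.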