lnd: [bug]: cannot connect to peer - "failing link: unable to handle upstream settle with error: invalid update", "unknown channel ID"
Background
I had a channel, but could not connect to the peer, always getting disconnected with logs like this:
2023-10-29 07:05:42.433 [INF] PEER: Peer(031c8b9e52039af6892015702bab9b7e13b46979a519497ad8b6674bfe88ea7c27): Loading ChannelPoint(e16ae8e1bab76288c800e02d31d8e6cceab3da6145e62bcfa2291fd0b8b57d05:1), isPending=false
2023-10-29 07:05:42.434 [INF] HSWC: ChannelLink(e16ae8e1bab76288c800e02d31d8e6cceab3da6145e62bcfa2291fd0b8b57d05:1): starting
2023-10-29 07:05:42.434 [INF] HSWC: Trimming open circuits for chan_id=812226:2620:1, start_htlc_id=18712
2023-10-29 07:05:42.434 [INF] HSWC: Adding live link chan_id=057db5b8d01f29a2cf2be64561dab3eacce6d8312de000c88862b7bae1e86ae0, short_chan_id=812226:2620:1
2023-10-29 07:05:42.434 [INF] HSWC: ChannelLink(e16ae8e1bab76288c800e02d31d8e6cceab3da6145e62bcfa2291fd0b8b57d05:1): HTLC manager started, bandwidth=2316950 mSAT
2023-10-29 07:05:42.434 [INF] HSWC: ChannelLink(e16ae8e1bab76288c800e02d31d8e6cceab3da6145e62bcfa2291fd0b8b57d05:1): attempting to re-synchronize
2023-10-29 07:05:42.434 [INF] PEER: Peer(031c8b9e52039af6892015702bab9b7e13b46979a519497ad8b6674bfe88ea7c27): Negotiated chan series queries
2023-10-29 07:05:42.434 [INF] DISC: Creating new GossipSyncer for peer=031c8b9e52039af6892015702bab9b7e13b46979a519497ad8b6674bfe88ea7c27
2023-10-29 07:05:42.434 [INF] NTFN: New block epoch subscription
2023-10-29 07:05:42.434 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(e16ae8e1bab76288c800e02d31d8e6cceab3da6145e62bcfa2291fd0b8b57d05:1)
2023-10-29 07:05:42.705 [INF] HSWC: ChannelLink(e16ae8e1bab76288c800e02d31d8e6cceab3da6145e62bcfa2291fd0b8b57d05:1): received re-establishment message from remote side
2023-10-29 07:05:42.713 [ERR] HSWC: ChannelLink(e16ae8e1bab76288c800e02d31d8e6cceab3da6145e62bcfa2291fd0b8b57d05:1): failing link: unable to handle upstream settle with error: invalid update
2023-10-29 07:05:42.713 [ERR] HSWC: ChannelLink(e16ae8e1bab76288c800e02d31d8e6cceab3da6145e62bcfa2291fd0b8b57d05:1): link failed, exiting htlcManager
2023-10-29 07:05:42.713 [INF] HSWC: ChannelLink(e16ae8e1bab76288c800e02d31d8e6cceab3da6145e62bcfa2291fd0b8b57d05:1): exited
2023-10-29 07:05:42.713 [INF] HSWC: ChannelLink(e16ae8e1bab76288c800e02d31d8e6cceab3da6145e62bcfa2291fd0b8b57d05:1): stopping
2023-10-29 07:05:42.713 [INF] HSWC: Removing channel link with ChannelID(057db5b8d01f29a2cf2be64561dab3eacce6d8312de000c88862b7bae1e86ae0)
2023-10-29 07:05:42.726 [ERR] PEER: Peer(031c8b9e52039af6892015702bab9b7e13b46979a519497ad8b6674bfe88ea7c27): Unknown channel ID: 057db5b8d01f29a2cf2be64561dab3eacce6d8312de000c88862b7bae1e86ae0 found in received msg=UpdateAddHTLC
2023-10-29 07:05:42.726 [ERR] PEER: Peer(031c8b9e52039af6892015702bab9b7e13b46979a519497ad8b6674bfe88ea7c27): Unknown channel ID: 057db5b8d01f29a2cf2be64561dab3eacce6d8312de000c88862b7bae1e86ae0 found in received msg=CommitSig
2023-10-29 07:05:42.726 [ERR] PEER: Peer(031c8b9e52039af6892015702bab9b7e13b46979a519497ad8b6674bfe88ea7c27): Unknown channel ID: 057db5b8d01f29a2cf2be64561dab3eacce6d8312de000c88862b7bae1e86ae0 found in received msg=RevokeAndAck
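For matching up the log lines: the ChannelPoint, the 32-byte chan_id and the short_chan_id above all identify the same channel. Per BOLT 2, the channel_id is the funding txid in internal byte order (the reverse of the hex shown in logs and explorers) with the funding output index XORed into its last two bytes. Per BOLT 7, the short_channel_id packs block height, transaction index and output index into 3, 3 and 2 bytes. A minimal Go sketch; the function names are illustrative, not lnd's API:

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// channelIDFromOutpoint derives the BOLT 2 channel_id from a funding outpoint.
// txidHex is the txid as displayed (byte-reversed), so it is reversed first;
// the output index is then XORed big-endian into the last two bytes.
func channelIDFromOutpoint(txidHex string, index uint16) (string, error) {
	b, err := hex.DecodeString(txidHex)
	if err != nil {
		return "", err
	}
	if len(b) != 32 {
		return "", fmt.Errorf("txid must be 32 bytes, got %d", len(b))
	}
	// Reverse display order into internal byte order.
	for i, j := 0, len(b)-1; i < j; i, j = i+1, j-1 {
		b[i], b[j] = b[j], b[i]
	}
	b[30] ^= byte(index >> 8)
	b[31] ^= byte(index)
	return hex.EncodeToString(b), nil
}

// shortChanID packs "block:tx:output" into the 64-bit short_channel_id
// (3 bytes block height, 3 bytes tx index, 2 bytes output index).
func shortChanID(block, txIndex uint32, output uint16) uint64 {
	return uint64(block)<<40 | uint64(txIndex)<<16 | uint64(output)
}

func main() {
	id, err := channelIDFromOutpoint(
		"e16ae8e1bab76288c800e02d31d8e6cceab3da6145e62bcfa2291fd0b8b57d05", 1)
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
	fmt.Println(shortChanID(812226, 2620, 1))
}
```

Run against the ChannelPoint from the log, channelIDFromOutpoint returns 057db5b8d01f29a2cf2be64561dab3eacce6d8312de000c88862b7bae1e86ae0, the same ID that appears in the "Unknown channel ID" errors. Those errors therefore presumably concern this very channel, whose link had just been removed after the failure, while the peer kept sending updates for it.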
The channel was force-closed a day later, presumably to claim the incoming HTLC. The channel was working and forwarding at least until 28/Oct/2023 22:54 (the last forward recorded by my node). I have (or had) 3-4 other channels with the same problem; in at least one case, the complaints about an unknown channel ID stopped without restarting lnd and without closing that channel.
At the very least, this problem makes a channel unusable. It also sounds a bit scary: did one of the nodes lose some data? How can a node suddenly stop recognizing its own channel?
Your environment
- version of lnd: 0.17 (both peers). I use boltdb, the peer probably postgres (on raspiblitz).
- which operating system: Debian 12.1
- version of bitcoind: v25.1rc1
Steps to reproduce
No idea.
Expected behaviour
Should be able to connect.
About this issue
- State: closed
- Created 8 months ago
- Comments: 15
Fixed with #8220 (lnd 0.17.3)
I encountered the exact same problem with another node and was able to narrow it down. If you still have channels that do not reactivate because of an "invalid update" error, you can follow the steps below to confirm that your situation is the same. @Roasbeef @yyforyongyu I think you should take a look at this.
What happened to the other node runner:
He had a channel with 10 HTLCs, including the following relevant forwarded HTLC:
His peer, also an LND node, then tried to settle this exact HTLC with an UpdateFulfillHTLC message. So far so good, but now comes the problem. The peer tries to settle the HTLC with this ID:
Received UpdateFulfillHTLC(chan_id=cb6f9d6b9e525f4f393ad0829ffa2696cc244d8b7fd55d3d6bd2d6c1ff31ad3e, id=103214, pre_image=5183dbc0d4da7160aeeb43c19d20910139e18b8468f64c6feb69d1beac67f812) from ....
But somehow his node thinks this HTLC is not locked in:
2023-11-22 10:13:06.966 [ERR] HSWC: ChannelLink(3fad31ffc1d6d26b3d5dd57f8b4d24cc9626fa9f82d03a394f5f529e6b9d6fcb:1): failing link: unable to handle upstream settle with error: invalid update
=> relevant code line: https://github.com/lightningnetwork/lnd/blob/f005b248ced9bcd2316c0cbe7c9c25298796513c/htlcswitch/link.go#L1755
What I think happens is that the hashes of the two onion blobs (remote and local) somehow differ, so we do not count this HTLC as an active, fully locked-in HTLC.
=> relevant code lines: https://github.com/lightningnetwork/lnd/blob/master/channeldb/channel.go#L2094-L2120
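The mechanism described above can be modelled in a few lines. This is a simplified sketch based on the two linked code locations, not lnd's actual code; HTLC, activeHtlcs and settleLockedIn are hypothetical names. An HTLC counts as active only if the SHA-256 of its onion blob also appears among the remote commitment's HTLCs, and an upstream settle passes only if its id matches an active outgoing HTLC; otherwise the link fails with "invalid update".

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// HTLC is a hypothetical, pared-down HTLC record for this illustration.
type HTLC struct {
	HtlcIndex uint64
	Incoming  bool
	OnionBlob []byte
}

// activeHtlcs keeps only local-commitment HTLCs whose onion-blob hash is
// also present on the remote commitment.
func activeHtlcs(local, remote []HTLC) []HTLC {
	remoteSet := make(map[[32]byte]struct{}, len(remote))
	for _, h := range remote {
		remoteSet[sha256.Sum256(h.OnionBlob)] = struct{}{}
	}
	var active []HTLC
	for _, h := range local {
		if _, ok := remoteSet[sha256.Sum256(h.OnionBlob)]; ok {
			active = append(active, h)
		}
	}
	return active
}

// settleLockedIn models the pre-check on an incoming UpdateFulfillHTLC:
// the id must match an outgoing (not incoming) active HTLC.
func settleLockedIn(active []HTLC, id uint64) bool {
	for _, h := range active {
		if !h.Incoming && h.HtlcIndex == id {
			return true
		}
	}
	return false
}

func main() {
	remote := []HTLC{{HtlcIndex: 103214, OnionBlob: []byte("onion-A")}}
	// The locally stored copy of the same HTLC has a diverging onion blob.
	local := []HTLC{{HtlcIndex: 103214, OnionBlob: []byte("onion-B")}}

	active := activeHtlcs(local, remote)
	fmt.Println(len(active), settleLockedIn(active, 103214)) // prints "0 false"
}
```

With identical blobs the HTLC is active and the settle passes the check, so in this model a single diverging byte in the locally stored onion blob is enough to reproduce the failure seen in the logs.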
Looking briefly at the linked code, I am not sure we need such a strict check. Do we really need to ensure that both onion blobs are identical in the settle case? We could simply check whether the preimage is valid; if it is, we will never need the onion blob anyway. So I think this check can be loosened. The remaining question is how the two onion blobs could diverge; maybe we need to check in detail whether we flush different data in some cases.
Apart from that, the affected channel was force-closed and the peer swept the relevant HTLC using the preimage. This is evidence that the peer tried to settle the correct HTLC and that the problem indeed lies locally: a database inconsistency between the remote and the local onion blob.
Sweep of the HTLC by the preimage:
https://mempool.space/tx/9a98fcd342d575dbbd225a8d921b605664eed3b90c6306a4723b56521293f02d