lnd: failing link: unable to update commitment: cannot add duplicate keystone with error: internal error

Background

The error message “failing link: unable to update commitment: cannot add duplicate keystone with error: internal error” appeared in my logs, for reasons I don’t understand. Afterwards the channel was unusable and my peer (running CLN) immediately force-closed the channel at 01:26.

The force-close transaction contains one outgoing HTLC of 200003 sat (resolved by timeout, according to lncli closedchannels).

My peer says:

I’m getting hit by several force closes for all kind of reasons in the last couple days. Ours is just “internal error” in my log

Relevant snippet from my logs:

01:26:09.438 [INF] PEER: NodeKey(PUBKEY) loading ChannelPoint(CHANPOINT)
01:26:09.439 [DBG] CNCT: New ChainEventSubscription(id=15) for ChannelPoint(CHANPOINT)
01:26:09.439 [INF] HSWC: ChannelLink(CHANPOINT): starting
01:26:09.439 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(CHANPOINT)
01:26:09.439 [INF] HSWC: ChannelLink(CHANPOINT): HTLC manager started, bandwidth=3663215408 mSAT
01:26:09.439 [INF] HSWC: ChannelLink(CHANPOINT): attempting to re-synchronize
01:26:09.439 [INF] PEER: Negotiated chan series queries with PUBKEY
01:26:09.519 [ERR] RPCS: [connectpeer]: error connecting to peer: already connected to peer: PUBKEY@IP2:48304
01:26:09.519 [ERR] RPCS: [/lnrpc.Lightning/ConnectPeer]: already connected to peer: PUBKEY@IP2:48304
01:26:09.839 [INF] HSWC: ChannelLink(CHANPOINT): received re-establishment message from remote side
01:26:09.851 [DBG] HSWC: ChannelLink(CHANPOINT): loaded 0 fwd pks
01:26:11.237 [DBG] HSWC: ChannelLink(CHANPOINT): queueing keystone of ADD open circuit: (Chan ID=0:0:0, HTLC ID=6133225)->(Chan ID=CHAN_ID, HTLC ID=4619)
01:26:11.991 [DBG] HSWC: ChannelLink(CHANPOINT): removing Add packet (Chan ID=0:0:0, HTLC ID=6133225) from mailbox
01:26:13.366 [DBG] HSWC: ChannelLink(CHANPOINT): settle-fail-filter &{1 [0]}
01:26:13.366 [DBG] HSWC: ChannelLink(CHANPOINT): Failed to send 500059997 mSAT
01:26:15.297 [DBG] CNCT: ChannelArbitrator(CHANPOINT): attempting state step with trigger=chainTrigger from state=StateDefault
01:26:15.297 [DBG] CNCT: ChannelArbitrator(CHANPOINT): new block (height=734163) examining active HTLC's
01:26:15.297 [DBG] CNCT: ChannelArbitrator(CHANPOINT): checking commit chain actions at height=734163, in_htlc_count=0, out_htlc_count=2
01:26:15.297 [DBG] CNCT: ChannelArbitrator(CHANPOINT): no actions for chain trigger, terminating
01:26:15.297 [DBG] CNCT: ChannelArbitrator(CHANPOINT): terminating at state=StateDefault
01:26:16.485 [DBG] HSWC: ChannelLink(CHANPOINT): settle-fail-filter &{1 [0]}
01:26:16.485 [DBG] HSWC: ChannelLink(CHANPOINT): Failed to send 1000047000 mSAT
01:26:26.905 [DBG] HSWC: ChannelLink(CHANPOINT): queueing keystone of ADD open circuit: (Chan ID=0:0:0, HTLC ID=6133240)->(Chan ID=CHAN_ID, HTLC ID=4620)

01:26:26.956 [ERR] HSWC: ChannelLink(CHANPOINT): failing link: unable to update commitment: cannot add duplicate keystone with error: internal error

01:26:26.956 [INF] HSWC: ChannelLink(CHANPOINT): exited
01:26:26.957 [INF] HSWC: ChannelLink(CHANPOINT): stopping
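
For anyone triaging similar logs, here is a minimal sketch of a helper that pulls the keystone mappings out of the "queueing keystone of ADD open circuit" debug lines and reports outgoing keys that are queued more than once. The function names and the regular expression are assumptions based only on the line format shown in the snippet above; this is not part of lnd. Within the excerpt above the two outgoing HTLC IDs (4619 and 4620) differ, so a longer log window would be needed to spot a repeated key.

```python
import re

# Pattern for the debug line format seen in the snippet above, e.g.
# "... queueing keystone of ADD open circuit:
#  (Chan ID=0:0:0, HTLC ID=6133225)->(Chan ID=CHAN_ID, HTLC ID=4619)"
# The group names are illustrative, not lnd terminology.
KEYSTONE_RE = re.compile(
    r"queueing keystone of ADD open circuit: "
    r"\(Chan ID=(?P<in_chan>[^,]+), HTLC ID=(?P<in_htlc>\d+)\)->"
    r"\(Chan ID=(?P<out_chan>[^,]+), HTLC ID=(?P<out_htlc>\d+)\)"
)


def extract_keystones(log_lines):
    """Return ((in_chan, in_htlc), (out_chan, out_htlc)) pairs in log order."""
    pairs = []
    for line in log_lines:
        match = KEYSTONE_RE.search(line)
        if match:
            incoming = (match["in_chan"], int(match["in_htlc"]))
            outgoing = (match["out_chan"], int(match["out_htlc"]))
            pairs.append((incoming, outgoing))
    return pairs


def repeated_outgoing_keys(pairs):
    """Return outgoing keys that appear more than once, in first-repeat order."""
    seen = set()
    repeated = []
    for _, outgoing in pairs:
        if outgoing in seen and outgoing not in repeated:
            repeated.append(outgoing)
        seen.add(outgoing)
    return repeated
```

Feeding it the lines of an lnd log file (for example `extract_keystones(open(path))`, with `path` chosen by you) returns the mappings in the order they were logged.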

Your environment

  • lnd 0.14.3-beta-rc1
  • Linux server 5.10.0-10-amd64 #1 SMP Debian 5.10.84-1 (2021-12-08) x86_64 GNU/Linux
  • bitcoind v23

Steps to reproduce

Have a non-anchor channel with a CLN peer behind Tor. Have a somewhat flaky connection. Send HTLCs to the peer.

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 4
  • Comments: 27 (13 by maintainers)

Most upvoted comments

I run the latest CLN (0.11.1) - my node has force closed multiple channels when my peers told me they had an internal error.

According to BOLT 1 (https://github.com/lightning/bolts/blob/master/01-messaging.md#requirements-2):

The receiving node:

upon receiving error:
    if channel_id is all zero:
        MUST fail all channels with the sending node.
    otherwise:
        MUST fail the channel referred to by channel_id, if that channel is with the sending node.

So the CLN behaviour complies with the spec, AFAICT.
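
To make the quoted requirement concrete, here is a minimal sketch of the receiver-side decision. It is neither CLN nor lnd code; `channels_to_fail` and its parameters are hypothetical names, and the only thing taken from the spec is that `channel_id` is 32 bytes and that an all-zero value refers to every channel with the sender.

```python
# An all-zero channel_id (32 bytes) means "all channels with the sending node".
ALL_ZERO_CHANNEL_ID = bytes(32)


def channels_to_fail(error_channel_id, channels_with_sender):
    """Return the set of channel IDs to fail after receiving an `error`.

    error_channel_id: the 32-byte channel_id carried in the error message.
    channels_with_sender: channel IDs currently open with the sending node.
    """
    if error_channel_id == ALL_ZERO_CHANNEL_ID:
        # MUST fail all channels with the sending node.
        return set(channels_with_sender)
    if error_channel_id in channels_with_sender:
        # MUST fail the referenced channel, if it is with the sending node.
        return {error_channel_id}
    # The error refers to a channel we do not have with this peer.
    return set()
```

Under this reading, an "internal error" sent for a specific channel leads the receiving node to fail exactly that channel, which matches the force-close seen here.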

OK, I might have an idea of what’s going on. If I’m right, this is an lnd problem, not a c-l problem.

I have had about 20 of these over the last couple of days. This is a really pressing issue.

Noted a related thread in #6482 where in some cases we may not be properly cancelling inbound HTLCs if we attempt to send a commitment but the remote peer never replies. This is a bit trickier since we’ve technically already sent out that valid commitment, so we need to be playing that HTLC (may lead to a force close since we want to be able to safely time out that incoming HTLC).

Looks like this was introduced inadvertently in this PR (according to @Crypt-iQ): https://github.com/lightningnetwork/lnd/pull/4183