lnd: [bug]: channel inactive but online

Background

One of the channels I opened a long time ago has been showing as inactive for a few weeks now. I can’t find the node in lncli peers, but it seems to be alive and well (it’s the Blockstream Store node). I get disconnected from it every time I try connecting, either automatically or manually.

Your environment

  • lncli version 0.16.99-beta commit=v0.16.0-beta-368-g753af11ed (I’ve had this issue with earlier commits too, as early as a few months ago)
  • Ubuntu 22.04.2 LTS (GNU/Linux 5.15.0-73-generic x86_64)
  • Bitcoin Core RPC client version v25.0.0

Steps to reproduce

$ lncli connect 02df5ffe895c778e10f7742a6c5b8a0cefbe9465df58b92fadeb883752c8107c8f@giexynrrloc2fewstcybenljdksidtglfydecbellzkl63din6w73eid.onion:9735
$ tail -f ~/.lnd/logs/bitcoin/mainnet/lnd.log

Finalizing connection to 02df5ffe895c778e10f7742a6c5b8a0cefbe9465df58b92fadeb883752c8107c8f@127.0.0.1:49756, inbound=true
2023-06-11 14:16:10.233 [INF] PEER: Peer(02df5ffe895c778e10f7742a6c5b8a0cefbe9465df58b92fadeb883752c8107c8f): loading ChannelPoint(3e37348c79282a149f48eae6c51548b01e0064f5e7d8903fe7f4ca911855c278:1)
2023-06-11 14:16:10.233 [INF] HSWC: ChannelLink(3e37348c79282a149f48eae6c51548b01e0064f5e7d8903fe7f4ca911855c278:1): starting
2023-06-11 14:16:10.233 [INF] HSWC: Trimming open circuits for chan_id=689862:1669:1, start_htlc_id=224
2023-06-11 14:16:10.233 [INF] HSWC: Adding live link chan_id=78c2551891caf4e73f90d8e7f564001eb04815c5e6ea489f142a28798c34373f, short_chan_id=689862:1669:1
2023-06-11 14:16:10.234 [INF] NTFN: New block epoch subscription
2023-06-11 14:16:10.233 [INF] HSWC: ChannelLink(3e37348c79282a149f48eae6c51548b01e0064f5e7d8903fe7f4ca911855c278:1): HTLC manager started, bandwidth=7819940195 mSAT
2023-06-11 14:16:10.234 [INF] HSWC: ChannelLink(3e37348c79282a149f48eae6c51548b01e0064f5e7d8903fe7f4ca911855c278:1): attempting to re-synchronize
2023-06-11 14:16:10.234 [INF] PEER: Peer(02df5ffe895c778e10f7742a6c5b8a0cefbe9465df58b92fadeb883752c8107c8f): Negotiated chan series queries
2023-06-11 14:16:10.233 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(3e37348c79282a149f48eae6c51548b01e0064f5e7d8903fe7f4ca911855c278:1)
2023-06-11 14:16:11.351 [INF] HSWC: ChannelLink(3e37348c79282a149f48eae6c51548b01e0064f5e7d8903fe7f4ca911855c278:1): received re-establishment message from remote side
2023-06-11 14:16:11.359 [INF] HSWC: ChannelLink(3e37348c79282a149f48eae6c51548b01e0064f5e7d8903fe7f4ca911855c278:1): sending 2 updates to synchronize the state
2023-06-11 14:16:12.213 [INF] PEER: Peer(02df5ffe895c778e10f7742a6c5b8a0cefbe9465df58b92fadeb883752c8107c8f): unable to read message from peer: EOF
2023-06-11 14:16:12.213 [INF] PEER: Peer(02df5ffe895c778e10f7742a6c5b8a0cefbe9465df58b92fadeb883752c8107c8f): disconnecting 02df5ffe895c778e10f7742a6c5b8a0cefbe9465df58b92fadeb883752c8107c8f@127.0.0.1:49756, reason: read handler closed
2023-06-11 14:16:12.214 [INF] NTFN: Cancelling epoch notification, epoch_id=1099
2023-06-11 14:16:12.414 [INF] HSWC: ChannelLink(3e37348c79282a149f48eae6c51548b01e0064f5e7d8903fe7f4ca911855c278:1): stopping
2023-06-11 14:16:12.414 [INF] HSWC: ChannelLink(3e37348c79282a149f48eae6c51548b01e0064f5e7d8903fe7f4ca911855c278:1): exited
2023-06-11 14:16:12.414 [INF] HSWC: Removing channel link with ChannelID(78c2551891caf4e73f90d8e7f564001eb04815c5e6ea489f142a28798c34373f)
$ lncli listchannels --inactive_only

{
    "channels": [
        {
            "active": false,
            "remote_pubkey": "02df5ffe895c778e10f7742a6c5b8a0cefbe9465df58b92fadeb883752c8107c8f",
            "channel_point": "3e37348c79282a149f48eae6c51548b01e0064f5e7d8903fe7f4ca911855c278:1",
            "chan_id": "758511290670186497",
            "capacity": "8000000",
            "local_balance": "7890518",
            "remote_balance": "99833",
            "commit_fee": "9649",
            "commit_weight": "724",
            "fee_per_kw": "13326",
            "unsettled_balance": "0",
            "total_satoshis_sent": "1909115",
            "total_satoshis_received": "1809281",
            "num_updates": "6390",
            "pending_htlcs": [
            ],
            "csv_delay": 144,
            "private": false,
            "initiator": true,
            "chan_status_flags": "ChanStatusDefault",
            "local_chan_reserve_sat": "80000",
            "remote_chan_reserve_sat": "80000",
            "static_remote_key": true,
            "commitment_type": "STATIC_REMOTE_KEY",
            "lifetime": "325301",
            "uptime": "0",
            "close_address": "",
            "push_amount_sat": "0",
            "thaw_height": 0,
            "local_constraints": {
                "csv_delay": 144,
                "chan_reserve_sat": "80000",
                "dust_limit_sat": "573",
                "max_pending_amt_msat": "18446744073709551615",
                "min_htlc_msat": "0",
                "max_accepted_htlcs": 30
            },
            "remote_constraints": {
                "csv_delay": 961,
                "chan_reserve_sat": "80000",
                "dust_limit_sat": "546",
                "max_pending_amt_msat": "7920000000",
                "min_htlc_msat": "1",
                "max_accepted_htlcs": 483
            },
            "alias_scids": [
            ],
            "zero_conf": false,
            "zero_conf_confirmed_scid": "0"
        }
    ]
}

About this issue

  • State: open
  • Created a year ago
  • Reactions: 1
  • Comments: 46

Most upvoted comments

Yes, that’s very nice of them. But I for one will never ever use this, and will opt for ditching my lnd (or otherwise broken) peer rather than risking a force-close tx with that peer getting stuck. The only use case this has for me is to let the channel come online and then immediately negotiate a cooperative close with the faulty peer. But even that is risky, since if something goes wrong along that path one is a bit screwed.

There are mostly legit reasons why a peer could temporarily lose its mempool, rather than being somehow “broken”. For example, any kind of hardware migration, IBD, or, until recently (https://github.com/bitcoin/bitcoin/issues/27722), potentially every Bitcoin Core reboot could lead to loss of mempool data, causing this issue. Within a few hours after a boot without mempool data, a bitcoin node will develop a reasonably accurate tip of the mempool no matter the configured mempool limit (accuracy of the tip matters much more than size for fee estimation). So unless your peer has just rebooted like that, you can unstick the channel with CLN’s setchannel, give it a few minutes, then change setchannel back, and done.
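
For concreteness, a sketch of that setchannel round-trip. This assumes the unsticking knob is setchannel’s ignorefeelimits parameter (available in newer CLN releases), which the question further down also guesses at; <peer_id> is a placeholder for the stuck peer’s pubkey:

$ lightning-cli -k setchannel id=<peer_id> ignorefeelimits=true
$ # give the channel a few minutes to re-establish, then revert:
$ lightning-cli -k setchannel id=<peer_id> ignorefeelimits=false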

A little update: since fees have dropped to 1 sat/vB and I didn’t force close any channels that were disabled due to this bug, they came back to life during the last few days. Hopefully #7805 is finalized and merged soon.

Not sure what happened, but this issue has been resolved for me. Perhaps Blockstream Store did something on their end.

Blockstream Store applied a one time update_fee exception to allow the remote peer’s erroneous state to be accepted just once. If their mempool is subsequently fixed then the channel will continue to work.

We found all but one of the stuck channels recovered after this one time exception. The one broken peer actually has a broken mempool right now. All of the stuck peers appear to be LND nodes.

A one-time exception, how? By ignore-fee-limits=true?

CLN v23.05.2 users can temporarily run with this patch: https://fedorapeople.org/~wtogami/a/2023/0001-DO-NOT-COMMIT-Temporary-workaround-for-LND-update_fe.patch

Ok, I think I know what the problem might be. Could you confirm that the channel affected by this was almost drained?

My general impression was that this behaviour is most likely caused by bad backend fee estimation. I concluded this because the example I investigated in #7666 was caused by a mempool wipe-out.

But looking at the code again, it seems to me that we can also end up in this situation when our channel is drained. Could you also take a look at this, @ellemouton?

Looking in particular at this code part:

https://github.com/lightningnetwork/lnd/blob/bd3f570107244688583f450653e906943f69a2f4/lnwallet/channel.go#L7339-L7355

It seems to me that, because we have a default FeeAllocation of 0.5 in combination with a drained channel (low local capacity), we end up going all the way down to the min_relay_floor (253 sat/kw). And because most nodes run big mempools nowadays, the min_relay fee is 1 sat/vbyte.

So I think we need to be careful with the FeeAllocation, because the old fee will basically decrease over time, especially when the channel is drained locally.
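
To make the arithmetic concrete, here is a back-of-the-envelope sketch, not lnd’s actual code: the 300 sat spendable balance is made up, commit_weight comes from the listchannels output above, and 0.5 is the default FeeAllocation mentioned above:

$ spendable_sat=300    # hypothetical nearly drained local side
$ commit_weight=724    # from the listchannels output above
$ echo $(( spendable_sat / 2 * 1000 / commit_weight ))   # fee-rate cap in sat/kw at FeeAllocation = 0.5
207

Since 207 sat/kw is below the 253 sat/kw min_relay_floor, lnd would propose the floor itself (roughly 1 sat/vB), which a peer whose fee estimator currently demands a higher minimum will reject.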

I can confirm that at the moment I had to force close my problematic channel, it was clearly drained. Also, the node was coinos, which runs CLN.