lnd: [bug]: Channel is stuck in pending state.

Several node runners have reported that channels whose contracts have all been resolved on-chain are still stuck in the pending state.

The problem stems from the behaviour of the sweeper, especially when a sweep is still unresolved (pending in the mempool) and lnd restarts.

Example: a force close produces an output (i.e. a sweeper input) that is registered with the sweeper. So far so good. Now let's say the fee is not high enough, and after a while the node runner restarts the node. lnd does not remember the old sweeps, so it relaunches the contract resolvers, including the one for the output from the force close. Very likely, lnd tries to publish a sweep of this input again, but the mempool rejects it because the old sweep of the same output is already there. The problem is that lnd keeps retrying the sweep of this input until MaxSweepAttempts is reached, at which point it removes the input together with its spend notifier. Now imagine the old sweep tx confirms: lnd is no longer able to register the spend, and the channel stays in the pending state forever (the user can still abandon the channel, but that's another topic).
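A toy Go model of the sequence above may make it easier to follow. It is a sketch under assumptions, not lnd's sweeper code: toySweeper, pendingInput, tryPublish, notifySpend and errMempoolConflict are hypothetical names, and the attempt limit of 3 is illustrative rather than lnd's actual MaxSweepAttempts value.

```go
package main

import (
	"errors"
	"fmt"
)

// errMempoolConflict stands in for the rejection returned when the old,
// still-unconfirmed sweep already spends the input (hypothetical name).
var errMempoolConflict = errors.New("txn-mempool-conflict")

// maxSweepAttempts plays the role of MaxSweepAttempts; the value is illustrative.
const maxSweepAttempts = 3

// pendingInput is a simplified sweeper input. spendCh is what the contract
// resolver waits on to learn that the output was spent.
type pendingInput struct {
	attempts int
	spendCh  chan string
}

// toySweeper tracks pending inputs by outpoint.
type toySweeper struct {
	pending map[string]*pendingInput
}

// tryPublish models one sweep attempt after the restart. It always fails
// because the pre-restart sweep is still in the mempool. Once the attempt
// limit is hit, the input is removed together with its spend channel.
func (s *toySweeper) tryPublish(op string) error {
	in, ok := s.pending[op]
	if !ok {
		return errors.New("unknown input " + op)
	}
	in.attempts++
	if in.attempts >= maxSweepAttempts {
		delete(s.pending, op)
	}
	return errMempoolConflict
}

// notifySpend models the backend reporting that op was spent by txid. It
// returns false when no subscriber is left to receive the notification.
// (In this sketch spendCh is buffered with capacity 1.)
func (s *toySweeper) notifySpend(op, txid string) bool {
	in, ok := s.pending[op]
	if !ok {
		return false
	}
	in.spendCh <- txid
	return true
}

func main() {
	s := &toySweeper{pending: map[string]*pendingInput{
		"forceclose:0": {spendCh: make(chan string, 1)},
	}}

	// Every republish attempt is rejected until the limit is reached.
	for i := 0; i < maxSweepAttempts; i++ {
		fmt.Println("attempt", i+1, s.tryPublish("forceclose:0"))
	}

	// The old sweep now confirms, but nobody is listening any more.
	fmt.Println("spend delivered:", s.notifySpend("forceclose:0", "oldsweep"))
}
```

The point of the model is only that the spend subscriber lives and dies with the sweeper input, so a confirmation that arrives after the input was dropped has nowhere to go.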

The problem lies here: https://github.com/lightningnetwork/lnd/blob/master/sweep/sweeper.go#L1373-L1382. Because we remove the input altogether, we also remove the channel that would notify us of the spend.

We could fix this quick and dirty, but I think this should be handled as part of the sweeper refactor; wdyt @yyforyongyu? For bitcoind backends, I think the best strategy is to check whether the input is already spent when it is registered with the sweeper (via bitcoind's gettxout RPC call, which also takes spends in the mempool into account). For other backends I still have to think about a solution.
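A minimal sketch of the proposed registration check, assuming a gettxout-style lookup with include_mempool set to true (bitcoind's gettxout returns null when the output is spent, including by an unconfirmed transaction). The utxoSource interface, registerInput and the in-memory fakeNode are hypothetical and not part of lnd.

```go
package main

import "fmt"

// outPoint identifies a transaction output.
type outPoint struct {
	TxID  string
	Index uint32
}

// utxoSource abstracts a gettxout-style lookup (hypothetical interface).
// found is false when the output is unknown or already spent; with
// includeMempool set, mempool spends count as spent too.
type utxoSource interface {
	GetTxOut(op outPoint, includeMempool bool) (found bool, err error)
}

// registerInput reports whether op is already spent. If it is, the caller
// should wait for that spend to confirm instead of publishing a new sweep.
func registerInput(src utxoSource, op outPoint) (alreadySpent bool, err error) {
	found, err := src.GetTxOut(op, true)
	if err != nil {
		return false, fmt.Errorf("gettxout %v:%d: %w", op.TxID, op.Index, err)
	}
	return !found, nil
}

// fakeNode is an in-memory utxoSource used only for illustration.
type fakeNode struct {
	utxos        map[outPoint]bool // confirmed unspent outputs
	mempoolSpent map[outPoint]bool // outputs spent by an unconfirmed tx
}

func (f *fakeNode) GetTxOut(op outPoint, includeMempool bool) (bool, error) {
	if !f.utxos[op] {
		return false, nil
	}
	if includeMempool && f.mempoolSpent[op] {
		return false, nil
	}
	return true, nil
}

func main() {
	op := outPoint{TxID: "forceclose", Index: 0}
	node := &fakeNode{
		utxos:        map[outPoint]bool{op: true},
		mempoolSpent: map[outPoint]bool{op: true}, // old sweep still pending
	}
	spent, err := registerInput(node, op)
	fmt.Println("already spent:", spent, "err:", err)
}
```

One caveat with this approach: a null gettxout result does not distinguish "spent" from "never existed", so a real check would probably need to confirm the output's creating transaction separately.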

I am also wondering why the rescan performed when registering the input with the sweeper did not signal “already spent” when calling waitforspend. Maybe the look-ahead is too short?

About this issue

  • State: closed
  • Created 9 months ago
  • Reactions: 2
  • Comments: 23

Most upvoted comments

Okay, closing this issue then. @ziggie1984 would you mind creating the issue for the height hint cache? It sounds to me like you might have the most up-to-date context on that. I did a quick scan over our existing issues and there doesn’t seem to be one for it yet.

I am not sure whether we can close this issue, though. Sanjay's main problem was that the height hint cache was updated even though the funds were not recovered. I looked into the sweeper code to see whether it somehow poisons the height hint cache, but could not reproduce this behaviour. Still, I have the feeling that the --height-hint-cache-query-disable setting was introduced because of such behaviour in the past.

Maybe close this issue and open a narrower one that describes the problem with the height hint cache?

Yes let’s wait until docs are updated.

With help from Bitcoin Core developer @maflcko, we were able to identify the issue.

When the bitcoind config option rpcserialversion=0 is set, the RPC command only returns the non-segwit serialization of the transaction (without witness data), which leads to the behaviour described above.

According to maflcko, this setting will be deprecated in the next release (26), but we will definitely need to highlight it in the docs, or even check for it if that's possible.
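One possible shape for such a check, offered as a sketch under assumptions rather than anything lnd does today: fetch, in non-verbose form, a transaction that is known to carry witness data and test whether its raw hex uses the BIP 144 segwit serialization (4-byte version followed by marker 0x00 and flag 0x01). If the marker is missing, the backend is probably stripping witnesses. The helper name hasWitnessSerialization is hypothetical, and the hex strings in main are truncated prefixes, not real transactions.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// hasWitnessSerialization reports whether a raw transaction in hex uses the
// BIP 144 segwit serialization: version (4 bytes), then marker 0x00, then
// flag 0x01. Legacy serialization has the input count varint at byte 4.
func hasWitnessSerialization(rawHex string) (bool, error) {
	raw, err := hex.DecodeString(rawHex)
	if err != nil {
		return false, err
	}
	if len(raw) < 6 {
		return false, fmt.Errorf("raw tx too short: %d bytes", len(raw))
	}
	return raw[4] == 0x00 && raw[5] == 0x01, nil
}

func main() {
	// Truncated prefixes for illustration only.
	segwitPrefix := "0200000000010100" // version 2, marker, flag, 1 input
	legacyPrefix := "020000000100"     // version 2, 1 input, no marker

	for _, h := range []string{segwitPrefix, legacyPrefix} {
		ok, err := hasWitnessSerialization(h)
		fmt.Println(h, ok, err)
	}
}
```

The check only means something for a transaction that is known to spend segwit outputs; a transaction without any witness data is legitimately serialized in the legacy format.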

Not sure whether this is related, but I noticed this in the log from https://github.com/lightningnetwork/lnd/discussions/8007:

Using height hint 809349 retrieved from cache for outpoint=7e26f401a44a5398529032ac50e9a6182a5719fdc0f5630451c2aa7c6c49d585:0, script=0 6eec80ba4a7d32058575c31c24a29946ca03a46c4829be5801c6f5a6b88ae7bd instead of 724383 for spend subscription

On-chain, the force-close TX of this channel has 50k confirmations, but lnd still sees the channel as pending. Could it be that our own sweep TX “poisons” our height hint cache and causes us to miss spend notifications?
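To make the suspected mechanism concrete, here is a toy model, not lnd's notifier code. As the log line above suggests, a cached spend hint that is higher than the hint supplied by the caller replaces it as the start of the historical rescan, so a spend confirmed below the cached hint falls outside the scanned range. The names spendHintCache, startHeight and rescanFindsSpend are hypothetical; the tip and spend heights are assumptions (the spend is placed roughly 50k blocks below the assumed tip, per the comment above).

```go
package main

import "fmt"

// spendHintCache is a toy stand-in for a spend height hint cache keyed by
// outpoint.
type spendHintCache map[string]uint32

// startHeight returns where a historical spend rescan would begin: the
// cached hint if it is higher than the caller's hint, otherwise the
// caller's hint.
func startHeight(cache spendHintCache, op string, callerHint uint32) uint32 {
	if h, ok := cache[op]; ok && h > callerHint {
		return h
	}
	return callerHint
}

// rescanFindsSpend reports whether a rescan over [from, tip] would see a
// spend confirmed at spendHeight.
func rescanFindsSpend(from, tip, spendHeight uint32) bool {
	return spendHeight >= from && spendHeight <= tip
}

func main() {
	const (
		op          = "7e26f401a44a5398529032ac50e9a6182a5719fdc0f5630451c2aa7c6c49d585:0"
		callerHint  = 724383 // hint supplied by the caller (from the log)
		cachedHint  = 809349 // hint retrieved from the cache (from the log)
		tip         = 809351 // assumed current height
		spendHeight = 759351 // assumed: ~50k confirmations below the tip
	)
	cache := spendHintCache{op: cachedHint}

	from := startHeight(cache, op, callerHint)
	fmt.Println("rescan from", from, "finds spend:", rescanFindsSpend(from, tip, spendHeight))

	// Without the cached entry the rescan starts at the caller's hint.
	from = startHeight(spendHintCache{}, op, callerHint)
	fmt.Println("rescan from", from, "finds spend:", rescanFindsSpend(from, tip, spendHeight))
}
```

If this model matches what happens, any code path that advances the cached hint past the real spend height without a spend having been delivered would be enough to hide the spend permanently.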

I see that we try to re-publish the force-close TX, which gets refused:

2023-09-25 16:47:31.217 [INF] CNCT: Re-publishing force close tx(286d1a06df7ed99dae48636f42ad9bee7d1fcd15e0e31db09517bf68b634e2d1) for channel 7e26f401a44a5398529032ac50e9a6182a5719fdc0f5630451c2aa7c6c49d585:0
...
2023-09-25 16:47:31.232 [INF] LNWL: Inserting unconfirmed transaction 286d1a06df7ed99dae48636f42ad9bee7d1fcd15e0e31db09517bf68b634e2d1
2023-09-25 16:47:31.304 [INF] LNWL: 286d1a06df7ed99dae48636f42ad9bee7d1fcd15e0e31db09517bf68b634e2d1: tx already confirmed
...
2023-09-25 16:47:31.435 [INF] NTFN: New confirmation subscription: conf_id=2, txid=286d1a06df7ed99dae48636f42ad9bee7d1fcd15e0e31db09517bf68b634e2d1, num_confs=6 height_hint=809351
..
2023-09-25 16:47:31.449 [DBG] NTFN: Dispatching historical confirmation rescan for txid=286d1a06df7ed99dae48636f42ad9bee7d1fcd15e0e31db09517bf68b634e2d1

#7811 catches this case and gracefully shuts down lnd, since it has been reported before that bitcoind occasionally won't return the witness.
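A rough sketch of the kind of guard described above, under stated assumptions; it is not necessarily what #7811 implements. The outputs lnd watches for channels are segwit outputs, so any valid spend of one carries a witness, and an empty witness on the spending input points to a stripped transaction from the backend. The txIn type, checkSpendWitness and errMissingWitness are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// txIn is a minimal transaction input: the outpoint it spends and its
// witness stack.
type txIn struct {
	PrevOut string
	Witness [][]byte
}

// errMissingWitness is a hypothetical sentinel for a spend that arrives
// without witness data.
var errMissingWitness = errors.New("spending input has no witness data")

// checkSpendWitness finds the input that spends op and returns an error if
// its witness is empty, or if no input spends op at all.
func checkSpendWitness(ins []txIn, op string) error {
	for _, in := range ins {
		if in.PrevOut != op {
			continue
		}
		if len(in.Witness) == 0 {
			return fmt.Errorf("%s: %w", op, errMissingWitness)
		}
		return nil
	}
	return fmt.Errorf("%s is not spent by this transaction", op)
}

func main() {
	// A spend as a backend running with rpcserialversion=0 would return it.
	stripped := []txIn{{PrevOut: "forceclose:0"}}
	if err := checkSpendWitness(stripped, "forceclose:0"); errors.Is(err, errMissingWitness) {
		// Stop instead of acting on incomplete data, loosely mirroring a
		// graceful shutdown.
		fmt.Println("fatal:", err)
		return
	}
	fmt.Println("witness present")
}
```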

I think we've seen this before with a node, but I couldn't find the issue anymore. If the update doesn't help, the chain might need to be reindexed. I'm not sure whether -reindex or -reindex-chainstate is enough; in the worst case, a full reset, starting bitcoind from an empty data directory, might be needed.

@guggero I think a quick fix for @sanjay-shah would be to add a command to chantools that drops the spend and conf height hint caches, so that a restart would trigger the resolution of all channels? Not a long-term fix, but something that could be done quickly?