lnd: Channel ERROR: "failing link: unable to resolve fwd pkgs: bucket not found with error: internal error"

Background

I run a CLN node and have seen several instances where my node force-closed a channel because the LND peer sent an "internal error" message.

I finally hit this error with a peer (@ZoltanAB) who was able to provide the relevant logs.

LND environment

LND: 0.14.2-beta
OS: Linux ipayblue-1 5.10.0-13-amd64 #1 SMP Debian 5.10.106-1 (2022-03-17) x86_64 GNU/Linux
Using @C-Otto’s rebalance-lnd script (if that’s relevant)

Steps to reproduce

Have a channel between an LND node and a CLN node that forwards HTLCs.

Expected behaviour

LND should not send an error.

Actual behaviour

LND sends an error.

Logs

LND Logs (peer A)

2022-05-29 21:46:19.294 [ERR] HSWC: ChannelLink(297f43e0ac9a7307f334dc2a38eac05a86943f77e912dba679bc9cda52284a55:0): unable to remove fwd pkg for height=421027: bucket not found
2022-05-29 21:46:19.294 [ERR] HSWC: ChannelLink(297f43e0ac9a7307f334dc2a38eac05a86943f77e912dba679bc9cda52284a55:0): failing link: unable to resolve fwd pkgs: bucket not found with error: internal error

CLN logs (peer B)

2022-05-29T21:46:12.082Z UNUSUAL 032fe854a231aeb2357523ee6ca263ae04ce53eee8a13767ecbb911b69fefd8ace-channeld-chan#7100: Adding HTLC 2358 too slow: killing connection
2022-05-29T21:46:12.084Z INFO    032fe854a231aeb2357523ee6ca263ae04ce53eee8a13767ecbb911b69fefd8ace-chan#7100: Peer transient failure in CHANNELD_NORMAL: channeld: Owning subdaemon channeld died (9)
2022-05-29T21:46:20.650Z UNUSUAL 032fe854a231aeb2357523ee6ca263ae04ce53eee8a13767ecbb911b69fefd8ace-chan#7100: Peer permanent failure in CHANNELD_NORMAL: channeld: received ERROR error channel 554a2852da9cbc79a6db12e9773f94865ac0ea382adc34f307739aace0437f29: internal error
2022-05-29T21:46:20.651Z INFO    032fe854a231aeb2357523ee6ca263ae04ce53eee8a13767ecbb911b69fefd8ace-chan#7100: State changed from CHANNELD_NORMAL to AWAITING_UNILATERAL
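
To confirm that both log excerpts refer to the same channel, the LND channel point can be converted into the channel ID that CLN prints. This is a minimal sketch based on the BOLT #2 rule that the channel ID is the funding txid (in serialized byte order) XORed with the big-endian funding output index in its last two bytes. The helper name `channel_id_from_outpoint` is made up for illustration.

```python
def channel_id_from_outpoint(outpoint: str) -> str:
    """Derive the BOLT #2 channel_id (hex) from a 'txid:index' channel point."""
    txid_hex, index_str = outpoint.split(":")
    index = int(index_str)
    # Txids are displayed byte-reversed relative to their serialized order.
    raw = bytearray(bytes.fromhex(txid_hex)[::-1])
    # XOR the 16-bit output index (big-endian) into the last two bytes.
    raw[30] ^= (index >> 8) & 0xFF
    raw[31] ^= index & 0xFF
    return raw.hex()


# Values copied from the LND and CLN log lines above.
lnd_channel_point = "297f43e0ac9a7307f334dc2a38eac05a86943f77e912dba679bc9cda52284a55:0"
cln_channel_id = "554a2852da9cbc79a6db12e9773f94865ac0ea382adc34f307739aace0437f29"

print(channel_id_from_outpoint(lnd_channel_point) == cln_channel_id)  # prints True
```

Because the output index is 0 here, the channel ID is simply the byte-reversed funding txid.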

Additional info

The LND node was heavily rebalancing and thus running into memory issues about 7 minutes before the event (no log entries up to 2022-05-29 21:40:02.124).

[Image: index]

As you can tell from the graph, they stopped their rebalancing script a couple of hours after the crash.

About this issue

Original URL: https://github.com/lightningnetwork/lnd/issues/6593
State: closed
Created 2 years ago
Reactions: 2
Comments: 22 (1 by maintainers)

Most upvoted comments

Thanks for the logs, I know why this happens. I’ll start working on a fix

Crypt-iQ on Jun 8, 2022

This is not the same issue

Crypt-iQ on May 30, 2022

Glad I could contribute a little.

On Wed, Jun 8, 2022, 19:53 Eugene wrote:

Thanks for the logs, I know why this happens. I’ll start working on a fix


ZoltanAB on Jun 8, 2022

@ZoltanAB do you have more logs for this channel for several minutes before and after the above error? When did the node OOM? Relevant log categories would be HSWC, PEER, LNWL, CHDB.

Does the log file contain any sensitive information? If not, I could send you the log file around that date and hour. Please advise. Thank you.

It contains privacy-leaking information (channel points, etc.), which I don’t need, so feel free to redact it. I am eugene on the lnd Slack.
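
A minimal sketch of how such an excerpt could be prepared, assuming the log line layout seen in the LND logs above (`timestamp [LEVEL] SUBSYSTEM: message`). The function name, the placeholder string, and the regular expressions are illustrative assumptions, not part of lnd. It only redacts channel points, so node IDs, addresses, and other identifiers would still need a manual check.

```python
import re

# Assumed layout: "2022-05-29 21:46:19.294 [ERR] HSWC: message"
LND_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[(\w+)\] (\w+): (.*)$"
)
# A channel point is a 64-character hex txid followed by ":<output index>".
CHANNEL_POINT = re.compile(r"\b[0-9a-f]{64}:\d+\b")
WANTED = {"HSWC", "PEER", "LNWL", "CHDB"}


def filter_and_redact(lines, start, end, subsystems=WANTED):
    """Keep lines from the wanted subsystems inside [start, end], redacting channel points."""
    kept = []
    for line in lines:
        match = LND_LINE.match(line.rstrip("\n"))
        if not match:
            continue  # skip lines that do not follow the assumed layout
        ts, level, subsystem, msg = match.groups()
        # Fixed-width timestamps compare correctly as plain strings.
        if subsystem not in subsystems or not (start <= ts <= end):
            continue
        msg = CHANNEL_POINT.sub("<channel-point>", msg)
        kept.append(f"{ts} [{level}] {subsystem}: {msg}")
    return kept
```

For example, `filter_and_redact(open("lnd.log"), "2022-05-29 21:30:00.000", "2022-05-29 22:00:00.000")` would return the redacted lines for that half hour, where `lnd.log` stands in for the actual log file path.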

Crypt-iQ on Jun 7, 2022

The rebalancing scripts were working fine. I guess the main issue was trying to run too many instances at the same time, which killed my system (out of memory).

ZoltanAB on May 30, 2022