lnd: Can't cleanly shut down lnd with --watchtower.active

Background

I run lnd with --watchtower.active, but I can’t shut it down cleanly.

Your environment

  • lnd v0.11.0-beta.rc1
  • Ubuntu 18.04.2 LTS
  • btcd 0.20.1-beta

Steps to reproduce

Start lnd with --watchtower.active, then shut it down.

Expected behaviour

lnd exits normally and logs LTND: Shutdown complete

Actual behaviour

lnd hangs and cannot be shut down. The output is:

2020-09-02 11:01:00.714 [INF] LTND: Received interrupt
2020-09-02 11:01:00.715 [INF] LTND: Shutting down...
2020-09-02 11:01:00.715 [INF] LTND: Gracefully shutting down.

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 17 (5 by maintainers)

Most upvoted comments

I was finally able to reproduce it! The problem occurs if there is no active session and the number of updates is exhausted:

2020-10-27 11:18:49.409 [DBG] WTCL: Processing backup(940a681cf49e2ee12015b0d6b83f810fc5a509d5fe5bf921a74d79c9af3ac7cd, 1907)
2020-10-27 11:18:49.409 [DBG] WTCL: SessionQueue(03cce423735484aedd57882692b132e0d3c84892090936d6b44c41be0771cc1582) deciding to accept backup(940a681cf49e2ee12015b0d6b83f810fc5a509d5fe5bf921a74d79c9af3ac7cd, 1907) seqnum=4 pending=15 max-updates=20
2020-10-27 11:18:49.409 [INF] WTCL: Queued backup(940a681cf49e2ee12015b0d6b83f810fc5a509d5fe5bf921a74d79c9af3ac7cd, 1907) successfully for session 03cce423735484aedd57882692b132e0d3c84892090936d6b44c41be0771cc1582
2020-10-27 11:18:49.409 [DBG] WTCL: Session 03cce423735484aedd57882692b132e0d3c84892090936d6b44c41be0771cc1582 exhausted
2020-10-27 11:18:49.409 [INF] WTCL: Requesting new session.
2020-10-27 11:18:49.409 [DBG] WTCL: Dispatching session negotiation
2020-10-27 11:18:49.410 [DBG] WTCL: Attempting session negotiation with tower=020ecd2e0eb26d73beeec130f2a9d89bdb99a4fb9d2cd1177130b4162b2562f434
2020-10-27 11:18:49.413 [DBG] WTCL: Request for session negotiation with tower=020ecd2e0eb26d73beeec130f2a9d89bdb99a4fb9d2cd1177130b4162b2562f434@127.0.0.1:9911 failed, trying again -- reason: dial tcp 127.0.0.1:9911: connect: connection refused
2020-10-27 11:18:49.413 [DBG] WTCL: Request for session negotiation with tower=020ecd2e0eb26d73beeec130f2a9d89bdb99a4fb9d2cd1177130b4162b2562f434@7ykwxs6ln3o4xjuwcovvpl4mnyg2gbij44i3ve6kofbwbobi2tdx7wyd.onion:9911 failed, trying again -- reason: dial tcp: address 7ykwxs6ln3o4xjuwcovvpl4mnyg2gbij44i3ve6kofbwbobi2tdx7wyd.onion: no suitable address found
2020-10-27 11:18:49.413 [DBG] WTCL: Session negotiation with tower=020ecd2e0eb26d73beeec130f2a9d89bdb99a4fb9d2cd1177130b4162b2562f434 failed, trying again -- reason: session negotiation unsuccessful
2020-10-27 11:18:49.413 [DBG] WTCL: Unable to get new tower candidate, retrying after 10s -- reason: exhausted all tower candidates

If that happens, the sessionQueue is set to nil here: https://github.com/lightningnetwork/lnd/blob/master/watchtower/wtclient/client.go#L848 As a result, the main loop no longer reads from the pipeline, because this code is never reached: https://github.com/lightningnetwork/lnd/blob/master/watchtower/wtclient/client.go#L772

Therefore on shutdown, the task pipeline’s signalUntilShutdown() has no effect as nobody’s listening on https://github.com/lightningnetwork/lnd/blob/master/watchtower/wtclient/task_pipeline.go#L164.
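
The blocking pattern described above can be reproduced in isolation. The following is a minimal sketch, not lnd's actual code: the pipeline type, drain, stop, and shutdownCompletes are all hypothetical names, and the missing consumer stands in for a main loop that has stopped reading because it has no active session. It shows that a producer blocked on an unbuffered channel send never notices shutdown unless the send itself also selects on a quit channel.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// pipeline is a hypothetical stand-in for a task pipeline: it hands queued
// tasks to a consumer over an unbuffered channel.
type pipeline struct {
	tasks chan int
	quit  chan struct{}
	wg    sync.WaitGroup
}

func newPipeline() *pipeline {
	return &pipeline{tasks: make(chan int), quit: make(chan struct{})}
}

// drain tries to deliver every pending task. When honorQuit is false the
// quit case uses a nil channel, which is never ready in a select, so the
// send can only finish if a consumer receives the task.
func (p *pipeline) drain(pending []int, honorQuit bool) {
	defer p.wg.Done()

	var quit chan struct{}
	if honorQuit {
		quit = p.quit
	}

	for _, task := range pending {
		select {
		case p.tasks <- task:
		case <-quit:
			return
		}
	}
}

// stop signals shutdown and reports whether the drain goroutine exited
// before the timeout elapsed.
func (p *pipeline) stop(timeout time.Duration) bool {
	close(p.quit)

	done := make(chan struct{})
	go func() {
		p.wg.Wait()
		close(done)
	}()

	select {
	case <-done:
		return true
	case <-time.After(timeout):
		return false
	}
}

// shutdownCompletes runs the pipeline with no consumer at all, mimicking a
// dispatcher that has stopped reading, and reports whether stop returns
// cleanly.
func shutdownCompletes(honorQuit bool) bool {
	p := newPipeline()
	p.wg.Add(1)
	go p.drain([]int{1, 2, 3}, honorQuit)
	return p.stop(100 * time.Millisecond)
}

func main() {
	fmt.Println("without quit case:", shutdownCompletes(false))
	fmt.Println("with quit case:   ", shutdownCompletes(true))
}
```

In this sketch the first call times out because nothing ever receives from the channel, while the second returns promptly. Whether the real fix belongs in the pipeline or in the dispatcher loop is a separate design question that the sketch does not settle.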

I’m really not sure what to do in this situation, as this code base is very unfamiliar to me. @cfromknecht could you take a look please?

@jogc you can solve this problem manually by removing all faulty sessions the way @wpaulino mentioned in the previous comment. The shutdown hang is caused by too many non-submitted backups. Once that queue is cleared, the problem should be gone. But we obviously still want to fix the issue itself.

The goroutine profile shows that lnd is still in the process of starting up, specifically trying to connect to btcd for fee estimation. There’s no trace of a shutdown request at all.