btcd: Something in the wild started freezing my btcd

I’ve been running btcd in production for a year. After initial sync problems, I got it working and it was quite smooth sail. Most recently I’ve been running 130ea5bddde33df32b06a1cdb42a6316eb73cff5. About a week ago I noticed the node to fall out of sync and stayed stuck.

Restarting btcd and launching it again started syncing to catch up. It was behind couple hundred blocks.

The problem is that something triggers the freeze again and again I was unable to complete the syncing. I’m able to sync few blocks and the freeze happens again. I assume this is caused by some misbehaving peer reconnecting and feeding my btcd node some problematic data.

Here is a log just before the freeze happened (with redacted IPs):

btcd    | 2019-09-24 21:52:02.788 [INF] SYNC: New valid peer <IP1>:8333 (outbound) (/Satoshi:0.15.1/)
btcd    | 2019-09-24 21:52:03.940 [INF] SYNC: Lost peer <IP1>:8333 (outbound)
btcd    | 2019-09-24 21:52:04.103 [INF] SYNC: New valid peer <IP2>:8333 (outbound) (/Satoshi:0.13.1/)
btcd    | 2019-09-24 21:52:12.065 [INF] CHAN: Adding orphan block 0000000000000000000d11ede72158396bed291a784be023a51d7f812d41196d with parent 000000000000000000100abc79d25cc29e0a22800c53e90b7b9fceaeba991e1c
btcd    | 2019-09-24 21:52:16.297 [INF] SYNC: Lost peer <IP3>:59690 (inbound)
btcd    | 2019-09-24 21:52:16.298 [INF] SYNC: Lost peer <IP4>:57225 (inbound)
btcd    | 2019-09-24 21:52:16.299 [INF] SYNC: New valid peer <IP5>:60213 (inbound) (/bitnodes.earn.com:0.1/)
btcd    | 2019-09-24 21:52:18.802 [INF] SYNC: New valid peer <IP6>:34082 (inbound) (/bitcoin-seeder:0.01/)
btcd    | 2019-09-24 21:52:19.681 [INF] SYNC: Lost peer <IP6>:34082 (inbound)
btcd    | 2019-09-24 21:52:21.230 [INF] SYNC: Lost peer <IP5>:60213 (inbound)
btcd    | 2019-09-24 21:52:37.963 [INF] CHAN: Adding orphan block 00000000000000000001152b24946fffdc6d1e097e2660ceb24cfaa57afafcfb with parent 0000000000000000000d11ede72158396bed291a784be023a51d7f812d41196d
btcd    | 2019-09-24 21:52:41.396 [INF] SYNC: New valid peer <IP7>:62493 (inbound) (/Satoshi:0.12.1/)
btcd    | 2019-09-24 21:52:41.526 [INF] SYNC: New valid peer <IP7>:39297 (inbound) (/Satoshi:0.14.2/UASF-Segwit:1.0(BIP148)/)
btcd    | 2019-09-24 21:52:41.608 [INF] SYNC: Lost peer <IP7>:39297 (inbound)
btcd    | 2019-09-24 21:52:45.429 [INF] CHAN: Adding orphan block 000000000000000000120b951aaae9ab5939130050b2bc08335cef1b0f6faa5b with parent 0000000000000000000355f0074b9c2bb63affaae1a09d5e9ecff7b29bb4863a
btcd    | 2019-09-24 21:52:49.322 [INF] CHAN: Adding orphan block 000000000000000000100abc79d25cc29e0a22800c53e90b7b9fceaeba991e1c with parent 000000000000000000120b951aaae9ab5939130050b2bc08335cef1b0f6faa5b
btcd    | 2019-09-24 21:52:52.169 [INF] SYNC: New valid peer <IP8>:56180 (inbound) (/bitnodes.earn.com:0.1/)
btcd    | 2019-09-24 21:52:53.227 [INF] SYNC: Lost peer <IP8>:56180 (inbound)
btcd    | 2019-09-24 21:52:55.878 [INF] SYNC: New valid peer <IP9>:38574 (inbound) (/btc_node_scanner:0.0.1/)
btcd    | 2019-09-24 21:52:56.646 [INF] SYNC: Lost peer <IP9>:38574 (inbound)
btcd    | 2019-09-24 21:53:06.707 [INF] SYNC: New valid peer <IP10>:40149 (inbound) (/nodes.bitcoin-russia.ru:0.0.1f/)
btcd    | 2019-09-24 21:53:07.073 [INF] SYNC: Lost peer <IP11>:50862 (inbound)
btcd    | 2019-09-24 21:53:09.396 [INF] SYNC: Rejected block 00000000000000000001152b24946fffdc6d1e097e2660ceb24cfaa57afafcfb from <IP12>:8333 (outbound): already have block (orphan) 00000000000000000001152b24946fffdc6d1e097e2660ceb24cfaa57afafcfb
btcd    | 2019-09-24 21:53:10.762 [INF] SYNC: Lost peer <IP13>:40149 (inbound)
btcd    | 2019-09-24 21:53:11.312 [INF] SYNC: Lost peer <IP7>:62493 (inbound)
btcd    | 2019-09-24 21:53:14.434 [INF] SYNC: Lost peer <IP14>:60589 (inbound)

A few observations:

  1. btcctl from localhost doesn’t respond, it stays stuck as well
  2. log mentions orphan blocks
  3. IP7 is interesting, it looks like it connects to my node from two separate nodes sitting behind the same IP. I also saw this IP earlier in the logs connecting with nodes identifying as /Bitcoin ABC:0.14.6(EB8.0)/ or /bitcoinj:0.14.4/Bitcoin Wallet:5.25/.
  4. when I send signal to kill btcd, the logging resumes with shutdown messages (see below), so logging subsystem is not stuck, it just looks that main btcd “thread” is blocked
btcd    | 2019-09-24 21:57:08.310 [INF] BTCD: Received signal (terminated).  Shutting down...
btcd    | 2019-09-24 21:57:08.323 [INF] BTCD: Gracefully shutting down the server...
...

My workaround was to put the btcd node behind a specific bitcoind node. This let it sync and work fine (so far).

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 44 (20 by maintainers)

Most upvoted comments

Nice, I think that should do it. I opened it as https://github.com/btcsuite/btcd/pull/1480. Feel free to report here or there if you run into any issues.

This confirms my suspicion seemingly! 🤠 Itadakimasu! Thank you, @darwin and @wpaulino.

Thank you for all the helpful information, @darwin!

Would you like to help me test a potential temporary resolution?