bitcoin: [wallet] Node process hangs after SIGINT

I noticed it while testing current master (f0d6487e290761a4fb03798240a351b5fddfdb38). The node is in IBD (2020-04-03T15:23:06Z Synchronizing blockheaders, height: 69998 (~12.07%)), and I’m using Ubuntu 16.04. I’m happy to give anyone who wants to look into this access to the VM I’m using, along with instructions to reproduce.

Here’s how I compiled it: ./configure --with-incompatible-bdb PYTHONPATH= --disable-shared --with-pic --enable-benchmark=no --with-bignum=no --enable-module-recovery --disable-jni --disable-shared --with-pic --enable-benchmark=no --with-bignum=no --enable-module-recovery --disable-jni --no-create --no-recursion

Steps to reproduce (it happens every time I repeat this sequence):

  1. src/bitcoind
  2. Wait until the first message "New outbound peer connected: version" appears
  3. Press Ctrl+C (afaik this sends SIGINT)
  4. The process gets stuck with these logs at the end:
2020-04-03T15:23:12Z Synchronizing blockheaders, height: 73998 (~12.73%)
^C2020-04-03T15:23:13Z P2P peers available. Skipped DNS seeding.
2020-04-03T15:23:13Z dnsseed thread exit
2020-04-03T15:23:13Z tor: Thread interrupt
2020-04-03T15:23:13Z Shutdown: In progress...
2020-04-03T15:23:13Z torcontrol thread exit
2020-04-03T15:23:13Z addcon thread exit
2020-04-03T15:23:13Z net thread exit
2020-04-03T15:23:13Z msghand thread exit
2020-04-03T15:23:15Z opencon thread exit
2020-04-03T15:23:15Z scheduler thread exit
2020-04-03T15:23:15Z Dumped mempool: 4e-06s to copy, 0.002802s to dump
2020-04-03T15:23:15Z FlushStateToDisk: write coins cache to disk (0 coins, 0kB) started
2020-04-03T15:23:15Z FlushStateToDisk: write coins cache to disk (0 coins, 0kB) completed (0.00s)
2020-04-03T15:23:15Z FlushStateToDisk: write coins cache to disk (0 coins, 0kB) started
2020-04-03T15:23:15Z FlushStateToDisk: write coins cache to disk (0 coins, 0kB) completed (0.00s)
  5. Wait for several minutes; nothing happens.
  6. Try kill $PID; it doesn’t do anything.
  7. Finally, run kill -9 $PID to get rid of the process.

So, the process doesn’t use much CPU (0.4%), and gdb shows 2 threads:

  Id   Target Id                                     Frame
* 1    Thread 0x7fd51d48a740 (LWP 31824) "b-shutoff" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  2    Thread 0x7fd507fff700 (LWP 31834) "bitcoind"  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185

Backtrace is here.

Thanks to @vasild for letting me know how to call all that gdb stuff 😃

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 23 (23 by maintainers)

Most upvoted comments

#18524 (now merged) should fix this issue. The PR description says #18524 is a refactor, but it’s only a refactor for new versions of boost (>=1.59). With an old version of boost it changes behavior and fixes this bug.

I just noticed it starts with high CPU usage (30%), which then drops by 0.5% every second and settles at 0.4%.

It also hangs when I run src/bitcoin-cli stop. The unload wallet call takes forever to execute (more than 10 seconds, but I can wait longer if needed). I’ll just repeat that I’m happy to let you into my VM if that makes life easier 😃

From the stack trace, this is almost definitely caused by ab31b9d6fe7b39713682e3f52d11238dbe042c16 from #18338. UnloadWallet is waiting forever for the shared_ptr reference count to be released, and apparently that doesn’t happen in the Ctrl-C shutdown sequence.
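
To make the failure mode above concrete, here is a minimal sketch of the "drop my reference, then wait for the deleter to signal" pattern. It is not Bitcoin Core's actual code: Wallet, MakeWallet, ReleaseWallet, UnloadWalletWithTimeout and the g_release_* globals are hypothetical names chosen for illustration, and the wait uses a timeout (an assumption of this sketch) so that the hang is observable instead of indefinite.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical stand-in for a wallet object (not Bitcoin Core's CWallet).
struct Wallet {};

// Hypothetical globals used to signal that the wallet has been destroyed.
std::mutex g_release_mutex;
std::condition_variable g_release_cv;
bool g_released = false;

// Custom deleter: runs only when the LAST shared_ptr owner goes away.
void ReleaseWallet(Wallet* wallet)
{
    delete wallet;
    {
        std::lock_guard<std::mutex> lock(g_release_mutex);
        g_released = true;
    }
    g_release_cv.notify_all();
}

// Creates a wallet whose destruction is reported through the deleter above.
std::shared_ptr<Wallet> MakeWallet()
{
    {
        std::lock_guard<std::mutex> lock(g_release_mutex);
        g_released = false;
    }
    return std::shared_ptr<Wallet>(new Wallet(), ReleaseWallet);
}

// Drops the caller's reference, then waits for the deleter to run.
// Returns true if the wallet was destroyed, false if the wait timed out
// because some other owner still holds a reference.
bool UnloadWalletWithTimeout(std::shared_ptr<Wallet>&& wallet,
                             std::chrono::milliseconds timeout)
{
    assert(wallet != nullptr);
    wallet.reset(); // If this was not the last owner, the deleter does not run.
    std::unique_lock<std::mutex> lock(g_release_mutex);
    return g_release_cv.wait_for(lock, timeout, [] { return g_released; });
}
```

If a second copy of the shared_ptr is still alive somewhere when UnloadWalletWithTimeout(std::move(w), ...) is called, the deleter never runs and the call returns false after the timeout. With an untimed wait in its place, the calling thread would sit in pthread_cond_wait forever, which matches what the gdb thread list in the report shows.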