tendermint: rpc-server module doesn't start after a container restart

Tendermint version (use tendermint version or git rev-parse --verify HEAD if installed from source): 0.31.5-d2eab536

ABCI app (name for built-in, URL for self-written if it’s publicly available): BigchainDB 2.0.0

Environment:

  • OS (e.g. from /etc/os-release): Ubuntu 18.04.4 LTS
  • Install tools: Docker
  • Others: all-in-one Docker build, except for MongoDB, which runs outside the container that hosts BigchainDB and Tendermint.

What happened: The EC2 instance ran out of disk space. After I freed up some space and restarted the Docker containers, BigchainDB was unable to commit any transactions. The BigchainDB debug log reported that the maximum number of retries was exceeded while connecting to http://localhost:26657 (the default Tendermint port). I also tried building and running a fresh container, but that did not work either.

What you expected to happen: The rpc-server module of Tendermint should be up and running.

Have you tried the latest version: no

How to reproduce it (as minimally and precisely as possible): 1) let the disk fill up, 2) stop and start the container running BigchainDB and Tendermint.

Logs (paste a small part showing an error (< 10 lines) or link a pastebin, gist, etc. containing more of the log file):

bash-5.0# tendermint node --rpc.laddr "tcp://0.0.0.0:26657" --log_level="*:debug"
I[2020-05-06|18:40:05.136] Starting multiAppConn                        module=proxy impl=multiAppConn
I[2020-05-06|18:40:05.137] Starting socketClient                        module=abci-client connection=query impl=socketClient
I[2020-05-06|18:40:05.138] Starting socketClient                        module=abci-client connection=mempool impl=socketClient
I[2020-05-06|18:40:05.139] Starting socketClient                        module=abci-client connection=consensus impl=socketClient
I[2020-05-06|18:40:05.139] Starting EventBus                            module=events impl=EventBus
I[2020-05-06|18:40:05.140] Starting PubSub                              module=pubsub impl=PubSub
I[2020-05-06|18:40:05.151] Starting IndexerService                      module=txindex impl=IndexerService
I[2020-05-06|18:40:05.231] ABCI Handshake App Info                      module=consensus height=355 hash= software-version= protocol-version=0
I[2020-05-06|18:40:05.233] ABCI Replay Blocks                           module=consensus appHeight=355 storeHeight=2760414 stateHeight=2760413
I[2020-05-06|18:40:05.233] Applying block                               module=consensus height=356
I[2020-05-06|18:40:05.315] Executed block                               module=consensus height=356 validTxs=0 invalidTxs=0
I[2020-05-06|18:40:05.395] Applying block                               module=consensus height=357

Config (you can paste only the changes you’ve made):

node command runtime flags:

/dump_consensus_state output for consensus bugs

Anything else we need to know: I see the following in the Tendermint log (with the default log level): E[2020-05-06|17:18:40.274] abci.socketClient failed to connect to tcp://127.0.0.1:26658. Retrying... module=abci-client connection=query err="dial tcp 127.0.0.1:26658: connect: connection refused"

I also asked on Stack Overflow: https://stackoverflow.com/questions/61580896/getting-error-while-sending-transaction-over-bigchaindb-node
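
The report mentions two local ports: 26657 (the Tendermint port BigchainDB could not reach) and 26658 (the ABCI socket Tendermint could not reach). A quick way to see which side is actually listening is to probe both. The snippet below is a minimal sketch and not part of the original report; `port_is_open` is a hypothetical helper name, and the host `127.0.0.1` assumes the check runs inside the same container as both processes.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, False otherwise."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Port numbers are taken from the report; the labels are descriptive only.
    checks = (("Tendermint RPC", 26657), ("ABCI app", 26658))
    for name, port in checks:
        state = "open" if port_is_open("127.0.0.1", port) else "closed"
        print(f"{name} (127.0.0.1:{port}): {state}")
```

A closed port only shows that nothing is accepting connections at that moment; it does not say why the process is not listening.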

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 25 (10 by maintainers)

Most upvoted comments

It seems like the node already has all blocks in local storage: storeHeight=2760414. So fast_sync is not involved; this is pure block replay.

When Tendermint starts up, it asks the application which height it processed last, and then starts replaying blocks from there. Why does the application report it’s at height 0? Is the application not persisting its state to disk?
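
To make the size of the replay concrete, the heights in the "ABCI Replay Blocks" log line from the report can be compared directly. The snippet below is a rough sketch, not Tendermint code: `parse_heights`, `blocks_to_replay`, and `estimated_replay_seconds` are hypothetical helper names, and the per-block time is an assumption read off a single pair of "Applying block" timestamps in the log (18:40:05.233 and 18:40:05.395), so any time estimate is only an order-of-magnitude guess.

```python
import re

# Log line copied from the report above.
LOG_LINE = (
    "I[2020-05-06|18:40:05.233] ABCI Replay Blocks "
    "module=consensus appHeight=355 storeHeight=2760414 stateHeight=2760413"
)


def parse_heights(line):
    """Return every key=<integer> field of a log line as a dict of ints."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)\b", line)}


def blocks_to_replay(line):
    """Number of stored blocks the application has not processed yet."""
    heights = parse_heights(line)
    return heights["storeHeight"] - heights["appHeight"]


def estimated_replay_seconds(gap, seconds_per_block):
    """Linear estimate; assumes every block takes about the same time."""
    return gap * seconds_per_block


if __name__ == "__main__":
    gap = blocks_to_replay(LOG_LINE)
    print(f"blocks to replay: {gap}")  # prints "blocks to replay: 2760059"

    # ASSUMPTION: about 0.162 s per block, from one sample in the log.
    hours = estimated_replay_seconds(gap, 0.162) / 3600
    print(f"rough replay time at that rate: {hours:.0f} hours")
```

With roughly 2.76 million blocks between the application height and the store height, a replay at that rate would take far longer than a typical client retry window, which would be consistent with the RPC endpoint appearing to be down, assuming the node does not serve RPC until the replay finishes (not confirmed in this thread).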