tendermint: A high volume of queries can make a node miss blocks

Tendermint version (use tendermint version or git rev-parse --verify HEAD if installed from source):

tendermint: ""
abci: 0.17.0
blockprotocol: 11
p2pprotocol: 8

ABCI app (name for built-in, URL for self-written if it’s publicly available): Desmos v0.15.1

Environment:

  • OS (e.g. from /etc/os-release):
    NAME="Ubuntu"
    VERSION="20.04.1 LTS (Focal Fossa)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 20.04.1 LTS"
    VERSION_ID="20.04"
    HOME_URL="https://www.ubuntu.com/"
    SUPPORT_URL="https://help.ubuntu.com/"
    BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
    PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
    VERSION_CODENAME=focal
    UBUNTU_CODENAME=focal
    
  • Install tools:
  • Others:

What happened:

Note: this issue duplicates cosmos/cosmos-sdk#8602. I opened it here after @marbar3778 suggested doing so.

We are currently developing BDJuno, a tool that listens to a chain's state and parses the data into a PostgreSQL database. To do so, it works in two ways at the same time:

  1. Listens for new blocks
  2. Parses all old blocks
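A rough sketch of this two-pronged setup follows, in Python for brevity. It is not BDJuno's actual code: `run_parser`, `new_heights` and `process` are hypothetical names, and a plain list of new heights stands in for a real new-block subscription. It shows why the node is loaded by both paths at once: every height, old or new, ends up in `process`, which is where the per-block gRPC queries would run.

```python
import queue
import threading


def run_parser(start_height, new_heights, process, workers=4):
    """Hypothetical sketch: backfill old blocks while handling new ones.

    start_height: chain height when the parser starts (blocks 1..start_height
        are "old" and get backfilled).
    new_heights: heights that arrive after startup; a plain iterable stands in
        for a real new-block subscription (assumption).
    process: called once per height; this is where the per-block gRPC
        queries would be issued.
    """
    work = queue.Queue()

    def listen():  # 1. listen for new blocks
        for height in new_heights:
            work.put(height)

    def backfill():  # 2. parse all old blocks
        for height in range(1, start_height + 1):
            work.put(height)

    def consume():
        while True:
            height = work.get()
            if height is None:  # sentinel: no more work
                return
            process(height)

    consumers = [threading.Thread(target=consume) for _ in range(workers)]
    producers = [threading.Thread(target=listen), threading.Thread(target=backfill)]
    for thread in consumers + producers:
        thread.start()
    for thread in producers:
        thread.join()
    for _ in consumers:  # one sentinel per consumer, queued after all heights
        work.put(None)
    for thread in consumers:
        thread.join()
```

Both producers feed the same queue, so the request rate seen by the node is the sum of backfilling and live tracking.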

For each block, it then reads the state of the different modules and stores it in the PostgreSQL database; in other words, we take a snapshot of the state at every block. To do so, we use gRPC to fetch all the data that can change from one block to the next (e.g. delegations, unbonding delegations, redelegations and staking commissions).
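The per-block snapshot could look roughly like the sketch below. It assumes the Cosmos SDK convention of pinning a gRPC query to a given height through the `x-cosmos-block-height` metadata header; `snapshot_at` and the `queries` mapping are hypothetical, and each callable would wrap a real generated gRPC stub. Every entry in the mapping costs at least one request per height, which is how a full snapshot adds up to many requests per block.

```python
HEIGHT_HEADER = "x-cosmos-block-height"  # Cosmos SDK gRPC height header


def snapshot_at(height, queries):
    """Run every state query pinned to `height` and collect the results.

    queries maps a label (e.g. "delegations") to a callable that takes gRPC
    metadata and returns the query result; in a real parser each callable
    would wrap a generated gRPC stub (hypothetical here).
    """
    metadata = [(HEIGHT_HEADER, str(height))]
    # One call per module: the request count grows with len(queries).
    return {name: query(metadata) for name, query in queries.items()}
```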

As we also need to parse old blocks and read the state at very old heights, we set up an archive node with pruning = "nothing".

When we first started our parser, everything worked properly: the node kept up with syncing new blocks while answering gRPC calls.

Recently, however, we noticed that the node had started to lag behind the chain, falling more than 500 blocks behind. So we stopped the parser and let the node catch up with the chain again, then restarted the parser. One week later, the node is once again more than 1,000 blocks behind the current chain height.
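To track how far behind the node is, a minimal sketch could compare the height reported by the node's Tendermint RPC `/status` endpoint (`result.sync_info.latest_block_height`, returned as a string) with a reference height. The `blocks_behind` helper is hypothetical, and the reference height is assumed to come from another source such as a second node or an explorer.

```python
import json


def blocks_behind(local_status_json, reference_height):
    """Return how many blocks the local node trails `reference_height`.

    local_status_json: body returned by the node's Tendermint RPC /status.
    reference_height: chain tip from a trusted external source (assumption).
    """
    sync_info = json.loads(local_status_json)["result"]["sync_info"]
    local_height = int(sync_info["latest_block_height"])  # height is a string
    return max(0, reference_height - local_height)
```

A parser could poll this and pause itself while the lag exceeds some threshold, automating the manual stop-and-restart described above; whether that trade-off is acceptable depends on how fresh the indexed data must be.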

Note
I don't know whether this happens only because pruning is set to "nothing". However, I believe it should be investigated, as tools such as explorers could end up stalling the nodes they query simply by sending them too many requests. If the same also happens to nodes with pruning set to "default" or "everything", it could even be exploited to DDoS validator nodes.

What you expected to happen:
The node should stay in sync with the chain while responding to queries, without falling behind.

Have you tried the latest version:
No

How to reproduce it (as minimally and precisely as possible):

  1. Start a full node with pruning = "nothing"
  2. Start performing a lot of gRPC requests (around 100 per block)
  3. The node will slowly start to lag behind the chain
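Step 2 can be approximated with a small load generator like the sketch below. `fire_queries_per_block` and `send_query` are hypothetical placeholders: a real reproduction would take `heights` from the node's new-block events and make `send_query` perform an actual gRPC call (for example one of the staking queries mentioned above). The figure of 100 requests per block comes from the steps above; `max_workers` is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor


def fire_queries_per_block(heights, send_query, per_block=100, max_workers=16):
    """For each block height, issue `per_block` queries concurrently.

    send_query(height, i) is a placeholder for a real gRPC call.
    Returns the total number of queries that completed.
    """
    completed = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for height in heights:
            futures = [pool.submit(send_query, height, i) for i in range(per_block)]
            for future in futures:
                future.result()  # re-raise any error from the query
                completed += 1
    return completed
```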

Logs (paste a small part showing an error (< 10 lines) or link a pastebin, gist, etc. containing more of the log file):
N.A.

Config (you can paste only the changes you’ve made):

pruning = "nothing"

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 25 (17 by maintainers)

Most upvoted comments

Yes, I’ve been running this node for 2 months without Juno querying it, and it had no problems. It always stayed in sync with the chain. The problem only appeared when I started running Juno.

Did the number of requests increase?

The number of requests increased when I started Juno, which was the only client querying the node. Before that, the node was not being queried at all.