zebra: Peer connections hang temporarily or fail permanently

Analysis

Peer connection failures can result in a slow sync, temporary sync hangs, or permanent sync hangs.

These failures can happen after network interruptions, or due to normal connection churn. They might also happen because Zebra’s protocol state machine gets into an invalid or unrecoverable state.

Next Steps

Here are some things we could try:

  • Run Zebra with a single local peer, and make it panic as soon as that peer disappears. It’s ok for the peer to go from ready to unready and back, and maybe disconnect, but it should never disappear.
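The check suggested above could be sketched roughly as follows. This is only an illustration, not Zebra's actual peer set code: `PeerState`, `PeerTracker`, and all of their methods are hypothetical names made up for this sketch. The idea is that a peer may move freely between ready, unready, and disconnected, but looking it up after it has been dropped from the tracked set panics immediately.

```rust
use std::collections::HashMap;

/// Hypothetical peer states for this sketch; Zebra's real types differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PeerState {
    Ready,
    Unready,
    Disconnected,
}

/// Hypothetical tracker used while debugging with a single local peer.
#[derive(Debug, Default)]
pub struct PeerTracker {
    peers: HashMap<String, PeerState>,
}

impl PeerTracker {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record a state change. Ready -> Unready -> Ready and
    /// disconnects are all allowed transitions.
    pub fn set_state(&mut self, addr: &str, state: PeerState) {
        self.peers.insert(addr.to_string(), state);
    }

    /// Simulate the suspected bug: the peer is silently dropped
    /// from the set instead of being marked as disconnected.
    pub fn forget(&mut self, addr: &str) {
        self.peers.remove(addr);
    }

    /// Return the peer's current state, panicking if the peer is
    /// no longer tracked at all.
    pub fn assert_tracked(&self, addr: &str) -> PeerState {
        match self.peers.get(addr) {
            Some(state) => *state,
            None => panic!("peer {} disappeared from the peer set", addr),
        }
    }
}
```

Calling `assert_tracked` after every state change for the one local peer (for example a made-up address such as `127.0.0.1:8233`) would turn a silent disappearance into an immediate panic with a backtrace, rather than a sync hang noticed much later.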

Version

zebrad 3.0.0-alpha.0

Current main branch (edit: as of 2020-12-02)

Platform

Linux oxarbitrage 4.15.0-20-generic #21-Ubuntu SMP Tue Apr 24 06:16:15 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Description

During a sync on mainnet, the sync component stopped responding. The program didn’t crash or hang completely: inbound requests were still being answered, but no more blocks were downloaded. I have a log of this at https://gist.github.com/oxarbitrage/2f067fed9588c3d942e499d1252fc777 where the last message from the sync component is at https://gist.github.com/oxarbitrage/2f067fed9588c3d942e499d1252fc777#file-gistfile1-txt-L2843 and there were no further messages after 30 minutes.

I tried stopping the program and starting it again. The sync resumed.

When syncing the Zcash blockchain with Zebra, I expect all the blocks to download from start to end without intervention. Instead, I had to stop the program (Ctrl-C) and restart it to keep going.

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 21 (21 by maintainers)

Most upvoted comments

We just merged #1468, which should provide better diagnostics for this issue, and which panics if our assumptions about the peer state machine don’t hold.

Yes, that seems to be what is happening. I resumed and blocks downloaded fine, then I closed my connection on purpose while Zebra was still running. I waited about 30 seconds and reconnected, but Zebra was never able to download more blocks, even long after my connection was active again.

I have some logs for this too:

https://gist.github.com/oxarbitrage/916ed4d49db39116917ad6e938555e8e#file-sync_disconnect_mainnet-txt