bitcoin: Do not crash if peers.dat is corrupted
When peers.dat is corrupted, an error message is shown: Invalid or corrupt peers.dat (Checksum mismatch, data corrupted). Then the node restarts.
Most of our users aren’t technical enough to manually delete the peers.dat file, nor can we detect the corruption for them. As a result, this error creates a lot of work for our support team whenever somebody is affected.
peers.dat isn’t an essential file, so Bitcoin Core should be able to restart without crashing.
A bash workaround to detect the checksum mismatch would also considerably help us.
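A rough sketch of such a check, written in Python rather than bash. It assumes, based on the addrdb.cpp description later in this thread, that the file is a payload followed by a 32-byte double-SHA256 checksum of that payload. The function names and the `.corrupt` suffix are hypothetical, not anything Bitcoin Core provides.

```python
import hashlib
import os

CHECKSUM_LEN = 32  # assumption: the file ends with a 32-byte double-SHA256 checksum


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def peers_checksum_ok(path: str) -> bool:
    """Return True if the trailing 32 bytes equal the hash of everything before them."""
    with open(path, "rb") as f:
        blob = f.read()
    if len(blob) <= CHECKSUM_LEN:
        return False  # too short to hold a payload plus a checksum
    payload, stored = blob[:-CHECKSUM_LEN], blob[-CHECKSUM_LEN:]
    return double_sha256(payload) == stored


def quarantine_if_corrupt(path: str) -> bool:
    """Rename a corrupt file to <path>.corrupt; return True if it was moved."""
    if not os.path.exists(path) or peers_checksum_ok(path):
        return False
    # Move instead of delete, so the bad file can still be analyzed later.
    os.replace(path, path + ".corrupt")
    return True
```

A wrapper could call `quarantine_if_corrupt` on the data directory's peers.dat before starting bitcoind. Moving the file aside means the node has to bootstrap its peer addresses again, which is the trade-off discussed further down in the thread.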
About this issue
- State: closed
- Created 2 years ago
- Reactions: 1
- Comments: 37 (27 by maintainers)
Commits related to this issue
- If peers data is corrupted, move it. (https://github.com/bitcoin/bitcoin/issues/26599) — committed to btcpayserver/dockerfile-deps by NicolasDorier a year ago
- If peers data is corrupted, move it. (https://github.com/bitcoin/bitcoin/issues/26599) — committed to btcpayserver/dockerfile-deps by NicolasDorier a year ago
- If peers data is corrupted, move it. (https://github.com/bitcoin/bitcoin/issues/26599) — committed to btcpayserver/dockerfile-deps by NicolasDorier a year ago
- Merge bitcoin/bitcoin#26909: net: prevent peers.dat corruptions by only serializing once 5eabb61b2386d00e93e6bbb2f493a56d1b326ad9 addrdb: Only call Serialize() once (Martin Zumsande) da6c7aeca38e1d0a... — committed to bitcoin-core/gui by deleted user a year ago
- Merge bitcoin/bitcoin#26909: net: prevent peers.dat corruptions by only serializing once 5eabb61b2386d00e93e6bbb2f493a56d1b326ad9 addrdb: Only call Serialize() once (Martin Zumsande) da6c7aeca38e1d0a... — committed to syscoin/syscoin by deleted user a year ago
I believe I’ve found the bug that caused this with the help of the provided peers.dat (which was completely ok as far as I can see, just that the checksum was wrong, and when overwriting the bad checksum with the correct one it would load correctly):

Every 15 minutes, the scheduler thread will dump peers.dat to disk - for this it calls https://github.com/bitcoin/bitcoin/blob/f4ef856375c5b295d78169b136c6aee928c19bc9/src/addrdb.cpp#L38-L40
which first writes the data (i.e. AddrMan) into the stream, and then writes the same data into a hasher - which then provides the hash that is added to the stream in the third line. The problem is that AddrMan can change in between the first two calls (e.g. if we receive a new address), and then the data and hash won’t match anymore and the written file is corrupt.
I could reproduce this by adding a sleep for the scheduler thread in between the two writes of data, manually adding artificial addresses with addpeeraddress during this sleep, and then killing bitcoind (so that it can’t correct the peers.dat at a clean shutdown). That way, I would corrupt my own peers.dat.

I will work on a fix!
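The race described above can be illustrated with a small, single-threaded Python simulation. This is only a toy model: `ToyAddrMan`, its text serialization, and the `between_writes` hook (standing in for the sleep plus addpeeraddress) are hypothetical, and the double-SHA256 checksum is an assumption about the file format.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


class ToyAddrMan:
    """Hypothetical stand-in for AddrMan: just a list of address strings."""

    def __init__(self, addrs):
        self.addrs = list(addrs)

    def serialize(self) -> bytes:
        return "\n".join(self.addrs).encode()


def dump_serializing_twice(addrman, between_writes=lambda: None) -> bytes:
    """Mimic the buggy order: serialize for the file, then serialize again for the hash."""
    stream = addrman.serialize()                 # first serialization: bytes for the file
    between_writes()                             # another thread may change addrman here
    digest = double_sha256(addrman.serialize())  # second serialization: input to the hasher
    return stream + digest


def checksum_ok(blob: bytes) -> bool:
    return double_sha256(blob[:-32]) == blob[-32:]


addrman = ToyAddrMan(["1.2.3.4:8333"])
quiet = dump_serializing_twice(addrman)
raced = dump_serializing_twice(addrman, lambda: addrman.addrs.append("5.6.7.8:8333"))
print(checksum_ok(quiet), checksum_ok(raced))
```

The first dump verifies, while the second does not: its data is perfectly valid, but the appended hash was computed over a newer state than the bytes that were written.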
The peers.dat file is designed to avoid having to reach out to the DNS or hardcoded seeds more than once, as this is the moment your node is most susceptible to being poisoned with attacker IP addresses and perhaps in the future blocks and transactions.

If the file becomes corrupted then anchors.dat should help protect the node from a successful future eclipse attack, but new addresses will have to either be added manually or fetched from DNS or hardcoded seeds again.

I agree that the best course of action here is to find out what’s corrupting peers.dat and fix that, rather than have Core silently ignore errors on something that could be used as a first step towards eclipse attacking you…

Side note: it does make me wonder whether it could be worth having certain runtime “profiles”. For example I have seen software with “paranoia level” settings, and we could perhaps have something like n blocks etc.

The next time it happens, I will ask for the peers.dat to be saved so we can analyze it.

Do you happen to know why it corrupts? If it is due to a hardware error, it might be scary to just continue, because it might also corrupt wallet.dat.
@beeduul do you want to follow up with a new issue, providing more info if possible? Assuming this isn’t a hardware related problem.
To be clear: the fix doesn’t prevent any crashes from happening. What it fixes is that if the node crashes for some unrelated reason, peers.dat shouldn’t get corrupted anymore (the corruption would only be visible at the next startup). So if your node crashes every few days, it sounds like you have another, unrelated problem.

I opened #26909 to fix this.
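As a rough illustration of the "only serialize once" idea named in the related commit title, here is a Python sketch, not the actual C++ change. `HashingWriter` and `dump_once` are hypothetical names, and the double-SHA256 checksum is an assumption about the file format. Every byte written to the stream is fed to the hasher at the same time, so the checksum always covers exactly what was written.

```python
import hashlib
import io


class HashingWriter:
    """Hypothetical wrapper: each write goes to both the stream and the hasher."""

    def __init__(self, stream):
        self._stream = stream
        self._hasher = hashlib.sha256()

    def write(self, data: bytes) -> None:
        self._stream.write(data)
        self._hasher.update(data)

    def digest(self) -> bytes:
        # Second SHA-256 round over the first digest (double SHA-256).
        return hashlib.sha256(self._hasher.digest()).digest()


def dump_once(serialize, stream) -> None:
    """Serialize exactly once, then append the checksum of the bytes just written."""
    writer = HashingWriter(stream)
    writer.write(serialize())
    stream.write(writer.digest())


addrs = ["1.2.3.4:8333"]
buf = io.BytesIO()
dump_once(lambda: "\n".join(addrs).encode(), buf)
addrs.append("5.6.7.8:8333")  # a later change no longer affects the written checksum
blob = buf.getvalue()
```

With this shape, a concurrent update can at worst be missing from the dump. The payload and checksum still agree, so a crash before the next clean dump leaves a loadable file.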
To me it sounds like a bug that should be fixed and not silently ignored.