chia-blockchain: [BUG] `pragma synchronous=OFF` causes database corruption on crash
Version 1.2.8/9 began issuing `pragma synchronous=OFF` every time a connection to the sqlite database is opened. This is unsafe if the machine crashes and will result in users having to resync the blockchain/wallet from scratch. That can cost users days of farming time while they wait for the resync, waste network bandwidth, and possibly burn through data on a capped network plan. The sqlite pragma documentation details why this is unsafe (sorry, you'll have to search for "synchronous" as I can't link directly to it): with `synchronous=OFF`, sqlite does not wait for the kernel to confirm that data is persisted, only that it has been submitted to the kernel.
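For illustration, the failure mode boils down to a single pragma (a minimal sketch, not chia's actual connection code; the filename is made up):

```python
import sqlite3

# With synchronous=OFF, COMMIT returns as soon as the writes have been
# handed to the kernel; nothing waits for the data to reach stable
# storage, so a crash or power loss mid-write can leave the file
# internally inconsistent.
conn = sqlite3.connect("blockchain.sqlite")  # illustrative filename
conn.execute("PRAGMA synchronous=OFF")
```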
**To Reproduce**
- launch any chia component that writes data to the sqlite database (full node, wallet), preferably something that needs to sync
- wait for chia to begin syncing data
- randomly unplug/hard shutdown the machine without waiting for chia to properly stop
- restart chia; the database may be reported as corrupted
As the above depends on the precise timing of when the machine was powered off/crashed relative to I/O submitted by sqlite, this may require several attempts to see a corrupted database.
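If you want to check the file directly rather than relying on chia's startup error, sqlite's built-in integrity check works (a sketch; the filename is illustrative):

```python
import sqlite3

# Returns "ok" for an intact database; otherwise it describes the damage.
# On a badly corrupted file this may instead raise sqlite3.DatabaseError.
conn = sqlite3.connect("blockchain.sqlite")  # illustrative filename
print(conn.execute("PRAGMA integrity_check").fetchone()[0])
```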
**Expected behavior** The database should not become corrupted if the machine crashes, and users should not have to resync the entire blockchain and wallet after a crash.
**Desktop** Any machine that writes data to any sort of storage device.
**Additional context** While this doesn't directly explain all of the database corruption issues people have reported recently (including some in #8694, as those were not caused by a computer crash), there's a good chance that either this alone, or this combined with some transaction management issue, could corrupt the database if chia is stopped improperly.
The changelog for 1.2.8 indicates that this pragma change was added to improve disk performance. If faster transaction commits are desired, multiple operations should be batched into a single transaction instead of completely disabling the database's crash consistency mechanism. Since sqlite is run in WAL mode, it could be set to `pragma synchronous=NORMAL`, which the documentation indicates provides sufficient crash consistency while being faster than `pragma synchronous=FULL`.
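Concretely, the suggested alternative is two pragmas at connection setup (again a sketch, not chia's actual code):

```python
import sqlite3

# In WAL mode with synchronous=NORMAL, sqlite syncs the WAL only at
# checkpoints rather than on every commit. A crash can roll back the
# most recent transactions, but it cannot corrupt the database file.
conn = sqlite3.connect("blockchain.sqlite")  # illustrative filename
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("PRAGMA synchronous=NORMAL")
```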
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 6
- Comments: 35 (9 by maintainers)
"Catastrophic" is a very subjective view. Sure, the data can be fetched from the blockchain again, but that ignores the time lost actually farming and attempting to either submit blocks to the chain or submit partials on a pool. A pooler in the pool I'm a part of had their database trashed and missed out on a lucky day yesterday; I'd say that's unpleasant/unfortunate at best. The whole "just resync the chain again" line also ignores the fact that the database is big and will keep getting bigger. I don't want to have to download 10s-100s of GB of blockchain data on a semi-regular basis just because something went wrong with my power for a second.
As for low-end drives having extra caching, sqlite is built to take care of most of that already. The WAL in sqlite is a standard way to handle crash consistency on devices whose atomic powerfail write unit is smaller than the total data that needs to be written for a transaction. There are `fsync`s in sqlite to ensure the WAL is actually persisted before it starts mucking with the actual transaction execution, so that it can recover if things go wrong. The only time this would fail on a low-end device is if the USB controller flat-out lied about completing an `fsync`. In that case, the person should probably get a better device, as it's not just sqlite that would have problems in the event of a crash. Drives themselves have gotten much better at actually honoring the low-level commands that `fsync` triggers, so if there's a problem, it's going to be the USB controller, not the drive itself.

For problematic setups, I think a simple question would be: is `synchronous=OFF` safe when Windows Update decides to restart a machine randomly? Apart from that, you'll end up with a lot of disparate answers, as something as simple as a BSOD or power failure can break the database. There does not seem to be one single "right" hardware setup that people in the community use to run chia, so I don't think there will be many patterns in this.

As far as heuristics, the bulk of it could probably be covered by running with `synchronous=OFF` until 1/2 or 3/4 of the blockchain is synced, and with `synchronous=NORMAL` after that. `synchronous=OFF` is most beneficial when there are lots of transactions going into the database anyway, which is mostly the case during the initial sync. If normal operation also causes load, then, like I said earlier, other routes like better transaction management should be used to increase performance.
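Something like the following would capture that heuristic (purely a sketch; the function, threshold, and height parameters are made up, not chia's API):

```python
# Use the fast-but-unsafe setting only for the bulk of the initial sync,
# then fall back to a crash-consistent setting for steady-state operation.
def synchronous_pragma(peak_height: int, target_height: int) -> str:
    if target_height > 0 and peak_height / target_height < 0.75:
        return "PRAGMA synchronous=OFF"    # initial sync: losing it all is cheap-ish
    return "PRAGMA synchronous=NORMAL"     # caught up: corruption now costs days

# applied whenever the database connection is (re)opened:
# conn.execute(synchronous_pragma(peak_height, target_height))
```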
Technical stuff aside, this should absolutely be a config option if the chia team wants to keep pushing the use of `synchronous=OFF`, so that those who don't want to worry about it don't have to patch the codebase every version. From a UX perspective, the default should be some crash-consistent option like `synchronous=NORMAL`, because many people 1. won't change the defaults, 2. probably won't have a good enough understanding of the intimate workings of storage to know why `synchronous=OFF` is unsafe, and 3. will be very unhappy/may leave the project if their database files appear to randomly corrupt themselves. These are the same reasons why every semi-recent major file system runs with some sort of crash consistency mechanism by default: it turns out that people are really unhappy when they have to run `fsck` on their multi-TB drive due to a crash, or when they find they can no longer boot their machine.

As a user, I think that sounds great! My humble suggestion would be to default to the safest option (the SQLite docs seem to indicate `NORMAL` would be totally safe), whilst allowing advanced users to 'take off the seatbelt' if they choose to go faster (perhaps in lieu of, or complementing, an 'auto' setting). That way the risk is acknowledged and adverse outcomes can be expected.
On the general performance front, there may be an opportunity to SQL-tune the INSERT statement method for significant gains, which could make up the difference too. Some preliminary glances at the coin_store.py code show individual INSERT statements being executed, whereas a batch of those could (should?) be wrapped into a transaction and executed instead. There is some in-depth analysis here on SQLite INSERT performance tricks, but that's probably best dropped in a different GH issue altogether.
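For a sense of what that looks like (a self-contained sketch; the table and schema are stand-ins, not chia's actual coin_store schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit transactions
conn.execute("CREATE TABLE coin_record (name TEXT, height INTEGER, amount INTEGER)")
rows = [(f"coin{i}", i, 100) for i in range(10_000)]

# Instead of one INSERT per coin (an implicit transaction, and a sync,
# per row), wrap the whole batch in a single transaction: one sync total.
conn.execute("BEGIN")
conn.executemany("INSERT INTO coin_record VALUES (?, ?, ?)", rows)
conn.execute("COMMIT")
```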
I appreciate the trade-offs between reliability and performance, but having seen the impact of bad data hitting the mempool in testnet, is this definitely production ready?
Any change that decreases stability needs to be matched with safety mechanisms.
yea, no… as someone who's spent quite a bit of time on crash consistency stuff, turning off things like this is pretty much always more trouble than it's worth. There are so many additional edge and error cases that need to be checked, because you basically can't assume that any of the data you have is correct anymore. One would hope that the data for transaction n - 1 is safe because it was committed before transaction n, and the power failure happened while committing transaction n. But if they were both written close to each other (i.e. within the default 5 min flush window on Linux), that may actually not be the case…
It's also quite possible that you don't see as many reports because people may reach out to pool chats (assuming they are pooling) instead of the channels that y'all monitor. At least in the pool discord I'm a part of, we know about this, and we'll just tell people the unfortunate news that they need to sync from scratch if their database is corrupted.
Saying that this was done purely for performance reasons and then ignoring or writing off all the problems it is causing is frankly disappointing. I know that the chia team has been working hard to improve performance, but that doesn’t mean that it’s alright to throw caution to the wind. As a software developer as well, I know that writing good, performant code is difficult, especially with a codebase this size. As a software developer who has done lots of work with crash consistency and who understands quite well the performance tradeoffs that crash consistency causes, I really can’t buy the argument that an insane performance increase is worth a decent chance of corrupting the entire database if a crash happens. Sure, running anything without crash consistency will make it faster. But that doesn’t mean that you should run all of your computers without ext4 journaling or the equivalent, because otherwise you’re going to have lots of problems.
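For anyone weighing that argument, the gap is easy to measure directly (a rough sketch; the filename and table are made up, and absolute numbers depend entirely on the device):

```python
import sqlite3
import time

# Rough measurement of average per-commit cost under each synchronous
# setting. In WAL mode, NORMAL typically lands close to OFF, since
# neither syncs on every commit, while FULL pays an fsync per commit.
def bench(sync_mode: str, n: int = 200) -> float:
    conn = sqlite3.connect("bench.sqlite", isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(f"PRAGMA synchronous={sync_mode}")
    conn.execute("CREATE TABLE IF NOT EXISTS t (v INTEGER)")
    start = time.perf_counter()
    for i in range(n):
        conn.execute("BEGIN")
        conn.execute("INSERT INTO t VALUES (?)", (i,))
        conn.execute("COMMIT")
    conn.close()
    return (time.perf_counter() - start) / n

for mode in ("FULL", "NORMAL", "OFF"):
    print(f"{mode}: {bench(mode) * 1e3:.3f} ms/commit")
```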
Apart from that, this update unfortunately also has a good possibility of excluding users who do not have stable power delivery as they cannot guarantee that their machine will not crash at the wrong time. They will be stuck in an endless loop of trying to sync the blockchain fresh, their machine losing power and crashing, and then having to start over once again.
If the chia devs would still like to allow running in the new sqlite mode, adding a config flag that defaults to a crash-consistent pragma would suffice. That would make sure the majority of users don't have to deal with random database corruption, while those who know what they're doing or want the best performance can change it. If the chia devs don't have time for this, then I'll find the time to make a PR for it myself.
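A sketch of what that flag could look like (the `db_sync` key name and values are hypothetical, not an existing chia option):

```python
import sqlite3

# Map a user-facing config value to a pragma, defaulting to the
# crash-consistent setting for anyone who doesn't touch the config.
ALLOWED = {"off": "OFF", "normal": "NORMAL", "full": "FULL"}

def open_db(path: str, db_sync: str = "normal") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(f"PRAGMA synchronous={ALLOWED.get(db_sync, 'NORMAL')}")
    return conn
```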