bitcoin: salvagewallet fails verification

This is somewhat rare but still worth to take a look sometime. I can reproduce locally and travis complains in wallet.py:

check -salvagewallet

start_node: bitcoind started, calling bitcoin-cli -rpcwait getblockcount

No output has been received in the last 10 minutes, this potentially indicates a stalled build or something wrong with the build itself.

The build has been terminated

Edit:

This will now show up as

check -salvagewallet
Unexpected exception caught during testing: Exception('bitcoind exited with status 1 during initialization',)

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Comments: 15 (15 by maintainers)

Commits related to this issue

Most upvoted comments

I’ve been digging into this failure over the last couple of days. This is what I’ve found so far:

  • This is fairly easy to reproduce. Just uncomment the -salvagewallet test and set BASE_SCRIPTS in test_runner.py to be a list [["wallet.py"] * 50]. In a run of 50, I usually get one failure.

  • the problem is caused by running Berkeley DB’s Db::verify() with the DB_AGGRESSIVE flag. The documentation for that is (emphasis mine):

Output all the key/data pairs in the file that can be found. By default, Db::verify() does not assume corruption. For example, if a key/data pair on a page is marked as deleted, it is not then written to the output file. When DB_AGGRESSIVE is specified, corruption is assumed, and any key/data pair that can be found is written. In this case, key/data pairs that are corrupted or have been deleted may appear in the output (even if the file being salvaged is in no way corrupt), and the output will almost certainly require editing before being loaded into a database.

(from the BDB 4.8.30 docs)

  • Running DB::verify() with agressive set causes the output to be corrupt (more details below). I’ve tried this with the most recent version of Berkeley DB from http://www.oracle.com/technetwork/database/database-technologies/berkeleydb/downloads/index.html and an aggressive verify still corrupts the database.

  • you can repro the corruption manually by running the db_dump utility function, which is bundled with BDB. -r runs salvage and -R runs salvage with aggressive.

  • This is the wallet.dat file that I was using in my tests (renamed wallet.dat.txt as github has rules about file types): wallet.dat.txt. If I run db_dump on this .dat file with -R, the output ends:

/tmp/user/1000/bitcoin_test_runner_20170605_153749/wallet_273/node0/regtest/wallet.1496677614.bak: BDB0090 DB_VERIFY_BAD: Database verification failed
  • if I run db_dump with -r, then I don’t see the DB_VERIFY_BAD message.

  • piping db_dump through to db_load rebuilds a database. If I use -R on db_dump, then this fails, because the database dump is corrupt. If I use -r, on db_dump, then it succeeds, and most (all?) of the key-value pairs appear to be intact.

  • Digging a bit more into the corruption, if I look at the output of db_dump -R, it’s a superset of the output of db_dump -r. There are some extra rows that look like normal keys/values, but there are also a bunch of these:

...
 554e4b4e4f574e5f4b4559
 020000000183365de8669a47426115d8224516599ce66f9e8dd75738e30b1c1812cc65210d010000006b483045022100fbec000102000000010000000000000000000000000000000000000000000000000000000000000000ffffffff0502d8000101ffffffff0200f902950000000023210230734f482295f0361adc9fdd518a17ce3914e30c5944e266c9c2d62a81bfabc0ac0000000000000000266a24aa21a9ede2f61c3f71d1defd3fa999dfa36953755c690689799962b48bebd836974e8cf900000000717b8547d59a8248a8f337f18b7ff9740c60ad7661a034534a3dc8b5ca85cb75000000000000030b66726f6d6163636f756e7400016e0232320974696d65736d6172740a313439363637373630380000000000e87c35590000732300010274783e056c621538aad4f361910ae102f93a9ca6b669
 554e4b4e4f574e5f4b4559
 02000000010817df69d135686315416c5c2ab83d740882b5c733a2e24762dc4a1b52640e2000000000494830450221008732d20ce6ec67b39bbd6065149d66cd15fa10b1666512d2f72c315dadb4a5a802202de2365122b8f014994bdd8a98955f7ade117a1aadebd2b7769d1325f53c15e801ffffffff0200000000000000001976a914954ff91555b588a81796e3d1ae6313339ce28e4788acc0e4022a010000001976a914c58fb7d7c9407cf2eab0a7a52677869334c2c46488ac00000000ccf2bff45a3b0101020000000106e45a28096dd10e52d073e6a93ed5ae884aa1dc69fb4106df2656396da5d75f000000006b483045022100ede821e3abae05f5b94454bfe6e3cdf78525206355929b3bdc1fd11b5dc8d59102
 554e4b4e4f574e5f4b4559
...
 554e4b4e4f574e5f4b4559
 0a000000c97c3559000000000a2f00818c490200e87c3559000000002105000104f09d85a1280001046e61
...
 554e4b4e4f574e5f4b4559
 076b65796d6574612102548265736eb2677feaad5caefa4996b5316da452262c03f87d5acdc00ab55ee2
 554e4b4e4f574e5f4b4559
 046e616d65226d74626e675a47434359686354435333537a365a7a6e76704c355739776439425075
 554e4b4e4f574e5f4b4559
 04706f6f6c0900000000000000

554e4b4e4f574e5f4b4559 is ascii for UNKNOWN_KEY. You can see where that string is output in btree/bt_verify.c in the BDB source.

  • Critically, in the first group, we have (with my commentary):
 554e4b4e4f574e5f4b4559                // UNKNOWN_KEY
 020000000183365de8669a47426115d8224516599ce66f9e8dd75738e30b1c1812cc65210d010000006b483045022100fbec000102000000010000000000000000000000000000000000000000000000000000000000000000ffffffff0502d8000101ffffffff0200f902950000000023210230734f482295f0361adc9fdd518a17ce3914e30c5944e266c9c2d62a81bfabc0ac0000000000000000266a24aa21a9ede2f61c3f71d1defd3fa999dfa36953755c690689799962b48bebd836974e8cf900000000717b8547d59a8248a8f337f18b7ff9740c60ad7661a034534a3dc8b5ca85cb75000000000000030b66726f6d6163636f756e7400016e0232320974696d65736d6172740a313439363637373630380000000000e87c35590000732300010274783e056c621538aad4f361910ae102f93a9ca6b669                // some value
 554e4b4e4f574e5f4b4559               // UNKNOWN_KEY
 02000000010817df69d135686315416c5c2ab83d740882b5c733a2e24762dc4a1b52640e2000000000494830450221008732d20ce6ec67b39bbd6065149d66cd15fa10b1666512d2f72c315dadb4a5a802202de2365122b8f014994bdd8a98955f7ade117a1aadebd2b7769d1325f53c15e801ffffffff0200000000000000001976a914954ff91555b588a81796e3d1ae6313339ce28e4788acc0e4022a010000001976a914c58fb7d7c9407cf2eab0a7a52677869334c2c46488ac00000000ccf2bff45a3b0101020000000106e45a28096dd10e52d073e6a93ed5ae884aa1dc69fb4106df2656396da5d75f000000006b483045022100ede821e3abae05f5b94454bfe6e3cdf78525206355929b3bdc1fd11b5dc8d59102                 // some value
 554e4b4e4f574e5f4b4559                      // UNKNOWN_KEY
 036b65792103c58cdbfc2b4cedd96e56d2714aa980a1478feb0bc5f9f7b6036e075eb974faef                 // bitcoin pub key key
 d63081d302010104202941b6af510f195ac980a8d5218d06b870d18ba057a2e8b3f63a9eb9be6d9cf7a08185308182020101302c06072a8648ce3d0101022100fffffffffffffffffffffffffffffffffffffffffffffffffffffffefffffc2f300604010004010704210279be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798022100fffffffffffffffffffffffffffffffebaaedce6af48a03bbfd25e8cd0364141020101a12403220003c58cdbfc2b4cedd96e56d2714aa980a1478feb0bc5f9f7b6036e075eb974faef60d62c885546d4410667ec2ce0cf0562b90fa067e6a8cd3dc20f127d659300d1                      // bitcoin pubkey value
 036b65792103c9c89b90f2ae69722c1ec7cea861a0c925bce1dfd996467aa7eb3fe887450655
 d63081d302010104204b1b5c9cc37717ac4ed9f4376fa93bea36e8f065d4ea0f9991eec9905e916ecba08185308182020101302c06072a8648ce3d0101022100fffffffffffffffffffffffffffffffffffffffffffffffffffffffefffffc2f300604010004010704210279be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798022100fffffffffffffffffffffffffffffffebaaedce6af48a03bbfd25e8cd0364141020101a12403220003c9c89b90f2ae69722c1ec7cea861a0c925bce1dfd996467aa7eb3fe887450655b92751b48a09b2bff3ebf0ca0d40af28c2d3aa229711abd6967a5fa6c43d61c7

the output format for BDB should be key in one row followed by value in the next row. You can see here that the final 554e4b4e4f574e5f4b4559 key is followed by 036b65792103c58cdbfc2b4cedd96e56d2714aa980a1478feb0bc5f9f7b6036e075eb974faef, which is actually another key row (for a bitcoin public key). That means that the output of db_dump is out of alignment - there’s actually an odd number of rows, so after this line, parsing keys/values gets mixed up.

  • I added some logging into ReadKeyValue() in bitcoind and see exactly the same thing when running with -salvagewallet. The key/values get misaligned when re-reading after calling Db::verify():
2017-06-05 19:57:03 Renamed wallet.dat to wallet.1496692623.bak
2017-06-05 19:57:03 CDBEnv::Salvage: Database salvage found errors, all data may not be recoverable.
2017-06-05 19:57:03 CDBEnv::Salvage: WARNING: Number of keys in data does not match number of values.
2017-06-05 19:57:03 CDBEnv::Salvage: WARNING: Unexpected end of file while reading salvage output.
2017-06-05 19:57:03 Salvage(aggressive) found 102 records
2017-06-05 19:57:03 ReadKeyValue: Reading record type tx
2017-06-05 19:57:03 ReadKeyValue: Reading tx record. hash: aa2f899c1b2fcf578522467a4d36d46b43d2ba318d9d3c17ddf0679630e67d02
2017-06-05 19:57:03 ReadKeyValue: Reading record type tx
2017-06-05 19:57:03 ReadKeyValue: Reading tx record. hash: 5fd7a56d395626df0641fb69dca14a88aed53ea9e673d0520ed16d09285ae406
2017-06-05 19:57:03 ReadKeyValue: Reading record type tx
2017-06-05 19:57:03 ReadKeyValue: Reading tx record. hash: 2f66a28ccfbbf6fccad3d168cbef6ffd36ed14d2f7303ca797708aebf3deb510
2017-06-05 19:57:03 ReadKeyValue: Reading record type tx
2017-06-05 19:57:03 ReadKeyValue: Reading tx record. hash: aada45522a31c27cab548975be7f5282471f484258451916bcad018b694dc311
2017-06-05 19:57:03 ReadKeyValue: Reading record type tx
2017-06-05 19:57:03 ReadKeyValue: Reading tx record. hash: 580ce8c795c5e3c5eaa946da7c054e6cd6336739c730c4fb87c9fa735632de15
2017-06-05 19:57:03 ReadKeyValue: Reading record type tx
2017-06-05 19:57:03 ReadKeyValue: Reading tx record. hash: ddae1c851b2275a9bb12d53bfe8a1f0ff9a7cdb149d8734042fe439e72cdd420
2017-06-05 19:57:03 ReadKeyValue: Reading record type tx
2017-06-05 19:57:03 ReadKeyValue: Reading tx record. hash: f3450597d463df2edd39cd7f0f0aa6e07801e143c2ecb3a96f33ba62bc1aa429
2017-06-05 19:57:03 ReadKeyValue: Reading record type tx
2017-06-05 19:57:03 ReadKeyValue: Reading tx record. hash: 31f5262195fd40bce453e44d69b6a69c3af902e10a9161f3d4aa3815626c053e
2017-06-05 19:57:03 ReadKeyValue: Reading record type tx
2017-06-05 19:57:03 ReadKeyValue: Reading tx record. hash: b6aa0e873ad8854e15855f9caf2a932edef07bfb5a261172351959defd83144d
2017-06-05 19:57:03 ReadKeyValue: Reading record type 0<81><D3>^B^A^A^D )A<B6><AF>Q^O^YZɀ<A8><D5>!<8D>^F<B8>pы<A0>W<A2><E8><B3><F6>:<9E><B9><BE>m<9C>����0<81><82>^B^A^A0,^F^G*<86>H<CE>=^A^A^B!^@<FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FE><FF><FF><FC>/0^F^D^A^@^D^A^G^D!^By<BE>f~<F9>ܻ<AC>U<A0>b<95>·^K^G^B<9B><FC><DB>-<CE>(<D9>Y<F2><81>[^V<F8>^W<98>^B!^@<FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FE><BA><AE><DC><E6><AF>H<A0>;<BF><D2>^<8C><D0>6AA^B^A^A<A1>$^C"^@^CŌ<DB><FC>+L<ED><D9>nV<D2>qJ<A9><80><A1>G<8F><EB>^K<C5><F9><F7><B6>^Cn^G^<B9>t<FA><EF>
2017-06-05 19:57:03 ReadKeyValue: Reading 0<81><D3>^B^A^A^D )A<B6><AF>Q^O^YZɀ<A8><D5>!<8D>^F<B8>pы<A0>W<A2><E8><B3><F6>:<9E><B9><BE>m<9C>����0<81><82>^B^A^A0,^F^G*<86>H<CE>=^A^A^B!^@<FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FE><FF><FF><FC>/0^F^D^A^@^D^A^G^D!^By<BE>f~<F9>ܻ<AC>U<A0>b<95>·^K^G^B<9B><FC><DB>-<CE>(<D9>Y<F2><81>[^V<F8>^W<98>^B!^@<FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FE><BA><AE><DC><E6><AF>H<A0>;<BF><D2>^<8C><D0>6AA^B^A^A<A1>$^C"^@^CŌ<DB><FC>+L<ED>
<D9>nV<D2>qJ<A9><80><A1>G<8F><EB>^K<C5><F9><F7><B6>^Cn^G^<B9>t<FA><EF> record (no-op)
2017-06-05 19:57:03 ReadKeyValue: Reading record type 0<81><D3>^B^A^A^D KESC\<9C><C3>w^W<AC>N<D9><F4>7o<A9>;<EA>6<E8><F0>e<D4><EA>^O<99><91><EE>ɐ^<91>nˠ<81><85>0<81><82>^B^A^A0,^F^G*<86>H<CE>=^A^A^B!^@<FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF>
<FF><FF><FF><FF><FF><FF><FF><FF><FF><FE><FF><FF><FC>/0^F^D^A^@^D^A^G^D!^By<BE>f~<F9>ܻ<AC>U<A0>b<95>·^K^G^B<9B><FC><DB>-<CE>(<D9>Y<F2><81>[^V<F8>^W<98>^B!^@<FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FE><BA><AE><DC><E6><AF>H<A0>;<BF><D2>^<8C><D0>6AA^B^A^A<A1>$^C"^@^C<C9>ț<90><F2><AE>ir,^^<C7>Ψa<A0><C9>%<BC><E1><DF>ٖFz<A7><EB>?<E8><87>E^FU
2017-06-05 19:57:03 ReadKeyValue: Reading 0<81><D3>^B^A^A^D KESC\<9C><C3>w^W<AC>N<D9><F4>7o<A9>;<EA>6<E8><F0>e<D4><EA>^O<99><91><EE>ɐ^<91>nˠ<81><85>0<81><82>^B^A^A0,^F^G*<86>H<CE>=^A^A^B!^@<FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF>
<FF><FF><FF><FF><FF><FF><FE><FF><FF><FC>/0^F^D^A^@^D^A^G^D!^By<BE>f~<F9>ܻ<AC>U<A0>b<95>·^K^G^B<9B><FC><DB>-<CE>(<D9>Y<F2><81>[^V<F8>^W<98>^B!^@<FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FE><BA><AE><DC><E6><AF>H<A0>;<BF><D2>^<8C><D0>6AA^B^A^A<A1>$^C"^@^C<C9>ț<90><F2><AE>ir,^^<C7>Ψa<A0><C9>%<BC><E1><DF>ٖFz<A7><EB>?<E8><87>E^FU record (no-op)
2017-06-05 19:57:03 ReadKeyValue: Reading record type 0<81><D3>^B^A^A^D u<9C><E8><AB><F4>4D7<F2><D4>i<A9>uN<F7>3<C1>^HW<8C>0<B9><F2>ݑ<B4>^YN<FE><E0>@W<A0><81><85>0<81><82>^B^A^A0,^F^G*<86>H<CE>=^A^A^B!^@<FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FE><FF><FF><FC>/0^F^D^A^@^D^A^G^D!^By<BE>f~<F9>ܻ<AC>U<A0>b<95>·^K^G^B<9B><FC><DB>-<CE>(<D9>Y<F2><81>[^V<F8>^W<98>^B!^@<FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FE><BA><AE><DC><E6><AF>H<A0>;<BF><D2>^<8C><D0>6AA^B^A^A<A1>$^C"^@^C<DE>u^XN"m<81><B8><9E>p<97><C4>5a<CA><C6>^V<A9>*<B6>p<99><AA><A5><B5><DC><D3>>N`(^Z
2017-06-05 19:57:03 ReadKeyValue: Reading 0<81><D3>^B^A^A^D u<9C><E8><AB><F4>4D7<F2><D4>i<A9>uN<F7>3<C1>^HW<8C>0<B9><F2>ݑ<B4>^YN<FE><E0>@W<A0><81><85>0<81><82>^B^A^A0,^F^G*<86>H<CE>=^A^A^B!^@<FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FE><FF><FF><FC>/0^F^D^A^@^D^A^G^D!^By<BE>f~<F9>ܻ<AC>U<A0>b<95>·^K^G^B<9B><FC><DB>-<CE>(<D9>Y<F2><81>[^V<F8>^W<98>^B!^@<FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FE><BA><AE><DC><E6><AF>H<A0>;<BF><D2>^<8C><D0>6AA^B^A^A<A1>$^C"^@^C<DE>u^XN"m<81><B8><9E>p<97><C4>5a<CA><C6>^V<A9>*<B6>p<99><AA><A5><B5><DC><D3>>N`(^Z record (no-op)
2017-06-05 19:57:03 ReadKeyValue: Reading record type test
2017-06-05 19:57:03 ReadKeyValue: Reading test record (no-op)
2017-06-05 19:57:03 ReadKeyValue: Reading record type 
2017-06-05 19:57:03 ReadKeyValue: Reading  record (no-op)
2017-06-05 19:57:03 ReadKeyValue: Reading record type рыба
2017-06-05 19:57:03 ReadKeyValue: Reading рыба record (no-op)
2017-06-05 19:57:03 ReadKeyValue: Reading record type 
2017-06-05 19:57:03 ReadKeyValue: Reading  record (no-op)
2017-06-05 19:57:03 ReadKeyValue: Reading record type 
2017-06-05 19:57:03 ReadKeyValue: Reading  record (no-op)
2017-06-05 19:57:03 ReadKeyValue: Reading record type 𝅘𝅥𝅯
2017-06-05 19:57:03 ReadKeyValue: Reading 𝅘𝅥𝅯 record (no-op)
2017-06-05 19:57:03 ReadKeyValue: Reading record type ^@^@
2017-06-05 19:57:03 ReadKeyValue: Reading ^@^@ record (no-op)

(you can see at the end that it’s reading keys as values and vice versa).

After running with -salvagewallet, the wallet.dat file is empty. All key-value pairs have gone. The wallet backup file wallet.<timestamp>.bak contains the db from before the salvage operation so still has all the key-value pairs.

Next steps:

  • -salvagewallet should be changed to not set the aggressive flag. Note the documentation for that flag: the output will almost certainly require editing before being loaded into a database. Currently the salvagewallet operation just directly reads that output, so it’s not entirely surprising that it occasionally fails.
  • we may want an option to run salvage wallet with the aggressive flag set, but that should only be once we have the bitcoin_wallet_tool from #8745.
  • once #8745 is merged, we should add targeted testing of salvagewallet (rather then the incidental testing we have in wallet.py). Gavin had some testing here: https://github.com/bitcoin/bitcoin/pull/1895 / https://gist.github.com/gavinandresen/3812689 which we might use as a start.
  • TBD whether we want to merge my extra logging on BDB reads. We could also add extra logging on writes.
  • we should try to figure out what BD writes are causing the DB to get into a state where Db::verify() with the DB_AGGRESSIVE flag fails. Additional logging will help there. I can also add an option to the integration tests which verifies the wallet.dat files of every node at the end of every test.

You can just run

$ for i in {0..10000}; do qa/pull-tester/rpc-tests.py wallet --nocleanup;if [ $? -ne 0 ];then echo; echo "fail"; sleep 1d;fi ;done

and come back 20 minutes later to dig through the dumps.