neon: initdb_lsn is not reproducible
To create an empty timeline, the pageserver runs the `initdb` command: https://github.com/neondatabase/neon/blob/3be3bb77302c371d1c58fda7d17dbf56dd9ad061/pageserver/src/tenant.rs#L994-L1000
After this call, we import the pg data, extract `initdb_lsn` from it, and query safekeepers to get more WAL from compute, if there is any, using an Lsn based on `initdb_lsn` as the start lsn for these queries.
Safekeepers, on their side, do record this start lsn as an offset from where the WAL streaming should start and store WAL in the files based on this offset.
Later, if we want to restore a pageserver entirely from safekeeper WAL, it is possible to do this in a similar way: create an empty timeline on the pageserver with the needed IDs and make it query the safekeeper for the WAL that it stored from the previous WAL streaming.
That would not work if the Lsn offset is different: safekeepers would start streaming WAL from a different offset, the WAL segments' checksums would not match, and the data inside that WAL would not match what the pageserver expects. Example: https://neondb.slack.com/archives/C03H1K0PGKH/p1664971119442359
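To illustrate why the start offset matters, here is a minimal sketch of how an LSN maps to a WAL segment and a byte offset within it. It assumes PostgreSQL's default 16 MiB segment size and stock segment file naming; the helper names are hypothetical, and the safekeeper's actual on-disk layout may differ. The two start values are the ones reported further down in this issue.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # PostgreSQL default segment size; an assumption here


def segment_position(lsn: int, seg_size: int = WAL_SEGMENT_SIZE) -> tuple[int, int]:
    """Return (segment number, byte offset inside that segment) for an LSN."""
    return lsn // seg_size, lsn % seg_size


def segment_file_name(timeline: int, segno: int, seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Build a segment file name the way stock PostgreSQL does (hypothetical helper)."""
    segs_per_xlogid = 0x1_0000_0000 // seg_size
    return f"{timeline:08X}{segno // segs_per_xlogid:08X}{segno % segs_per_xlogid:08X}"


# Same segment file, but the stream would begin at a different byte inside it.
for start_lsn in (0x1696070, 0x1696068):
    segno, offset = segment_position(start_lsn)
    print(segment_file_name(1, segno), hex(offset))
```

Both start positions fall into the same segment, yet the first byte written differs, so anything keyed to the recorded start offset no longer lines up.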
It turns out that `initdb` does not produce the same output even when run from the same binaries.
Consider the recently built PR https://github.com/neondatabase/neon/pull/2589: it produced the `neondatabase/neon:2178` image, which was tagged as `neondatabase/neon:latest` after the build.
If you run things inside its Docker image (`docker run -it --rm neondatabase/neon:2178 bash`), it will output the following:
neon@8478c4e9b86a:~$ env LD_LIBRARY_PATH="/usr/local/v14/lib/" env DYLD_LIBRARY_PATH="/usr/local/v14/lib/" /usr/local/v14/bin/initdb -D ./pg14-initdb/ -U test_user -E utf8 --no-instructions --no-sync
... snip, successful operation, creates `pg14-initdb` directory
neon@8478c4e9b86a:~$ /usr/local/bin/pageserver_binutils ./pg14-initdb/global/pg_control
... snip
pg_initdb_lsn: 0/1696070, aligned: 0/1696070
neon@8478c4e9b86a:~$ /usr/local/bin/pageserver_binutils --version
Neon Pageserver binutils git-env:13f0e7a5b4a2ea1187955926e036d4ac57ed094c
neon@8478c4e9b86a:~$ env LD_LIBRARY_PATH="/usr/local/v14/lib/" env DYLD_LIBRARY_PATH="/usr/local/v14/lib/" /usr/local/v14/bin/initdb --version
initdb (PostgreSQL) 14.5
root@8478c4e9b86a:/data# file /usr/local/v14/bin/initdb
/usr/local/v14/bin/initdb: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=7846631cab9d142450aea3e6e8a5330944a9d291, for GNU/Linux 3.2.0, with debug_info, not stripped
The same image, after being deployed on `stage-ps-2`, outputs the following:
admin@zenith-us-stage-ps-2:~$ env LD_LIBRARY_PATH="/usr/local/v14/lib/" env DYLD_LIBRARY_PATH="/usr/local/v14/lib/" /usr/local/v14/bin/initdb -D ./pg14-initdb/ -U test_user -E utf8 --no-instructions --no-sync
... snip, successful operation, creates `pg14-initdb` directory
admin@zenith-us-stage-ps-2:~$ /usr/local/bin/pageserver_binutils ./pg14-initdb/global/pg_control
... snip
pg_initdb_lsn: 0/1696068, aligned: 0/1696068
admin@zenith-us-stage-ps-2:~$ /usr/local/bin/pageserver_binutils --version
Neon Pageserver binutils git-env:13f0e7a5b4a2ea1187955926e036d4ac57ed094c
admin@zenith-us-stage-ps-2:~$ env LD_LIBRARY_PATH="/usr/local/v14/lib/" env DYLD_LIBRARY_PATH="/usr/local/v14/lib/" /usr/local/v14/bin/initdb --version
initdb (PostgreSQL) 14.5
admin@zenith-us-stage-ps-2:~$ file /usr/local/v14/bin/initdb
/usr/local/v14/bin/initdb: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=7846631cab9d142450aea3e6e8a5330944a9d291, for GNU/Linux 3.2.0, with debug_info, not stripped
The binary files and build hashes match, but the `pg_initdb_lsn` output differs by 8 bytes (0/1696070 vs 0/1696068).
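The size of the gap can be checked by parsing the two values. This is a small sketch assuming the usual PostgreSQL textual LSN form (two hexadecimal halves separated by `/`); `parse_lsn` is a hypothetical helper, not part of `pageserver_binutils`.

```python
def parse_lsn(text: str) -> int:
    """Parse 'hi/lo' (both hexadecimal) into a 64-bit byte position."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


docker_lsn = parse_lsn("0/1696070")   # value reported inside the Docker container
staging_lsn = parse_lsn("0/1696068")  # value reported on stage-ps-2

print(docker_lsn - staging_lsn)          # distance between the two start positions, in bytes
print(docker_lsn % 8, staging_lsn % 8)   # both are already 8-byte aligned
```

Both values are 8-byte aligned, which is consistent with the "aligned" column matching the raw value in each run.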
With tenant relocation and the various other ways to dynamically switch environments for a timeline, we currently cannot guarantee that `initdb_lsn` is identical, and hence cannot rely on safekeeper WAL restoration working.
About this issue
- State: closed
- Created 2 years ago
- Comments: 19 (19 by maintainers)
Commits related to this issue
- Run initdb with the same commands across the whole storage Based on https://github.com/neondatabase/neon/pull/3276 Pageserver runs `initdb` in two places during its work: during new timeline creatio... — committed to neondatabase/neon by deleted user a year ago
- Upload initdb results to S3 (#5390) ## Problem See #2592 ## Summary of changes Compresses the results of initdb into a .tar.zst file and uploads them to S3, to enable usage in recovery from... — committed to neondatabase/neon by arpad-m 7 months ago
We can create a physical snapshot of existing databases, as of now, from the pageserver. That will allow us to recover from any issues that arise in the future, although it won’t allow you to recover to an earlier point.
The `zstd --long` option makes a big difference, and `--single-thread` saves a little too.