foundationdb: fdbrestore gets OutOfMemory error while waiting for agents to complete restore
When trying to restore the database from a backup to either 5.2.5 or 6.0.15, the restore runs out of memory and fails. The command:
fdbrestore start -r back_dir -w -C cluster_file
The output:
Backup Description
URL: back_dir
Restorable: true
Snapshot: startVersion=113213694260083 (2022-08-26 02:13:19) endVersion=113213694816262 (2022-08-26 02:13:20) totalBytes=119085195 restorable=true
SnapshotBytes: 119085195
MinLogBeginVersion: 113213694201866 (2022-08-26 02:13:19)
ContiguousLogEndVersion: 113213714201866 (2022-08-26 02:13:39)
MaxLogEndVersion: 113213714201866 (2022-08-26 02:13:39)
MinRestorableVersion: 113213694816262 (2022-08-26 02:13:20)
MaxRestorableVersion: 113213714201865 (2022-08-26 02:13:39)
Restoring backup to version: 113213714201865
ERROR: Out of memory
The only difference in the output between 5.2.5 and 6.0.15 is that 6.0.15 keeps printing the following line:
Tag: default UID: xxxxxxxxxxxxxxxxx State: queued Blocks: 0/0 BlocksInProgress: 0 Files: 0 BytesWritten: 0 ApplyVersionLag: 0 LastError: None
while 5.2.5 did not print anything.
There is nothing obvious in the trace files or syslog.
About this issue
- State: closed
- Created 5 years ago
- Comments: 15 (8 by maintainers)
To clarify what is going on a bit:
fdbbackup and fdbrestore merely enqueue backup and restore jobs into the database (after a possibly lengthy init for restore) for the database cluster’s backup_agent processes to make progress on when they are running. Any cluster you want to back up data from or restore data into must have at least one backup agent running. You can start backups or restores when no agents are running, but no progress will actually be made on the resulting backup/restore jobs until there are agents running. A default FDB installation will configure one backup_agent to be started by fdbmonitor.
So your fdbrestore command did not ignore the -C option, and there is no bug there. It enqueued your restore job into the cluster you specified, but there were no backup agents running on that cluster, so no actual restore work was done.
I’m glad your restore issue is resolved, however there is still the matter of the Out Of Memory error you saw, which obviously should not happen. Can you give me any further details about that? How long was fdbrestore running before it OOM’d? Do you have a trace file from that output (the --log option will produce one)?