firecracker: [Snaps] Filesystem errors when loading a snapshot on a different machine
Describe the bug
I am hacking on firecracker-containerd to support firecracker snapshots, and I am facing the following problem.
I create a snapshot of a microVM running an nginx container and then try to load this snapshot on a different machine (to preserve the disk state, I simply commit a container snapshot).
When restoring, I patch (see also #4014) the disk device path recorded in the Firecracker snapshot for the container snapshot device so that it points to a fresh containerd snapshot mount point, on which the previously committed image is mounted.
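Done directly against the Firecracker API, the restore flow described above would roughly be: load the snapshot without resuming, repoint the drive, then resume. Below is a minimal Python sketch of that sequence over the API Unix socket. It assumes a Firecracker version whose `PUT /snapshot/load` accepts `mem_backend` (older versions used `mem_file_path`) and that accepts `PATCH /drives/{drive_id}` on a restored, paused VM. The socket path, file paths, and drive ID are hypothetical.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection over Firecracker's API Unix domain socket."""

    def __init__(self, socket_path: str, timeout: float = 5.0):
        # "localhost" only fills the Host header; connect() uses the socket path.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def restore_requests(snapshot_path, mem_path, drive_id, new_drive_path):
    """Build the (method, path, body) sequence: load paused, repoint drive, resume."""
    return [
        ("PUT", "/snapshot/load", {
            "snapshot_path": snapshot_path,
            "mem_backend": {"backend_type": "File", "backend_path": mem_path},
            # Stay paused so the drive is repointed before the guest runs again.
            "resume_vm": False,
        }),
        ("PATCH", f"/drives/{drive_id}",
         {"drive_id": drive_id, "path_on_host": new_drive_path}),
        ("PATCH", "/vm", {"state": "Resumed"}),
    ]


def send_all(socket_path, requests):
    """Send each request on its own connection; raise on the first non-2xx reply."""
    for method, path, body in requests:
        conn = UnixHTTPConnection(socket_path)
        try:
            conn.request(method, path, body=json.dumps(body),
                         headers={"Content-Type": "application/json",
                                  "Accept": "application/json"})
            resp = conn.getresponse()
            text = resp.read().decode()
        finally:
            conn.close()
        if not 200 <= resp.status < 300:
            raise RuntimeError(f"{method} {path} failed: {resp.status} {text}")
```

Hypothetical usage: `send_all("/tmp/firecracker.socket", restore_requests("vm.snap", "vm.mem", "container_disk", "/path/to/committed.img"))`. Keeping the request list separate from the sender makes it easy to inspect or log the exact calls before running them.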
Snapshot loading succeeds, and the container even responds to HTTP requests (I am omitting the network setup details since that part works now). However, nginx returns internal server errors, and the following errors appear in the microVM's kernel log:
DEBU[2023-08-15T07:21:33.072516371-04:00] [ 38.858355] EXT4-fs error (device vdb): ext4_find_entry:1447: inode #262: comm nginx: checksumming directory block 0 jailer=noop runtime=aws.firecracker vmID=1 vmm_stream=stdout
DEBU[2023-08-15T07:21:33.076584363-04:00] [ 38.862540] EXT4-fs error (device vdb): ext4_find_entry:1447: inode #262: comm nginx: checksumming directory block 0 jailer=noop runtime=aws.firecracker vmID=1 vmm_stream=stdout
It seems that the restored disk state (obtained via the container snapshot commit) is inconsistent with the disk state the microVM expects.
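One way to test this hypothesis directly is to compare the drive's backing data at two points. First, take a digest of the backing file right after creating the snapshot, with the microVM paused. Then take a digest of whatever the drive is repointed to on the second machine. Below is a minimal Python sketch, assuming both are readable as regular files or block devices (the paths are hypothetical). A match only shows that the host-side bytes agree. Writes the guest still holds in its page cache live in the memory snapshot, not in the backing file.

```python
import hashlib


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a (possibly large) disk image or block device through SHA-256."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large images never have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_disk_contents(original: str, restored: str) -> bool:
    """True if the restored backing data is byte-identical to the original."""
    return file_digest(original) == file_digest(restored)
```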
In this regard, do you have any idea what the problem might be (if it isn't simply a disk state inconsistency) and how it could be solved? It seems to require some sort of filesystem checkpoint, but I couldn't find any relevant information on the web.
This issue is more of a question than a bug report, so I am not adding any other context (though I can if it helps clarify my use case).
Checks
- Have you searched the Firecracker Issues database for similar problems?
- Have you read the existing relevant Firecracker documentation?
- Are you certain the bug being reported is a Firecracker issue?
About this issue
- State: closed
- Created a year ago
- Comments: 17 (8 by maintainers)
I'll close this now since the issue is tracked in https://github.com/firecracker-microvm/firecracker-containerd/issues/759.
Feel free to re-open if you realize the problem is with Firecracker after all.
@CuriousGeorgiy thanks a lot for the effort put in this. I think it makes sense, yes. It looks like it’s not a Firecracker problem, so the firecracker-containerd folks might be better suited to help you with this.
Feel free to move it. You can always open a new issue here if you find something that is Firecracker related.
Could you try to reproduce the same scenario without firecracker-containerd and see if it reproduces? After step 4, you could also try to patch the drive to another path, so that we can see whether that is the problem.