podman: power-loss while creating containers may leave podman (storage) in a broken state
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
If a power loss occurs within a small time window while podman is creating containers, the container storage is left broken and no containers can be started or created anymore. Only podman system prune -a seems to resolve the issue; all other prune commands do not.
Steps to reproduce the issue (in general):
- Set up systemd units to create containers on boot
- Disconnect the power source while the containers are being created on boot
- Restart and observe the units / containers
Steps to reproduce the issue (specifically):
The following are the specific steps for my actual setup. This might make a difference, since the Raspberry Pi 3B+ has limited resources, which causes image pulls and container creation to take some time (especially when starting 5 containers in parallel) and could widen the time window for corruption.
- Install the latest Fedora IoT on a Raspberry Pi 3B+
- Set up some containers to be started via systemd on boot, including a pod. Use unit dependencies to start the pod first, then one container with a startup time of more than a minute (e.g. node-red), and four more containers after that (see the example units under "Additional environment details" below)
- As soon as the containers are being created after a successful boot, disconnect the power source
Describe the results you received:
After the reboot following the power loss, all podman container units fail to start with the following error message:
Error: readlink /var/lib/containers/storage/overlay/l/ORYZLEWFSIV3UXAUDOB4OAH6SW: no such file or directory
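As far as I understand, the path in that message is one of the shortened layer links the overlay driver keeps under overlay/l/ inside the graph root, and readlink fails because the link is gone. A quick way to check for missing or dangling entries (assuming the default graphRoot of /var/lib/containers/storage) is:

  ls -l /var/lib/containers/storage/overlay/l/
  # list link entries whose target no longer exists
  find /var/lib/containers/storage/overlay/l -xtype l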
Describe the results you expected:
I expect all containers to be created normally. My systemd units remove any leftover containers before attempting to create the new ones. This should work in any case, even after a power loss. Podman should not enter a state where I have to manually run podman system prune -a or otherwise intervene when something fails during container creation.
Additional information you deem important (e.g. issue happens only occasionally):
I am starting 5 containers in parallel, which slows down container creation considerably on a Raspberry Pi 3B+ and could widen a potential time window for corruption.
Output of podman version:
Version: 2.1.1
API Version: 2.0.0
Go Version: go1.14.9
Built: Wed Sep 30 21:31:36 2020
OS/Arch: linux/arm64
Output of podman info --debug:
host:
  arch: arm64
  buildahVersion: 1.16.1
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.0.21-2.fc32.aarch64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.21, commit: 5c1a09d48bd2b912c29efe00ec956c8f84ae26b9'
  cpus: 4
  distribution:
    distribution: fedora
    version: "32"
  eventLogger: journald
  hostname: localhost
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.8.13-200.fc32.aarch64
  linkmode: dynamic
  memFree: 11911168
  memTotal: 981143552
  ociRuntime:
    name: crun
    package: crun-0.15-5.fc32.aarch64
    path: /usr/bin/crun
    version: |-
      crun version 0.15
      commit: 56ca95e61639510c7dbd39ff512f80f626404969
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  rootless: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 370003968
  swapTotal: 466997248
  uptime: 3h 14m 42.71s (Approximately 0.12 days)
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 6
    paused: 0
    running: 6
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageStore:
    number: 6
  runRoot: /var/run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 2.0.0
  Built: 1601494296
  BuiltTime: Wed Sep 30 21:31:36 2020
  GitCommit: ""
  GoVersion: go1.14.9
  OsArch: linux/arm64
  Version: 2.1.1
Package info (e.g. output of rpm -q podman or apt list podman):
podman-2.1.1-7.fc32.aarch64
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?
Yes
Additional environment details (AWS, VirtualBox, physical, etc.):
I am using the aarch64 variant on a Raspberry Pi 3B+ (limited resources) running Fedora IoT 32. The containers are created automatically on boot via systemd units. Each unit first tries to remove any existing container of the same name via an optional (failure-tolerated) pre-start command and then runs a podman container command with the --systemd flag.
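For reference, here is a stripped-down sketch of the unit pattern I am using (not my exact units; the pod name, container names, and image are placeholders):

  # mypod.service -- creates the pod the containers join
  [Unit]
  Description=Create pod mypod

  [Service]
  Type=oneshot
  RemainAfterExit=yes
  ExecStartPre=-/usr/bin/podman pod rm -f mypod
  ExecStart=/usr/bin/podman pod create --name mypod

  [Install]
  WantedBy=multi-user.target

  # node-red.service -- one of the five containers, started only after the pod unit
  [Unit]
  Description=node-red container
  Requires=mypod.service
  After=mypod.service

  [Service]
  ExecStartPre=-/usr/bin/podman rm -f node-red
  ExecStart=/usr/bin/podman run --name node-red --pod mypod --systemd=true docker.io/nodered/node-red
  ExecStop=/usr/bin/podman stop node-red

  [Install]
  WantedBy=multi-user.target

The leading "-" on ExecStartPre is what makes the removal optional, so a missing container does not fail the unit.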
Is there a way to work around this broken state without clearing the podman storage with podman system prune -a? I have several deployments of podman in the field on low-bandwidth or pay-per-byte connections and would like to keep the downloaded images while still recovering from this broken state. Any ideas?
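To make the ask concrete, this is roughly the recovery I am hoping for (untested; unit and container names are the placeholders from the sketch above): drop all pods and containers but leave the image store untouched, then let the units recreate everything:

  # stop the container units first
  systemctl stop node-red.service mypod.service
  # remove all pods and containers, but keep the images
  podman pod rm --all --force
  podman rm --all --force
  # let systemd recreate pod and containers from the units
  systemctl start mypod.service node-red.service

In the broken state, recreating the containers afterwards still fails with the readlink error above; so far only podman system prune -a has cleared it.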
EDIT:
Might be related to #5986 - at least there seems to be a valid workaround using a read-only fs: https://github.com/containers/podman/issues/5986#issuecomment-716376419
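If the read-only approach from that comment is the right direction, I assume it boils down to keeping the pulled images in an additional, read-only image store configured in /etc/containers/storage.conf, roughly like the following (the extra store path is only an example):

  [storage]
  driver = "overlay"
  graphroot = "/var/lib/containers/storage"

  [storage.options]
  # read-only store that holds the pre-pulled images
  additionalimagestores = [
    "/usr/share/containers/storage",
  ]

That way the images would survive even if the writable container storage has to be wiped after a power loss.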