podman: Unable to start containers after a forced shutdown
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
Steps to reproduce the issue:
- Run the following container:
sudo /usr/bin/podman run --rm -d --name atomix-1 -p 5679:5679 -it -v /opt/onos/config:/etc/atomix/conf -v /var/lib/atomix-1/data:/var/lib/atomix/data:Z atomix/atomix:3.1.5 --config /etc/atomix/conf/atomix-1.conf --ignore-resources --data-dir /var/lib/atomix/data --log-level WARN
- Force a shutdown of the VM:
sudo virsh destroy <VM Name>
- Restart the VM and start the container again
Describe the results you received: The container fails to start with the readlink error shown below ("no such file or directory"). This occurs in approximately 1 in 5 forced shutdowns.
sudo /usr/bin/podman run --rm -d --name atomix-1 -p 5679:5679 -it -v /opt/onos/config:/etc/atomix/conf -v /var/lib/atomix-1/data:/var/lib/atomix/data:Z atomix/atomix:3.1.5 --config /etc/atomix/conf/atomix-1.conf --ignore-resources --data-dir /var/lib/atomix/data --log-level WARN
Error: readlink /var/lib/containers/storage/overlay/l/QRPHWAOMUOP7RQXQKPUY4Y7I3Z: no such file or directory
sudo podman inspect localhost/atomix/atomix:3.1.5
Error: error parsing image data "57ddcf43f4ac8f399810d4b44ded2c3a63e5abfb672bc447c3aa0f18e39a282c": readlink /var/lib/containers/storage/overlay/l/GMVU2BJI2CBP6Z2DFDEHCCZGTD: no such file or directory
Describe the results you expected: Container starts correctly
Additional information you deem important (e.g. issue happens only occasionally): The only workaround seems to be to delete the image and re-pull:
sudo podman rmi -f atomix/atomix:3.1.5
sudo podman pull atomix/atomix:3.1.5
Output of podman version:
Version: 1.9.0
RemoteAPI Version: 1
Go Version: go1.12.12
OS/Arch: linux/amd64
Output of podman info --debug:
debug:
compiler: gc
gitCommit: ""
goVersion: go1.12.12
podmanVersion: 1.9.0
host:
arch: amd64
buildahVersion: 1.14.8
cgroupVersion: v1
conmon:
package: conmon-2.0.15-2.3.el8.x86_64
path: /usr/bin/conmon
version: 'conmon version 2.0.15, commit: ceb15924831eac767b6938880570e048ff787d0d'
cpus: 2
distribution:
distribution: '"centos"'
version: "8"
eventLogger: journald
hostname: tcn
idMappings:
gidmap:
- container_id: 0
host_id: 1001
size: 1
- container_id: 1
host_id: 165536
size: 65536
uidmap:
- container_id: 0
host_id: 1001
size: 1
- container_id: 1
host_id: 165536
size: 65536
kernel: 4.18.0-147.8.1.el8_1.x86_64
memFree: 1927327744
memTotal: 3964665856
ociRuntime:
name: runc
package: runc-1.0.0-15.4.el8.x86_64
path: /usr/bin/runc
version: |-
runc version 1.0.0-rc10
commit: c2df86ba3af1e210a0f9d745df96e4329e3e6808
spec: 1.0.1-dev
os: linux
rootless: true
slirp4netns:
executable: /usr/bin/slirp4netns
package: slirp4netns-1.0.0-4.2.el8.x86_64
version: |-
slirp4netns version 1.0.0
commit: a3be729152a33e692cd28b52f664defbf2e7810a
libslirp: 4.2.0
swapFree: 4260356096
swapTotal: 4260356096
uptime: 24m 19.52s
registries:
search:
- registry.fedoraproject.org
- registry.access.redhat.com
- registry.centos.org
- docker.io
store:
configFile: /home/tcnbuild/.config/containers/storage.conf
containerStore:
number: 0
paused: 0
running: 0
stopped: 0
graphDriverName: vfs
graphOptions: {}
graphRoot: /home/tcnbuild/.local/share/containers/storage
graphStatus: {}
imageStore:
number: 0
runRoot: /run/user/1001/containers
volumePath: /home/tcnbuild/.local/share/containers/storage/volumes
Package info (e.g. output of rpm -q podman or apt list podman):
podman-1.9.0-1.2.el8.x86_64
Additional environment details (AWS, VirtualBox, physical, etc.): KVM CentOS 8.1 Guest VM running latest stable podman.
About this issue
- State: closed
- Created 4 years ago
- Comments: 38 (28 by maintainers)
@w4tsn and @buck2202, for an immediate workaround you could set up a read-only image store and make podman stateless as described here.
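A rough sketch of the populate step for that approach (the store path and image name are illustrative; the linked write-up remains the authoritative reference): pull the images once as root into the alternate store while its partition is still writable, then mount that partition read-only and point Podman at it via additionalimagestores in storage.conf, as shown at the end of this thread.
# Populate the additional (soon-to-be read-only) store once, as root
sudo podman --root /usr/lib/containers/storage pull atomix/atomix:3.1.5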
Just to clarify, I’m assuming that the ro-store workaround would not be sufficient for containers run as root, since it relies on filesystem permissions, right?
I’m getting hit with this fairly often using preemptible instances on Google Cloud. Since I have to expect random hard shutdowns, I’m already taking container checkpoints at regular intervals (which require root). My fairly overkill workaround to the random corruption is: if, after boot, any podman container inspect or podman image inspect returns a nonzero exit code, I dump a list of containers, run podman system reset, re-pull my images, and restore my container list from whatever checkpoints happen to be present. My scripts seem to catch the corruption and allow recovery, but it’s fairly aggressive.
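A minimal sketch of that recovery flow (not the commenter's actual script; the image name, checkpoint directory, and output path are placeholders):
#!/bin/bash
# Detect image/container storage corruption after boot.
corrupt=0
for c in $(sudo podman ps -aq); do
  sudo podman container inspect "$c" > /dev/null 2>&1 || corrupt=1
done
for i in $(sudo podman images -q); do
  sudo podman image inspect "$i" > /dev/null 2>&1 || corrupt=1
done

if [ "$corrupt" -eq 1 ]; then
  # Record which containers existed, then wipe storage and start over.
  sudo podman ps -a --format '{{.Names}}' > /root/containers-before-reset.txt
  sudo podman system reset --force
  sudo podman pull atomix/atomix:3.1.5
  # Restore containers from whatever exported checkpoints are present.
  for cp in /var/lib/checkpoints/*.tar.gz; do
    [ -e "$cp" ] && sudo podman container restore --import "$cp"
  done
fi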
We would need to checksum each file in the image. That would get us closer to the OSTree storage model; OSTree has an fsck operation that works this way.
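For reference, OSTree's consistency check is invoked as follows (the repository path is illustrative):
ostree fsck --repo=/ostree/repo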
Alternatively, more expensive in terms of I/O, we record that the image is pulled only after we do a syncfs().
You can use an additional store that works exactly how you described it. The entire storage lives on a read-only partition, and you tell Podman to use it with an entry in the storage.conf file.
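A minimal sketch of the relevant stanza, assuming the read-only store is mounted at /usr/lib/containers/storage (the path is illustrative):
# storage.conf
[storage.options]
additionalimagestores = [
  "/usr/lib/containers/storage",
]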