podman: Unable to start containers after a forced shutdown

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

Steps to reproduce the issue:

  1. Run the following container:

sudo /usr/bin/podman run --rm -d --name atomix-1 -p 5679:5679 -it -v /opt/onos/config:/etc/atomix/conf -v /var/lib/atomix-1/data:/var/lib/atomix/data:Z atomix/atomix:3.1.5 --config /etc/atomix/conf/atomix-1.conf --ignore-resources --data-dir /var/lib/atomix/data --log-level WARN

  2. Force a shutdown of the VM:

sudo virsh destroy <VM Name>

  3. Restart the VM and start the container again
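
For the restart itself, assuming the libvirt guest from step 2 (placeholder name reused from above), the command would be along the lines of:

sudo virsh start <VM Name>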

Describe the results you received: The container fails to start with the readlink error shown below ("no such file or directory"). This occurs after approximately 1 in 5 forced shutdowns.

sudo /usr/bin/podman run --rm -d --name atomix-1 -p 5679:5679 -it -v /opt/onos/config:/etc/atomix/conf -v /var/lib/atomix-1/data:/var/lib/atomix/data:Z atomix/atomix:3.1.5 --config /etc/atomix/conf/atomix-1.conf --ignore-resources --data-dir /var/lib/atomix/data --log-level WARN
Error: readlink /var/lib/containers/storage/overlay/l/QRPHWAOMUOP7RQXQKPUY4Y7I3Z: no such file or directory

sudo podman inspect localhost/atomix/atomix:3.1.5
Error: error parsing image data "57ddcf43f4ac8f399810d4b44ded2c3a63e5abfb672bc447c3aa0f18e39a282c": readlink /var/lib/containers/storage/overlay/l/GMVU2BJI2CBP6Z2DFDEHCCZGTD: no such file or directory

Describe the results you expected: Container starts correctly

Additional information you deem important (e.g. issue happens only occasionally): The only workaround seems to be to delete the image and re-pull it:

sudo podman rm -f atomix/atomix:3.1.5
sudo podman pull atomix/atomix:3.1.5
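
As a quick sanity check after the re-pull (a sketch, using the image name from this report), a successful podman image inspect indicates the image metadata is readable again:

sudo podman image inspect atomix/atomix:3.1.5 > /dev/null && echo "image metadata OK"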

Output of podman version:

Version:            1.9.0
RemoteAPI Version:  1
Go Version:         go1.12.12
OS/Arch:            linux/amd64

Output of podman info --debug:

debug:
  compiler: gc
  gitCommit: ""
  goVersion: go1.12.12
  podmanVersion: 1.9.0
host:
  arch: amd64
  buildahVersion: 1.14.8
  cgroupVersion: v1
  conmon:
    package: conmon-2.0.15-2.3.el8.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.15, commit: ceb15924831eac767b6938880570e048ff787d0d'
  cpus: 2
  distribution:
    distribution: '"centos"'
    version: "8"
  eventLogger: journald
  hostname: tcn
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
  kernel: 4.18.0-147.8.1.el8_1.x86_64
  memFree: 1927327744
  memTotal: 3964665856
  ociRuntime:
    name: runc
    package: runc-1.0.0-15.4.el8.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc10
      commit: c2df86ba3af1e210a0f9d745df96e4329e3e6808
      spec: 1.0.1-dev
  os: linux
  rootless: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.0.0-4.2.el8.x86_64
    version: |-
      slirp4netns version 1.0.0
      commit: a3be729152a33e692cd28b52f664defbf2e7810a
      libslirp: 4.2.0
  swapFree: 4260356096
  swapTotal: 4260356096
  uptime: 24m 19.52s
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - docker.io
store:
  configFile: /home/tcnbuild/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: vfs
  graphOptions: {}
  graphRoot: /home/tcnbuild/.local/share/containers/storage
  graphStatus: {}
  imageStore:
    number: 0
  runRoot: /run/user/1001/containers
  volumePath: /home/tcnbuild/.local/share/containers/storage/volumes

Package info (e.g. output of rpm -q podman or apt list podman):

podman-1.9.0-1.2.el8.x86_64

Additional environment details (AWS, VirtualBox, physical, etc.): KVM CentOS 8.1 Guest VM running latest stable podman.

Most upvoted comments

@w4tsn and @buck2202, for an immediate workaround you could set up a read-only image store and make Podman stateless as described here.

Just to clarify, I’m assuming that the ro-store workaround would not be sufficient for containers run as root, since it relies on filesystem permissions, right?

I’m getting hit with this fairly often using preemptible instances on Google Cloud. Since I have to expect random hard shutdowns, I’m already taking container checkpoints at regular intervals (which requires root). My fairly overkill workaround for the random corruption: if, after boot, any podman container inspect or podman image inspect returns a nonzero exit code, I dump a list of containers, run podman system reset, re-pull my images, and restore my container list from whatever checkpoints happen to be present.

My scripts seem to catch the corruption and allow recovery, but it’s fairly aggressive.
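
A minimal sketch of that detect-and-reset flow (the image list, file paths, and the checkpoint step are placeholders, not the commenter's actual script):

#!/bin/bash
# Hypothetical recovery sketch: if image metadata is unreadable after boot,
# wipe local storage and re-pull. Adjust the image list to your deployment.
set -u
images="atomix/atomix:3.1.5"

corrupted=0
for img in $images; do
    # a nonzero exit from inspect is treated as a sign of corrupted storage
    sudo podman image inspect "$img" > /dev/null 2>&1 || corrupted=1
done

if [ "$corrupted" -eq 1 ]; then
    # record which containers existed before wiping everything
    sudo podman ps -a --format '{{.Names}}' > /tmp/containers-before-reset.txt
    # remove all containers, images, and storage state
    sudo podman system reset --force
    for img in $images; do
        sudo podman pull "$img"
    done
    # restoring containers from checkpoints would go here
    # (e.g. podman container restore -i <checkpoint archive>)
fi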

How difficult would it be to reassemble the storage with an fsck option? The difference between CRI-O and Podman is that with Podman, blowing away containers could mean losing a serious amount of work. Think toolbox containers.

We would need to checksum each file in the image. That would get us closer to the OSTree storage model; OSTree has an fsck operation that works this way.
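
Purely as an illustration of per-file checksumming (not an existing Podman feature), a digest manifest could be written at pull time and verified after boot; a nonzero exit from the check would flag a damaged store. The rootful storage path and manifest location are examples:

# record digests after a successful pull
sudo find /var/lib/containers/storage/overlay -type f -print0 | sudo xargs -0 sha256sum | sudo tee /var/lib/containers/layer-checksums.txt > /dev/null

# verify after boot; any missing or altered file makes this exit nonzero
sudo sha256sum --quiet -c /var/lib/containers/layer-checksums.txt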

Alternatively, and more expensive in terms of I/O, we could record the image as pulled only after we do a syncfs().
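
A rough shell-level approximation of that idea (not how Podman records pulls internally): flush the filesystem holding the graph root right after the pull, since sync -f issues syncfs(2) for the filesystem containing the given path. The image name and the rootful storage path are examples:

sudo podman pull atomix/atomix:3.1.5 && sudo sync -f /var/lib/containers/storage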

Does podman support any sort of read-only rootfs setup? Like storing images in a partition which gets mounted as ro? Or even the whole rootfs mounted as ro.

You can use an additional store that works exactly as you described: put the entire storage on a read-only partition and tell Podman to use it with:

additionalimagestores = [
     "/path/to/the/storage"
]

in the storage.conf file
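
For reference (not part of the original comment), additionalimagestores normally sits under the [storage.options] section of storage.conf, so a fuller snippet would look roughly like the following; the driver value is only an example:

[storage]
driver = "overlay"

[storage.options]
additionalimagestores = [
     "/path/to/the/storage"
]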