podman: [btrfs] Sporadic “Found incomplete layer” error results in broken container engine
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
A sporadically occurring “Found incomplete layer” error after the nightly automatic system updates on openSUSE MicroOS results in a broken podman container engine:
WARN[0000] Found incomplete layer "236fcd368394d7094f40012a131c301d615722e60b25cb459efa229a7242041b", deleting it
Error: stat /var/lib/containers/storage/btrfs/subvolumes/236fcd368394d7094f40012a131c301d615722e60b25cb459efa229a7242041b: no such file or directory
Once the error occurs, nothing works anymore. Even a podman image prune complains about the same error and fails. The only way to fix podman is to manually nuke the /var/lib/containers/storage/btrfs directory.
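For the record, the manual recovery looks roughly like this. It is only a sketch of the “nuke” described above, assuming rootful podman with the default storage paths; whether the btrfs-* metadata directories have to go as well is an assumption on my part (see the layers.json comment at the bottom). It destroys all local images, containers, and layers:
systemctl stop podman.service podman.socket          # nothing may hold the store open
for sv in /var/lib/containers/storage/btrfs/subvolumes/*; do
  btrfs subvolume delete "$sv"                       # subvolumes need btrfs(8), not plain rm
done
rm -rf /var/lib/containers/storage/btrfs             # the driver directory itself
rm -rf /var/lib/containers/storage/btrfs-*           # stale layer metadata (btrfs-layers etc.)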
I’m having this issue on a MicroOS installation with the most recent podman version (4.3.1). I have a couple of containers running there, and the issue has now occurred for the second time in a month after the automatic nightly updates. A fellow redditor has confirmed the issue.
The issue arises after a round of automatic updates during the night. It is unclear whether the system update or a run of podman auto-update causes it; I have not been able to find a reliable reproducer yet.
Steps to reproduce the issue:
A possible reproducer can be found below.
Describe the results you received:
- podman container engine broken after automatic system and container updates
Describe the results you expected:
- podman keeps working
Additional information you deem important (e.g. issue happens only occasionally):
- Issue happens only occasionally
Output of podman version:
Client: Podman Engine
Version: 4.3.1
API Version: 4.3.1
Go Version: go1.17.13
Built: Tue Nov 22 00:00:00 2022
OS/Arch: linux/amd64
Output of podman info:
host:
  arch: amd64
  buildahVersion: 1.28.0
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.5-2.1.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.5, commit: unknown'
  cpuUtilization:
    idlePercent: 98.92
    systemPercent: 0.36
    userPercent: 0.72
  cpus: 4
  distribution:
    distribution: '"opensuse-microos"'
    version: "20221217"
  eventLogger: journald
  hostname: starfury
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 6.0.12-1-default
  linkmode: dynamic
  logDriver: journald
  memFree: 309272576
  memTotal: 7366852608
  networkBackend: cni
  ociRuntime:
    name: runc
    package: runc-1.1.4-2.1.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.1.4
      commit: v1.1.4-0-ga916309fff0f
      spec: 1.0.2-dev
      go: go1.18.6
      libseccomp: 2.5.4
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /etc/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-1.1.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: unknown
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 5
      libseccomp: 2.5.4
  swapFree: 0
  swapTotal: 0
  uptime: 3h 47m 14.00s (Approximately 0.12 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.opensuse.org
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 8
    paused: 0
    running: 8
    stopped: 0
  graphDriverName: btrfs
  graphOptions: {}
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 26834087936
  graphRootUsed: 9974857728
  graphStatus:
    Build Version: Btrfs v6.0.2
    Library Version: "102"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 8
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.3.1
  Built: 1669075200
  BuiltTime: Tue Nov 22 00:00:00 2022
  GitCommit: ""
  GoVersion: go1.17.13
  Os: linux
  OsArch: linux/amd64
  Version: 4.3.1
Package info (e.g. output of rpm -q podman or apt list podman or brew info podman):
podman-4.3.1-1.1.x86_64
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?
Yes
- Version running: 4.3.1
- I couldn’t find any related entries in the Troubleshooting Guide
Additional environment details (AWS, VirtualBox, physical, etc.):
- KVM Virtual machine running openSUSE MicroOS
- I’m using the btrfs graph driver, not overlay
A working hypothesis is that podman auto-update gets interrupted by a system reboot, resulting in dangling (corrupted) images. On MicroOS, the start times of transactional-update (the system updates) and podman auto-update are both randomized (i.e. systemd timers with RandomizedDelaySec in place), so there is a chance that the podman auto-update service gets interrupted by a system reboot. I’m running about 8 containers on the host, so the vulnerable time window is not negligible.
This remains a hypothesis for the moment, as I have not yet been able to verify it.
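If the hypothesis holds, one possible mitigation would be to stop randomizing the auto-update start time so it cannot drift into the transactional-update reboot window. A sketch, assuming the stock podman-auto-update.timer (which, as far as I know, ships with RandomizedDelaySec=900); verify the unit names with systemctl list-timers:
mkdir -p /etc/systemd/system/podman-auto-update.timer.d
cat > /etc/systemd/system/podman-auto-update.timer.d/override.conf <<'EOF'
[Timer]
RandomizedDelaySec=0
EOF
systemctl daemon-reload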
This happens to me a lot, but with ZFS, so the problem might not be in the storage driver but in Podman?
Sadly, we have no expertise in the ZFS file system as a storage driver. We would recommend using overlay on top of a ZFS lower layer.
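For anyone wanting to follow that recommendation, a rough sketch of switching the graph driver (default paths assumed; note that podman system reset wipes all existing images, containers, and volumes):
podman system reset                                  # wipe the old store first (destructive!)
cat > /etc/containers/storage.conf <<'EOF'
[storage]
driver = "overlay"
runroot = "/run/containers/storage"
graphroot = "/var/lib/containers/storage"
EOF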
Same here, and I fixed it by removing the reference to the (no longer existing) layer from the /var/lib/containers/storage/btrfs-layers/layers.json file.
I don’t know if there is a better way to solve it, but at least now I can manage my containers without losing data.
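A sketch of that layers.json surgery, assuming the file is a JSON array of layer records with an "id" field (which matches what containers/storage writes) and that everything using the store has been stopped first:
BAD=236fcd368394d7094f40012a131c301d615722e60b25cb459efa229a7242041b   # the ID from the error message
F=/var/lib/containers/storage/btrfs-layers/layers.json
cp "$F" "$F.bak"                                     # keep a backup
jq --arg id "$BAD" 'map(select(.id != $id))' "$F.bak" > "$F"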