podman: `podman stop` returns before removing a container started with `--rm`
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
`podman stop` returns while background processes responsible for removing a container started with `--rm` are still running. In combination with systemd, this can cause iptables NAT rules to be left behind. It is especially bad when ports are published/forwarded (not demonstrated below), because DNAT rules are added for them: once such a rule is left behind, the host port is unusable for future containers, since the stale rule takes precedence.
Steps to reproduce the issue:
# iptables -t nat -S >clean.iptables
# cat >/etc/systemd/system/nattest.service
[Unit]
Description=Podman NAT Test
[Service]
ExecStart=podman run --rm --name nattest busybox nc -l
ExecStop=podman stop nattest
# systemctl start nattest
# diff clean.iptables <(iptables -t nat -S) | grep '^[<>]'
> -N CNI-f891eda7ae68318dc3584dd8
> -A POSTROUTING -s 10.88.0.70/32 -m comment --comment "name: \"podman\" id: \"86cc7521c68b20a7c9f97dd4cd176c41e13008f986736be8056aca8cbb28dab8\"" -j CNI-f891eda7ae68318dc3584dd8
> -A CNI-f891eda7ae68318dc3584dd8 -d 10.88.0.0/16 -m comment --comment "name: \"podman\" id: \"86cc7521c68b20a7c9f97dd4cd176c41e13008f986736be8056aca8cbb28dab8\"" -j ACCEPT
> -A CNI-f891eda7ae68318dc3584dd8 ! -d 224.0.0.0/4 -m comment --comment "name: \"podman\" id: \"86cc7521c68b20a7c9f97dd4cd176c41e13008f986736be8056aca8cbb28dab8\"" -j MASQUERADE
# systemctl stop nattest
# diff clean.iptables <(iptables -t nat -S) | grep '^[<>]'
> -N CNI-3b9afa83adb3440af4a15557
> -A POSTROUTING -s 10.88.0.71/32 -m comment --comment "name: \"podman\" id: \"74ff59b9b635231ebd12c12ae0f318e83166a6de8b653ad5e965dd7bdec7e72e\"" -j CNI-3b9afa83adb3440af4a15557
> -A CNI-3b9afa83adb3440af4a15557 -d 10.88.0.0/16 -m comment --comment "name: \"podman\" id: \"74ff59b9b635231ebd12c12ae0f318e83166a6de8b653ad5e965dd7bdec7e72e\"" -j ACCEPT
> -A CNI-3b9afa83adb3440af4a15557 ! -d 224.0.0.0/4 -m comment --comment "name: \"podman\" id: \"74ff59b9b635231ebd12c12ae0f318e83166a6de8b653ad5e965dd7bdec7e72e\"" -j MASQUERADE
# journalctl -u nattest
Sep 01 06:41:42 systemd[1]: Started Podman NAT Test.
Sep 01 06:41:43 podman[50324]: 2021-09-01 06:41:43.027964497 +0200 CEST m=+0.161186034 image pull busybox
Sep 01 06:41:43 podman[50324]: 2021-09-01 06:41:43.251304873 +0200 CEST m=+0.384526468 container create 74ff59b9b635231ebd12c12ae0f318e83166a6de8b653ad5e965dd7bdec7e72e (image=docker.io/library/busybox:latest, name=nattest)
Sep 01 06:41:43 podman[50324]: 2021-09-01 06:41:43.746840867 +0200 CEST m=+0.880062381 container init 74ff59b9b635231ebd12c12ae0f318e83166a6de8b653ad5e965dd7bdec7e72e (image=docker.io/library/busybox:latest, name=nattest)
Sep 01 06:41:43 podman[50324]: 2021-09-01 06:41:43.796867219 +0200 CEST m=+0.930088726 container start 74ff59b9b635231ebd12c12ae0f318e83166a6de8b653ad5e965dd7bdec7e72e (image=docker.io/library/busybox:latest, name=nattest)
Sep 01 06:41:43 podman[50324]: 2021-09-01 06:41:43.797241532 +0200 CEST m=+0.930463052 container attach 74ff59b9b635231ebd12c12ae0f318e83166a6de8b653ad5e965dd7bdec7e72e (image=docker.io/library/busybox:latest, name=nattest)
Sep 01 06:41:47 systemd[1]: Stopping Podman NAT Test...
Sep 01 06:41:47 podman[50324]: punt!
Sep 01 06:41:57 podman[50440]: 2021-09-01 06:41:57.658659241 +0200 CEST m=+10.271143781 container stop 74ff59b9b635231ebd12c12ae0f318e83166a6de8b653ad5e965dd7bdec7e72e (image=docker.io/library/busybox:latest, name=nattest)
Sep 01 06:41:57 podman[50440]: 2021-09-01 06:41:57.728147729 +0200 CEST m=+10.340632303 container died 74ff59b9b635231ebd12c12ae0f318e83166a6de8b653ad5e965dd7bdec7e72e (image=docker.io/library/busybox:latest, name=nattest)
Sep 01 06:41:57 podman[50440]: nattest
Sep 01 06:41:57 podman[50324]: time="2021-09-01T06:41:57+02:00" level=error msg="unable to cleanup network for container 74ff59b9b635231ebd12c12ae0f318e83166a6de8b653ad5e965dd7bdec7e72e: \"error tearing down CNI namespace configuration for container 74ff59b9b635231ebd12c12ae0f318e83166a6de8b653ad5e965dd7bdec7e72e: error removing pod nattest_nattest from CNI network \\\"podman\\\": netplugin failed with no error message: signal: terminated\""
Sep 01 06:41:58 podman[50324]: 2021-09-01 06:41:58.160361638 +0200 CEST m=+15.293583198 container remove 74ff59b9b635231ebd12c12ae0f318e83166a6de8b653ad5e965dd7bdec7e72e (image=docker.io/library/busybox:latest, name=nattest)
Sep 01 06:41:58 podman[50324]: time="2021-09-01T06:41:58+02:00" level=error msg="Error forwarding signal 15 to container 74ff59b9b635231ebd12c12ae0f318e83166a6de8b653ad5e965dd7bdec7e72e: container has already been removed"
Sep 01 06:41:58 systemd[1]: nattest.service: Deactivated successfully.
Sep 01 06:41:58 systemd[1]: Stopped Podman NAT Test.
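The reproduction above relies on a simple snapshot/diff pattern: save the NAT table before the test, then show only the lines that were added or removed. A minimal sketch of that pattern, with plain files standing in for `iptables -t nat -S` output so it runs without root or podman:

```shell
#!/bin/sh
# Snapshot/diff pattern used in the reproduction steps above.
# Plain files stand in for `iptables -t nat -S` output here.
printf 'rule-a\nrule-b\n' > baseline.txt         # state before the test
printf 'rule-a\nrule-b\nrule-c\n' > current.txt  # state after the test
# '<' marks lines that disappeared; '>' marks leftovers.
diff baseline.txt current.txt | grep '^[<>]'
```

An empty result after `systemctl stop` would indicate a clean teardown; here the leftover line is printed.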
Describe the results you received:
Podman failed to remove the iptables NAT rules.
There is an error message in the log:
unable to cleanup network for container 74ff59b9b635231ebd12c12ae0f318e83166a6de8b653ad5e965dd7bdec7e72e: "error tearing down CNI namespace configuration for container 74ff59b9b635231ebd12c12ae0f318e83166a6de8b653ad5e965dd7bdec7e72e: error removing pod nattest_nattest from CNI network \"podman\": netplugin failed with no error message: signal: terminated"
The process responsible for the clean-up was presumably killed by systemd after `podman stop` returned.
Describe the results you expected:
I expected no error message and the NAT rules to be back to the same state as they were before starting the service.
Additional information you deem important (e.g. issue happens only occasionally):
At first I thought this was related to #11324, but after that one was fixed in Podman 3.3 and the problem persisted, I determined this is a separate issue. The symptoms are the same in my case.
Because this is a race condition, the bug manifests somewhat randomly, but it is reproducible most of the time using the steps above.
Two possible workarounds are to add either another `ExecStop=sleep 1` line or `KillMode=none` to the service definition. The former gives Podman an extra second before systemd starts killing the remaining processes; the latter disables the killing altogether, but is not a recommended option.
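For reference, a sketch of the first workaround (the unit name is illustrative; `ExecStop=` commands run in order, so the extra `sleep` delays the point at which systemd considers the service stopped and begins killing leftover processes in the unit's cgroup):

```ini
# /etc/systemd/system/nattest-sleep.service (hypothetical variant)
[Unit]
Description=Podman NAT Test

[Service]
ExecStart=podman run --rm --name nattest busybox nc -l
ExecStop=podman stop nattest
# Give conmon's cleanup process an extra second to tear down the
# CNI network before systemd kills everything left in the cgroup.
ExecStop=sleep 1
```

This merely narrows the race window; it does not eliminate it.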
# cat >/etc/systemd/system/nattest-nokill.service
[Unit]
Description=Podman NAT Test
[Service]
ExecStart=podman run --rm --name nattest busybox nc -l
ExecStop=podman stop nattest
KillMode=none
# systemctl start nattest-nokill
# diff clean.iptables <(iptables -t nat -S) | grep '^[<>]'
> -N CNI-170407091e6f9d431e977274
> -A POSTROUTING -s 10.88.0.72/32 -m comment --comment "name: \"podman\" id: \"72222b2a1ef3da3e7195d7e12d5fa30581961f1d92758906ede4503dced4f61f\"" -j CNI-170407091e6f9d431e977274
> -A CNI-170407091e6f9d431e977274 -d 10.88.0.0/16 -m comment --comment "name: \"podman\" id: \"72222b2a1ef3da3e7195d7e12d5fa30581961f1d92758906ede4503dced4f61f\"" -j ACCEPT
> -A CNI-170407091e6f9d431e977274 ! -d 224.0.0.0/4 -m comment --comment "name: \"podman\" id: \"72222b2a1ef3da3e7195d7e12d5fa30581961f1d92758906ede4503dced4f61f\"" -j MASQUERADE
# systemctl stop nattest-nokill
# diff clean.iptables <(iptables -t nat -S) | grep '^[<>]'
# journalctl -u nattest-nokill
Sep 01 07:09:18 systemd[1]: /etc/systemd/system/nattest-nokill.service:7: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
Sep 01 07:09:18 systemd[1]: Started Podman NAT Test.
Sep 01 07:09:18 podman[50733]: 2021-09-01 07:09:18.921899269 +0200 CEST m=+0.109981145 image pull busybox
Sep 01 07:09:19 podman[50733]: 2021-09-01 07:09:19.169311311 +0200 CEST m=+0.357393275 container create 72222b2a1ef3da3e7195d7e12d5fa30581961f1d92758906ede4503dced4f61f (image=docker.io/library/busybox:latest, name=nattest)
Sep 01 07:09:19 podman[50733]: 2021-09-01 07:09:19.690867177 +0200 CEST m=+0.878949045 container init 72222b2a1ef3da3e7195d7e12d5fa30581961f1d92758906ede4503dced4f61f (image=docker.io/library/busybox:latest, name=nattest)
Sep 01 07:09:19 podman[50733]: 2021-09-01 07:09:19.740818589 +0200 CEST m=+0.928900457 container start 72222b2a1ef3da3e7195d7e12d5fa30581961f1d92758906ede4503dced4f61f (image=docker.io/library/busybox:latest, name=nattest)
Sep 01 07:09:19 podman[50733]: 2021-09-01 07:09:19.741177664 +0200 CEST m=+0.929259528 container attach 72222b2a1ef3da3e7195d7e12d5fa30581961f1d92758906ede4503dced4f61f (image=docker.io/library/busybox:latest, name=nattest)
Sep 01 07:09:29 systemd[1]: Stopping Podman NAT Test...
Sep 01 07:09:29 podman[50733]: punt!
Sep 01 07:09:39 podman[50851]: 2021-09-01 07:09:39.754371376 +0200 CEST m=+10.316274069 container stop 72222b2a1ef3da3e7195d7e12d5fa30581961f1d92758906ede4503dced4f61f (image=docker.io/library/busybox:latest, name=nattest)
Sep 01 07:09:39 podman[50851]: 2021-09-01 07:09:39.828801293 +0200 CEST m=+10.390704033 container died 72222b2a1ef3da3e7195d7e12d5fa30581961f1d92758906ede4503dced4f61f (image=docker.io/library/busybox:latest, name=nattest)
Sep 01 07:09:39 podman[50851]: nattest
Sep 01 07:09:39 systemd[1]: nattest-nokill.service: Deactivated successfully.
Sep 01 07:09:39 systemd[1]: nattest-nokill.service: Unit process 50733 (podman) remains running after unit stopped.
Sep 01 07:09:39 systemd[1]: nattest-nokill.service: Unit process 50839 (conmon) remains running after unit stopped.
Sep 01 07:09:39 systemd[1]: nattest-nokill.service: Unit process 50907 (podman) remains running after unit stopped.
Sep 01 07:09:39 systemd[1]: nattest-nokill.service: Unit process 50919 (firewall) remains running after unit stopped.
Sep 01 07:09:39 systemd[1]: nattest-nokill.service: Unit process 50924 (iptables) remains running after unit stopped.
Sep 01 07:09:39 systemd[1]: nattest-nokill.service: Unit process 50925 (bridge) remains running after unit stopped.
Sep 01 07:09:39 systemd[1]: Stopped Podman NAT Test.
Sep 01 07:09:40 podman[50733]: 2021-09-01 07:09:40.502850998 +0200 CEST m=+21.690932941 container remove 72222b2a1ef3da3e7195d7e12d5fa30581961f1d92758906ede4503dced4f61f (image=docker.io/library/busybox:latest, name=nattest)
Here systemd itself warns about remaining processes, and the message about container removal appears only after the service has been stopped.
Output of podman version:
Version: 3.3.0
API Version: 3.3.0
Go Version: go1.16.6
Built: Fri Aug 20 21:36:14 2021
OS/Arch: linux/amd64
Output of podman info --debug:
host:
arch: amd64
buildahVersion: 1.22.3
cgroupControllers:
- cpuset
- cpu
- io
- memory
- hugetlb
- pids
cgroupManager: systemd
cgroupVersion: v2
conmon:
package: conmon-2.0.29-2.fc34.x86_64
path: /usr/bin/conmon
version: 'conmon version 2.0.29, commit: '
cpus: 1
distribution:
distribution: fedora
version: "34"
eventLogger: journald
hostname: <redacted>
idMappings:
gidmap: null
uidmap: null
kernel: 5.13.12-200.fc34.x86_64
linkmode: dynamic
memFree: 490217472
memTotal: 2061930496
ociRuntime:
name: crun
package: crun-0.21-1.fc34.x86_64
path: /usr/bin/crun
version: |-
crun version 0.21
commit: c4c3cdf2ce408ed44a9e027c618473e6485c635b
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
os: linux
remoteSocket:
path: /run/podman/podman.sock
security:
apparmorEnabled: false
capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
rootless: false
seccompEnabled: true
seccompProfilePath: /usr/share/containers/seccomp.json
selinuxEnabled: false
serviceIsRemote: false
slirp4netns:
executable: /usr/bin/slirp4netns
package: slirp4netns-1.1.12-2.fc34.x86_64
version: |-
slirp4netns version 1.1.12
commit: 7a104a101aa3278a2152351a082a6df71f57c9a3
libslirp: 4.4.0
SLIRP_CONFIG_VERSION_MAX: 3
libseccomp: 2.5.0
swapFree: 0
swapTotal: 0
uptime: 32h 51m 47.87s (Approximately 1.33 days)
registries:
search:
- registry.fedoraproject.org
- registry.access.redhat.com
- docker.io
- quay.io
store:
configFile: /etc/containers/storage.conf
containerStore:
number: 2
paused: 0
running: 2
stopped: 0
graphDriverName: overlay
graphOptions:
overlay.mountopt: nodev,metacopy=on
graphRoot: /var/lib/containers/storage
graphStatus:
Backing Filesystem: extfs
Native Overlay Diff: "false"
Supports d_type: "true"
Using metacopy: "true"
imageStore:
number: 28
runRoot: /run/containers/storage
volumePath: /var/lib/containers/storage/volumes
version:
APIVersion: 3.3.0
Built: 1629488174
BuiltTime: Fri Aug 20 21:36:14 2021
GitCommit: ""
GoVersion: go1.16.6
OsArch: linux/amd64
Version: 3.3.0
Package info (e.g. output of rpm -q podman or apt list podman):
# rpm -q podman{,-plugins} container{s-common,networking-plugins} crun
podman-3.3.0-1.fc34.x86_64
podman-plugins-3.3.0-1.fc34.x86_64
containers-common-1-21.fc34.noarch
containernetworking-plugins-1.0.0-1.fc34.x86_64
crun-0.21-1.fc34.x86_64
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/master/troubleshooting.md)
Yes. Latest Fedora release with latest package from its repository.
Additional environment details (AWS, VirtualBox, physical, etc.):
Virtual machine on VMware ESXi. OS image from https://download.fedoraproject.org/pub/fedora/linux/releases/34/Cloud/x86_64/images/Fedora-Cloud-Base-34-1.2.x86_64.raw.xz .
# cat /etc/os-release
NAME=Fedora
VERSION="34 (Cloud Edition)"
ID=fedora
VERSION_ID=34
VERSION_CODENAME=""
PLATFORM_ID="platform:f34"
PRETTY_NAME="Fedora 34 (Cloud Edition)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:34"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f34/system-administrators-guide/"
SUPPORT_URL="https://fedoraproject.org/wiki/Communicating_and_getting_help"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=34
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=34
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
VARIANT="Cloud Edition"
VARIANT_ID=cloud
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 19 (11 by maintainers)
Commits related to this issue
- podman stop always cleanup When a container is configured for auto removal podman stop should still do cleanup, there is no guarantee the the cleanup process spawned by conmon will be successful. Als... — committed to Luap99/libpod by Luap99 3 years ago
Why did we decide that `podman stop` shouldn't wait for removal? This sounds like a sane thing to do to prevent race conditions.