containerd: pod deletion failing on network namespace

Description

Deleted a pod; it stays in Terminating. In the logs:

Sep 19 23:26:23 xx.yyyy containerd[245702]: time="2019-09-19T23:26:23.112627249Z" level=error msg="PodSandboxStatus for "004c4f886765769305f7e65a42c66ec95fda8fba9ddbf0d62fedae62e8873299" failed" error="failed to get sandbox ip: check network namespace closed: remove netns: unlinkat /var/run/netns/cni-45ff10e9-dcc1-b779-f1a1-3515a5d56e61: device or resource busy"

Steps to reproduce the issue: unsure.

Output of containerd --version:

containerd github.com/containerd/containerd v1.2.9 d50db0a42053864a270f648048f9a8b4f24eced3

Any other relevant information:

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 6
  • Comments: 33 (23 by maintainers)

Most upvoted comments

The solution for me was indicated here:

https://github.com/cri-o/cri-o/pull/4210

echo 1 > /proc/sys/fs/may_detach_mounts

Good luck.
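
For completeness (not from the original comment): a minimal sketch of applying that workaround persistently, assuming the stock sysctl.d layout and a kernel that exposes fs.may_detach_mounts (the CentOS 7 kernels in this thread do); the file name under /etc/sysctl.d is arbitrary:

# sysctl -w fs.may_detach_mounts=1        # same effect as the echo above
# echo 'fs.may_detach_mounts = 1' > /etc/sysctl.d/99-may-detach-mounts.conf
# sysctl --system                         # reload sysctl.d on demand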

Thinking more about it, maybe this /run/netns/cni-X proc mount could be made "private" so it never appears in the container mount namespace in the first place (not sure it's possible). But if we start hiding some mount points, it's hard to say where we stop.

Let's wait for @fuweid / the containerd maintainers' opinion before closing this issue.
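
About the "private" idea above, a quick sketch (not from the original thread) of checking the current propagation state of the host /run mount; a shared:N optional field in mountinfo means the mount belongs to a shared peer group, so new bind mounts under /run/netns can propagate into mount namespaces that hold a slave or shared copy of it:

# grep ' /run ' /proc/self/mountinfo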

Check your iproute RPM package; this may be related to https://patchwork.ozlabs.org/patch/796300/

@fuweid I still have the same issue with a patched iproute… What information should I provide for debugging?

Adding mountPropagation: HostToContainer is enough to fix the issue. @kfox1111 you might want to use mountPropagation: Bidirectional.
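
As context (not from the original comment): a sketch of checking whether a CSI plugin's containers set any mountPropagation on their volume mounts; the namespace and DaemonSet name (rook-ceph / csi-rbdplugin) are assumed from the pod names that appear later in this thread:

# kubectl -n rook-ceph get ds csi-rbdplugin -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{": "}{.volumeMounts[*].mountPropagation}{"\n"}{end}'

An empty value after a container name means none of its volume mounts set mountPropagation, i.e. they use the default (None), so host-side mount/umount events do not propagate into that container.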

If there is no process using the file, and since containerd checks the ns and the ns file doesn't contain the magic number, maybe this issue can help? @champtar

ref: containernetworking/plugins#69
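
To illustrate what such a check sees on the node (a sketch, not containerd's exact code path; the path below is taken from the logs later in this thread): a netns that is still bind-mounted in the host mount namespace shows up in /proc/mounts, while after it has been unmounted a statfs on the path falls through to the tmpfs that backs /run, which is the TMPFS_MAGIC visible in the strace further down:

# grep netns /proc/mounts        # live netns bind mounts in the host mount namespace, if any
# stat -f /var/run/netns/cni-3276e31b-c2af-4840-abce-a9d3e8d061b4   # "Type: tmpfs" => only the empty mount-point file is left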

I'm not sure how much we can trust lsof with namespaces. I'll have a look at your link on Monday (it's 11pm Sunday for me).

I think I have the same issue. I will try to update my containerd, but here is my investigation on 1.2.6.

Description

Sometimes pods are stuck in Terminating; in the logs I get, in a loop:

sept. 21 00:57:44 etienne-ks141 kubelet[1408]: E0921 00:57:44.921844    1408 kuberuntime_manager.go:887] PodSandboxStatus of sandbox "6dffd308312e2e293b89f19d5fc6c3c8fdfe74eb3e6e7bdc328f4879e42e2a0c" for pod "rook-ceph-mon-a-59cbf85446-jwprv_rook-ceph(f774a9f2-5e4b-445b-b39a-0231456f37fd)" error: rpc error: code = Unknown desc = failed to get sandbox ip: check network namespace closed: remove netns: remove /var/run/netns/cni-3276e31b-c2af-4840-abce-a9d3e8d061b4: device or resource busy
sept. 21 00:57:44 etienne-ks141 containerd[929]: time="2019-09-21T00:57:44.921651140Z" level=error msg="PodSandboxStatus for "6dffd308312e2e293b89f19d5fc6c3c8fdfe74eb3e6e7bdc328f4879e42e2a0c" failed" error="failed to get sandbox ip: check network namespace closed: remove netns: remove /var/run/netns/cni-3276e31b-c2af-4840-abce-a9d3e8d061b4: device or resource busy"
sept. 21 00:57:44 etienne-ks141 containerd[929]: time="2019-09-21T00:57:44.922440028Z" level=error msg="PodSandboxStatus for "6dffd308312e2e293b89f19d5fc6c3c8fdfe74eb3e6e7bdc328f4879e42e2a0c" failed" error="failed to get sandbox ip: check network namespace closed: remove netns: remove /var/run/netns/cni-3276e31b-c2af-4840-abce-a9d3e8d061b4: device or resource busy"
sept. 21 00:57:44 etienne-ks141 kubelet[1408]: E0921 00:57:44.922551    1408 kuberuntime_manager.go:887] PodSandboxStatus of sandbox "6dffd308312e2e293b89f19d5fc6c3c8fdfe74eb3e6e7bdc328f4879e42e2a0c" for pod "rook-ceph-mon-a-59cbf85446-jwprv_rook-ceph(f774a9f2-5e4b-445b-b39a-0231456f37fd)" error: rpc error: code = Unknown desc = failed to get sandbox ip: check network namespace closed: remove netns: remove /var/run/netns/cni-3276e31b-c2af-4840-abce-a9d3e8d061b4: device or resource busy

We cannot shut down "rook-ceph-mon-a-59cbf85446-jwprv" because "/var/run/netns/cni-3276e31b-c2af-4840-abce-a9d3e8d061b4: device or resource busy".

Looking at the strace output:

# strace -f -s10000 -p 929 2>&1 | grep -B 20 'cni-3276e31b-c2af-4840-abce-a9d3e8d061b4'

[pid  1525] statfs("/var/run/netns/cni-3276e31b-c2af-4840-abce-a9d3e8d061b4", {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=472741, f_bfree=428901, f_bavail=428901, f_files=472741, f_ffree=471867, f_fsid={0, 0}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOSUID|ST_NODEV}) = 0
[pid  1525] unlinkat(AT_FDCWD, "/var/run/netns/cni-3276e31b-c2af-4840-abce-a9d3e8d061b4", 0) = -1 EBUSY (Device or resource busy)
[pid  1525] unlinkat(AT_FDCWD, "/var/run/netns/cni-3276e31b-c2af-4840-abce-a9d3e8d061b4", AT_REMOVEDIR) = -1 ENOTDIR (Not a directory)
[pid  1525] lstat("/var/run/netns/cni-3276e31b-c2af-4840-abce-a9d3e8d061b4", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[pid  1525] write(2, "time=\"2019-09-21T01:09:57.096231612Z\" level=error msg=\"PodSandboxStatus for \"6dffd308312e2e293b89f19d5fc6c3c8fdfe74eb3e6e7bdc328f4879e42e2a0c\" failed\" error=\"failed to get sandbox ip: check network namespace closed: remove netns: remove /var/run/netns/cni-3276e31b-c2af-4840-abce-a9d3e8d061b4: device or resource busy\" \n", 320 <unfinished ...>

It fails when we try to unlink /var/run/netns/cni-3276e31b-c2af-4840-abce-a9d3e8d061b4; unlinkat only returns EBUSY on a regular file when that file is still in use as a mount point somewhere.

This is a simple empty file

# ls -lia /var/run/netns/cni-3276e31b-c2af-4840-abce-a9d3e8d061b4
37906 -rw-r--r--. 1 root root 0 18 sept. 22:50 /var/run/netns/cni-3276e31b-c2af-4840-abce-a9d3e8d061b4

So the likely explanation for the "device or resource busy" error is that something is still mounted on top of it:

# grep cni-3276e31b-c2af-4840-abce-a9d3e8d061b4 /proc/mounts
# grep cni-3276e31b-c2af-4840-abce-a9d3e8d061b4 /proc/*/mounts
/proc/3091/mounts:proc /rootfs/run/netns/cni-3276e31b-c2af-4840-abce-a9d3e8d061b4 proc rw,nosuid,nodev,noexec,relatime 0 0

It seems process 3091 is the one still using this file:
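
As a cross-check (assuming the util-linux on this CentOS 7 host is recent enough to ship findmnt --task), the same leaked mount can be listed directly from that process's mount namespace:

# findmnt -N 3091 | grep cni-3276e31b-c2af-4840-abce-a9d3e8d061b4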

# cat /proc/3091/cmdline 
/usr/local/bin/cephcsi--nodeid=etienne-ks141--endpoint=unix:///csi/csi.sock--v=5--type=rbd--nodeserver=true--drivername=rook-ceph.rbd.csi.ceph.com--containerized=true--pidlimit=-1--metricsport=9090--metricspath=/metrics--enablegrpcmetrics=true

# pstree -pla | grep -C20 3091
...
  |   |-containerd-shim,3062 -namespace k8s.io -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/k8s.io/1100e90035d5b438c407aa1ab2bfcb3f0a89c2b4805f20260f81e73b936013f0 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd
  |   |   |-cephcsi,3091 --nodeid=etienne-ks141 --endpoint=unix:///csi/csi.sock --v=5 --type=rbd --nodeserver=true --drivername=rook-ceph.rbd.csi.ceph.com --containerized=true --pidlimit=-1 --metricsport=9090 --metricspath=/metrics --enablegrpcmetrics=true
  |   |   |   |-{cephcsi},3131
  |   |   |   |-{cephcsi},3132
  |   |   |   |-{cephcsi},3133
  |   |   |   |-{cephcsi},3134
  |   |   |   |-{cephcsi},3205
  |   |   |   `-{cephcsi},32605
  |   |   |-{containerd-shim},3064
  |   |   |-{containerd-shim},3065
  |   |   |-{containerd-shim},3066
  |   |   |-{containerd-shim},3068
  |   |   |-{containerd-shim},3069
  |   |   |-{containerd-shim},3070
  |   |   |-{containerd-shim},3071
  |   |   |-{containerd-shim},3073
  |   |   `-{containerd-shim},11858
...

# cat /proc/3091/cgroup | head -n1
11:cpuacct,cpu:/kubepods/besteffort/podcc17615e-1c3b-4155-a2f2-658090f682ca/1100e90035d5b438c407aa1ab2bfcb3f0a89c2b4805f20260f81e73b936013f0

So the pod that is blocking us is cc17615e-1c3b-4155-a2f2-658090f682ca:

# kubectl get --all-namespaces pod -o yaml | grep -B 1 cc17615e-1c3b-4155-a2f2-658090f682ca
    selfLink: /api/v1/namespaces/rook-ceph/pods/csi-rbdplugin-b5z8n
    uid: cc17615e-1c3b-4155-a2f2-658090f682ca

or rook-ceph/csi-rbdplugin-b5z8n

Now, going back a bit: the pod that fails to shut down is rook-ceph/rook-ceph-mon-a-59cbf85446-jwprv. If we search with the sandbox ID:

# pstree -pla | grep -A20 6dffd308312e2e293b89f19d5fc6c3c8fdfe74eb3e6e7bdc328f4879e42e2a0c
  |   |-containerd-shim,4689 -namespace k8s.io -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/k8s.io/6dffd308312e2e293b89f19d5fc6c3c8fdfe74eb3e6e7bdc328f4879e42e2a0c -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd
  |   |   |-pause,4707
  |   |   |-{containerd-shim},4691
  |   |   |-{containerd-shim},4692
  |   |   |-{containerd-shim},4693
  |   |   |-{containerd-shim},4694
  |   |   |-{containerd-shim},4695
  |   |   |-{containerd-shim},4696
  |   |   |-{containerd-shim},4739
  |   |   |-{containerd-shim},4740
  |   |   `-{containerd-shim},13814

We see that the pod's workload process is not running anymore (only the pause process is left under the shim).

kubectl get -n rook-ceph pod/csi-rbdplugin-b5z8n -o yaml > rook-ceph_csi-rbdplugin-b5z8n.txt
kubectl get -n rook-ceph pod/rook-ceph-mon-a-59cbf85446-jwprv -o yaml > rook-ceph_rook-ceph-mon-a-59cbf85446-jwprv.txt

csi-rbdplugin-b5z8n has hostPID: true, but rook-ceph-mon-a-59cbf85446-jwprv does not.
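
Not part of the original investigation, but a quick way (assuming jq is installed) to enumerate the hostPID pods that can end up pinning these netns mounts:

# kubectl get --all-namespaces pod -o json | jq -r '.items[] | select(.spec.hostPID == true) | .metadata.namespace + "/" + .metadata.name'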

If I look at another server with a stuck container:

sept. 21 03:30:35 etienne-ks143 containerd[921]: time="2019-09-21T03:30:35.320893195Z" level=error msg="PodSandboxStatus for "e63fe4ab645c9a73c9de37c18a78eea5161842e51ec5c69502d3decc67b640c1" failed" error="failed to get sandbox ip: check network namespace closed: remove netns: remove /var/run/netns/cni-f0dbf291-fcb0-51a0-c6b9-04f63fc1ce83: device or resource busy"
sept. 21 03:30:35 etienne-ks143 kubelet[1407]: E0921 03:30:35.322040    1407 kuberuntime_manager.go:887] PodSandboxStatus of sandbox "e63fe4ab645c9a73c9de37c18a78eea5161842e51ec5c69502d3decc67b640c1" for pod "csi-cephfsplugin-provisioner-69cf8dc7c4-zftdg_rook-ceph(9cc95f96-da0a-493b-92a0-eed33311be51)" error: rpc error: code = Unknown desc = failed to get sandbox ip: check network namespace closed: remove netns: remove /var/run/netns/cni-f0dbf291-fcb0-51a0-c6b9-04f63fc1ce83: device or resource busy
# grep f0dbf291-fcb0-51a0-c6b9-04f63fc1ce83 /proc/mounts
# grep f0dbf291-fcb0-51a0-c6b9-04f63fc1ce83 /proc/*/mounts
/proc/2930/mounts:proc /rootfs/run/netns/cni-f0dbf291-fcb0-51a0-c6b9-04f63fc1ce83 proc rw,nosuid,nodev,noexec,relatime 0 0
/proc/32578/mounts:proc /rootfs/run/netns/cni-f0dbf291-fcb0-51a0-c6b9-04f63fc1ce83 proc rw,nosuid,nodev,noexec,relatime 0 0
# cat /proc/2930/cgroup | head -n1
11:memory:/kubepods/besteffort/pod1aa6025e-4567-4c92-9581-417e303db72e/0a1122555b608c5c6ce5beea0f7f5a63a6840e8c24e0c7061bcb24985845a340
# kubectl get --all-namespaces pod -o yaml | grep -B 1 1aa6025e-4567-4c92-9581-417e303db72e
    selfLink: /api/v1/namespaces/rook-ceph/pods/csi-rbdplugin-p59dc
    uid: 1aa6025e-4567-4c92-9581-417e303db72e

# cat /proc/32578/cgroup | head -n1
11:memory:/kubepods/besteffort/pod6efa955b-f708-4456-b1a1-1cfe92b66ed7/8f80c4488cdaf2209dac68a6b882546db835a740a57a6c0620a7d02b3c364057
# kubectl get --all-namespaces pod -o yaml | grep -B 1 6efa955b-f708-4456-b1a1-1cfe92b66ed7
    selfLink: /api/v1/namespaces/rook-ceph/pods/csi-rbdplugin-provisioner-dccb5f67-66m9x
    uid: 6efa955b-f708-4456-b1a1-1cfe92b66ed7
# kubectl get -n rook-ceph pod/csi-rbdplugin-p59dc -o yaml | grep hostPID
  hostPID: true
# kubectl get -n rook-ceph pod/csi-rbdplugin-provisioner-dccb5f67-66m9x -o yaml | grep hostPID

I’ll try to reproduce with more minimal containers, but that means destroying everything

Steps to reproduce the issue: no idea. Using latest CentOS (7.7 / 3.10.0-1062.1.1.el7.x86_64); cluster deployed with Kubespray 2.11, configured to use containerd instead of Docker, everything else at defaults.

Describe the results you expected: It works 😉

Output of containerd --version:

containerd containerd.io 1.2.6 894b81a4b802e4eb2a91d1ce216b8817763c29fb