cri-o: "Killing unwanted pod" messages never stop

Description

I’m using k8s 1.9.6 and crio 1.9.10 on OpenStack. After starting and stopping many pods (for example when running the heptio sonobuoy scanner), I sometimes see a flood of messages in the kubelet logs of the form Killing unwanted pod "some-pod". These messages repeat every two seconds and appear only on some nodes. The pods mentioned in the messages are no longer known to the k8s API (kubectl get pod -a does not show them). The messages disappear when I restart crio. The to-be-killed pods can still be seen in the output of crictl sandboxes and crioctl pod list.
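For reference, this is roughly how I check for the leftover sandboxes on an affected node (a minimal sketch; it assumes crio listens on /var/run/crio.sock as in my config below, and the exact crictl subcommand depends on the crictl version installed):

# pod is no longer known to the API server
kubectl get pod -a --all-namespaces

# but its sandbox is still listed by crio (whichever tool is installed)
crictl --runtime-endpoint unix:///var/run/crio.sock sandboxes
crioctl pod list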

Steps to reproduce the issue:

  1. Get reproducer.yaml or create it:
cat > reproducer.yaml <<EOF
---
kind: Namespace
apiVersion: v1
metadata:
  name: reproducer

---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: reproducer
  namespace: reproducer
spec:
  replicas: 100
  revisionHistoryLimit: 0
  template:
    metadata:
      labels:
        app: reproducer
    spec:
      volumes:
        - name: reproducer
          emptyDir: {}
      containers:
      - name: reproducer
        image: busybox
        args:
          - sleep
          - "3600"
        volumeMounts:
          - name: reproducer
            mountPath: /reproducer
EOF
  2. Run kubectl apply -f reproducer.yaml; sleep 20; kubectl -n reproducer delete pod --all; sleep 20; kubectl delete ns reproducer to create 100 pods, delete them all so they get recreated, and then delete the namespace.
  3. Wait about ten minutes for the namespace to get deleted and for the cluster to settle.

Describe the results you received: Messages like Killing unwanted pod "reproducer-5f6c5486f8-5xcr2" appear repeatedly in the kubelet logs on some worker nodes.
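A quick way to watch for the flood on a worker node (sketch; it assumes kubelet runs as a systemd unit named kubelet, as it does in my deployment):

journalctl -u kubelet -f | grep 'Killing unwanted pod'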

Describe the results you expected: Messages regarding deleted namespace/pods cease to appear in logs.

Additional information you deem important (e.g. issue happens only occasionally): I have five worker nodes. Three of them exhibited the problem after running the reproducer. The affected nodes differ from run to run.

The problem is mitigated by restarting crio.
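The workaround on an affected node is just (sketch, using the crio systemd unit shown further below):

systemctl restart crio
# the "Killing unwanted pod" messages stop showing up in the kubelet log after this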

I’m also not sure whether the emptyDir volumes have anything to do with this.

Output of crio --version:

crio version 1.9.10
commit: "87237324485137e0f439c3011999401186b62874"

Additional environment details (AWS, VirtualBox, physical, etc.): Running on OpenStack, deployed with homegrown Ansible scripts.

root@hardway-worker-1:~# systemctl cat crio
# /etc/systemd/system/crio.service
[Unit]
Description=CRI-O daemon
Documentation=https://github.com/kubernetes-incubator/cri-o

[Service]
ExecStart=/usr/local/bin/crio
Restart=always
RestartSec=10s

[Install]
WantedBy=multi-user.target
root@hardway-worker-1:~# cat /etc/crio/crio.conf | grep -v '^#' | grep -v '^$'
[crio]
root = "/var/lib/containers/storage"
runroot = "/var/run/containers/storage"
storage_driver = "overlay2"
storage_option = [
]
[crio.api]
listen = "/var/run/crio.sock"
stream_address = ""
stream_port = "10010"
file_locking = true
[crio.runtime]
runtime = "/usr/local/bin/runc"
runtime_untrusted_workload = ""
default_workload_trust = "trusted"
no_pivot = false
conmon = "/usr/local/libexec/crio/conmon"
conmon_env = [
	"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
]
selinux = false
seccomp_profile = "/etc/crio/seccomp.json"
apparmor_profile = "crio-default"
cgroup_manager = "cgroupfs"
hooks_dir_path = "/usr/share/containers/oci/hooks.d"
default_mounts = [
]
pids_limit = 1024
enable_shared_pid_namespace = false
log_size_max = -1
[crio.image]
default_transport = "docker://"
pause_image = "kubernetes/pause"
pause_command = "/pause"
signature_policy = ""
image_volumes = "mkdir"
insecure_registries = [
	"docker-registry.company.lan",
]
registries = [
	"docker.io",
]
[crio.network]
network_dir = "/etc/cni/net.d/"
plugin_dir = "/opt/cni/bin/"

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 15 (5 by maintainers)

Most upvoted comments

Oops, I’ve been testing this wrong, on an old (docker) cluster. Sorry! I can still reproduce the issue with crio v1.9.11:

Apr 16 12:12:05 hardway-worker-1 kubelet[31055]: I0416 12:12:05.410464   31055 kubelet.go:1851] SyncLoop (ADD, "api"): "reproducer-7b4fcc479d-2ghfl_reproducer(6084349d-416f-11e8-9969-fa163e583850)"
Apr 16 12:12:05 hardway-worker-1 kubelet[31055]: I0416 12:12:05.566758   31055 reconciler.go:217] operationExecutor.VerifyControllerAttachedVolume started for volume "default-token-qrc5l" (UniqueName: "kubernetes.io/secret/6084349d-416f-11e8-9969-fa163e583850-default-token-qrc5l") pod "reproducer-7b4fcc479d-2ghfl" (UID: "6084349d-416f-11e8-9969-fa163e583850")
Apr 16 12:12:05 hardway-worker-1 kubelet[31055]: I0416 12:12:05.723053   31055 reconciler.go:262] operationExecutor.MountVolume started for volume "default-token-qrc5l" (UniqueName: "kubernetes.io/secret/6084349d-416f-11e8-9969-fa163e583850-default-token-qrc5l") pod "reproducer-7b4fcc479d-2ghfl" (UID: "6084349d-416f-11e8-9969-fa163e583850")
Apr 16 12:12:06 hardway-worker-1 kubelet[31055]: I0416 12:12:06.273890   31055 operation_generator.go:523] MountVolume.SetUp succeeded for volume "default-token-qrc5l" (UniqueName: "kubernetes.io/secret/6084349d-416f-11e8-9969-fa163e583850-default-token-qrc5l") pod "reproducer-7b4fcc479d-2ghfl" (UID: "6084349d-416f-11e8-9969-fa163e583850")
Apr 16 12:12:06 hardway-worker-1 kubelet[31055]: I0416 12:12:06.641156   31055 kuberuntime_manager.go:385] No sandbox for pod "reproducer-7b4fcc479d-2ghfl_reproducer(6084349d-416f-11e8-9969-fa163e583850)" can be found. Need to start a new one
Apr 16 12:12:07 hardway-worker-1 crio[30864]: time="2018-04-16 12:12:07.174076130Z" level=info msg="Got pod network {Name:reproducer-7b4fcc479d-2ghfl Namespace:reproducer ID:faf25d8e4dcd6f2b3c6f5f3e92faa03f882761d03c60d4ed2817c2032f66f913 NetNS:/proc/32332/ns/net PortMappings:[]}"
Apr 16 12:12:07 hardway-worker-1 crio[30864]: time="2018-04-16 12:12:07.234627129Z" level=info msg="Got pod network {Name:reproducer-7b4fcc479d-2ghfl Namespace:reproducer ID:faf25d8e4dcd6f2b3c6f5f3e92faa03f882761d03c60d4ed2817c2032f66f913 NetNS:/proc/32332/ns/net PortMappings:[]}"
Apr 16 12:12:08 hardway-worker-1 kubelet[31055]: I0416 12:12:08.248978   31055 kubelet.go:1896] SyncLoop (PLEG): "reproducer-7b4fcc479d-2ghfl_reproducer(6084349d-416f-11e8-9969-fa163e583850)", event: &pleg.PodLifecycleEvent{ID:"6084349d-416f-11e8-9969-fa163e583850", Type:"ContainerStarted", Data:"faf25d8e4dcd6f2b3c6f5f3e92faa03f882761d03c60d4ed2817c2032f66f913"}
Apr 16 12:12:23 hardway-worker-1 kubelet[31055]: I0416 12:12:23.932729   31055 kubelet.go:1867] SyncLoop (DELETE, "api"): "reproducer-7b4fcc479d-2ghfl_reproducer(6084349d-416f-11e8-9969-fa163e583850)"
Apr 16 12:12:40 hardway-worker-1 kubelet[31055]: E0416 12:12:40.458432   31055 pod_workers.go:186] Error syncing pod 6084349d-416f-11e8-9969-fa163e583850 ("reproducer-7b4fcc479d-2ghfl_reproducer(6084349d-416f-11e8-9969-fa163e583850)"), skipping: failed to "StartContainer" for "reproducer" with CreateContainerConfigError: "cannot find volume \"default-token-qrc5l\" to mount into container \"reproducer\""
Apr 16 12:12:54 hardway-worker-1 kubelet[31055]: I0416 12:12:54.405310   31055 kubelet_pods.go:1118] Killing unwanted pod "reproducer-7b4fcc479d-2ghfl"
Apr 16 12:12:54 hardway-worker-1 crio[30864]: time="2018-04-16 12:12:54.410397564Z" level=info msg="Got pod network {Name:reproducer-7b4fcc479d-2ghfl Namespace:reproducer ID:faf25d8e4dcd6f2b3c6f5f3e92faa03f882761d03c60d4ed2817c2032f66f913 NetNS:/proc/32332/ns/net PortMappings:[]}"
Apr 16 12:12:54 hardway-worker-1 crio[30864]: time="2018-04-16 12:12:54.507387356Z" level=warning msg="failed to destroy network for pod sandbox k8s_reproducer-7b4fcc479d-2ghfl_reproducer_6084349d-416f-11e8-9969-fa163e583850_0(faf25d8e4dcd6f2b3c6f5f3e92faa03f882761d03c60d4ed2817c2032f66f913): failed to Statfs "/proc/32332/ns/net": no such file or directory"
Apr 16 12:13:24 hardway-worker-1 kubelet[31055]: I0416 12:13:24.581243   31055 kubelet_pods.go:1118] Killing unwanted pod "reproducer-7b4fcc479d-2ghfl"
Apr 16 12:13:24 hardway-worker-1 kubelet[31055]: I0416 12:13:24.626440   31055 kubelet_pods.go:1118] Killing unwanted pod "reproducer-7b4fcc479d-2ghfl"
Apr 16 12:13:55 hardway-worker-1 kubelet[31055]: I0416 12:13:55.065211   31055 kubelet_pods.go:1118] Killing unwanted pod "reproducer-7b4fcc479d-2ghfl"
Apr 16 12:13:55 hardway-worker-1 kubelet[31055]: I0416 12:13:55.131353   31055 kubelet.go:1867] SyncLoop (DELETE, "api"): "reproducer-7b4fcc479d-2ghfl_reproducer(6084349d-416f-11e8-9969-fa163e583850)"
Apr 16 12:13:55 hardway-worker-1 kubelet[31055]: I0416 12:13:55.132076   31055 kubelet.go:1861] SyncLoop (REMOVE, "api"): "reproducer-7b4fcc479d-2ghfl_reproducer(6084349d-416f-11e8-9969-fa163e583850)"
Apr 16 12:13:55 hardway-worker-1 kubelet[31055]: I0416 12:13:55.133335   31055 kubelet_pods.go:1118] Killing unwanted pod "reproducer-7b4fcc479d-2ghfl"
Apr 16 12:13:56 hardway-worker-1 kubelet[31055]: I0416 12:13:56.726713   31055 kubelet_pods.go:1118] Killing unwanted pod "reproducer-7b4fcc479d-2ghfl"
Apr 16 12:14:27 hardway-worker-1 kubelet[31055]: I0416 12:14:27.089949   31055 kubelet_pods.go:1118] Killing unwanted pod "reproducer-7b4fcc479d-2ghfl"
Apr 16 12:14:27 hardway-worker-1 kubelet[31055]: I0416 12:14:27.435679   31055 kubelet_pods.go:1118] Killing unwanted pod "reproducer-7b4fcc479d-2ghfl"
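My reading of this (just a guess): once the DELETE arrives and the sandbox process has exited, crio can no longer tear down the pod network because the network namespace path recorded for the sandbox (/proc/32332/ns/net) is gone, the sandbox is therefore never cleaned up, and the kubelet keeps hitting it in its cleanup loop, hence the endless Killing unwanted pod lines. A rough way to confirm the stale state on the node (the PID and socket path are taken from the logs and config above):

# the netns path recorded for the sandbox no longer exists
stat /proc/32332/ns/net

# yet crio still reports the sandbox
crictl --runtime-endpoint unix:///var/run/crio.sock sandboxes | grep reproducer-7b4fcc479d-2ghfl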