cri-o: "Killing unwanted pod" messages never stop
Description
I’m using k8s 1.9.6 and crio 1.9.10 on OpenStack.
After starting and stopping many pods (for example when running the heptio sonobuoy scanner), I sometimes see a flood of messages in the kubelet logs of the form Killing unwanted pod "some-pod". The messages repeat every two seconds and appear only on some nodes. The pods they mention are no longer known to the k8s API (kubectl get pod -a does not show them), but they can still be seen in the output of crictl sandboxes and crioctl pod list. The messages disappear when I restart crio.
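For reference, this is roughly how I cross-check an affected node (a minimal sketch, assuming kubelet runs under systemd and crio listens on /var/run/crio.sock as in the config below; the pod name is just one example from my logs, and the exact crictl/crioctl subcommand names depend on the client versions shipped alongside crio 1.9):

# the repeating message in the kubelet journal
journalctl -u kubelet -f | grep 'Killing unwanted pod'
# the pod is gone as far as the API server is concerned
kubectl get pod -a --all-namespaces | grep reproducer-5f6c5486f8-5xcr2
# ...but its sandbox still shows up on the runtime side
crictl --runtime-endpoint unix:///var/run/crio.sock sandboxes
crioctl pod list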
Steps to reproduce the issue:
- Get reproducer.yaml, or create it with:
cat > reproducer.yaml <<EOF
---
kind: Namespace
apiVersion: v1
metadata:
  name: reproducer
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: reproducer
  namespace: reproducer
spec:
  replicas: 100
  revisionHistoryLimit: 0
  template:
    metadata:
      labels:
        app: reproducer
    spec:
      volumes:
        - name: reproducer
          emptyDir: {}
      containers:
        - name: reproducer
          image: busybox
          args:
            - sleep
            - "3600"
          volumeMounts:
            - name: reproducer
              mountPath: /reproducer
EOF
- Run
kubectl apply -f reproducer.yaml ; sleep 20; kubectl -n reproducer delete pod --all; sleep 20; kubectl delete ns reproducer
to create 100 pods, delete them all so they get recreated, and delete the namespace.
- Wait about ten minutes for the namespace to be deleted and for the cluster to settle (a small wait loop is sketched after this list).
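The wait in the last step can be scripted; a minimal sketch, assuming nothing else in the cluster is named reproducer:

# block until the reproducer namespace is fully gone, then give the kubelets a moment to settle
while kubectl get ns reproducer >/dev/null 2>&1; do sleep 30; done
sleep 60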
Describe the results you received:
Messages like Killing unwanted pod "reproducer-5f6c5486f8-5xcr2" are repeatedly appearing in kubelet logs on some worker nodes.
Describe the results you expected:
Messages regarding the deleted namespace/pods cease to appear in the logs.
Additional information you deem important (e.g. issue happens only occasionally): I have five worker nodes. Three of them exhibited the problem after running the reproducer. The nodes that exhibit the problem differ from run to run.
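To see which workers are affected after a run, I grep each node’s kubelet journal; a quick sketch, assuming the five workers are named hardway-worker-1 through hardway-worker-5:

# count recent "Killing unwanted pod" messages on every worker
for node in hardway-worker-{1..5}; do
  echo "== $node =="
  ssh root@"$node" "journalctl -u kubelet --since '10 min ago' | grep -c 'Killing unwanted pod'"
done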
The problem is mitigated by restarting crio.
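Concretely, the mitigation on an affected node is just the following (a sketch; the sleep and the time window are arbitrary):

# restart crio and confirm the flood stops
systemctl restart crio
sleep 60
journalctl -u kubelet --since '1 min ago' | grep 'Killing unwanted pod' || echo 'messages stopped'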
I’m also not sure whether emptyDir volumes have anything to do with this.
Output of crio --version:
crio version 1.9.10
commit: "87237324485137e0f439c3011999401186b62874"
Additional environment details (AWS, VirtualBox, physical, etc.): Running on OpenStack, deployed with homegrown Ansible scripts.
root@hardway-worker-1:~# systemctl cat crio
# /etc/systemd/system/crio.service
[Unit]
Description=CRI-O daemon
Documentation=https://github.com/kubernetes-incubator/cri-o
[Service]
ExecStart=/usr/local/bin/crio
Restart=always
RestartSec=10s
[Install]
WantedBy=multi-user.target
root@hardway-worker-1:~# cat /etc/crio/crio.conf | grep -v '^#' | grep -v '^$'
[crio]
root = "/var/lib/containers/storage"
runroot = "/var/run/containers/storage"
storage_driver = "overlay2"
storage_option = [
]
[crio.api]
listen = "/var/run/crio.sock"
stream_address = ""
stream_port = "10010"
file_locking = true
[crio.runtime]
runtime = "/usr/local/bin/runc"
runtime_untrusted_workload = ""
default_workload_trust = "trusted"
no_pivot = false
conmon = "/usr/local/libexec/crio/conmon"
conmon_env = [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
]
selinux = false
seccomp_profile = "/etc/crio/seccomp.json"
apparmor_profile = "crio-default"
cgroup_manager = "cgroupfs"
hooks_dir_path = "/usr/share/containers/oci/hooks.d"
default_mounts = [
]
pids_limit = 1024
enable_shared_pid_namespace = false
log_size_max = -1
[crio.image]
default_transport = "docker://"
pause_image = "kubernetes/pause"
pause_command = "/pause"
signature_policy = ""
image_volumes = "mkdir"
insecure_registries = [
"docker-registry.company.lan",
]
registries = [
"docker.io",
]
[crio.network]
network_dir = "/etc/cni/net.d/"
plugin_dir = "/opt/cni/bin/"
Oops, I’ve been testing this wrong, on an old (docker) cluster. Sorry! I can still reproduce the issue with crio v1.9.11: