kubernetes: kubectl delete daemonset hang
I have a daemonset resource like this:
NAME DESIRED CURRENT NODE-SELECTOR AGE
newrelic-agent 3 3 <none> 3d
When I try to run kubectl delete daemonset
to delete this daemonset resource, it hangs for a few minutes and then exits with status 1.
root@ubuntu192:~/k8s# kubectl get daemonset
NAME DESIRED CURRENT NODE-SELECTOR AGE
newrelic-agent 0 0 13e0252d-8069-11e6-9cdd-005056881537=13e02602-8069-11e6-9cdd-005056881537 3d
The daemonset still exists, so I try to delete it again. This time it hangs for 8 minutes.
root@ubuntu192:~/k8s# kubectl delete ds newrelic-agent
error: timed out waiting for the condition
Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"4+", GitVersion:"v1.4.0-alpha.2.282+c8ea7af912f86e", GitCommit:"c8ea7af912f86e05e22f1e8d0d0b90c8b9fc90d7", GitTreeState:"clean", BuildDate:"2016-08-05T01:09:47Z", GoVersion:"go1.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"4+", GitVersion:"v1.4.0-alpha.2.282+c8ea7af912f86e", GitCommit:"c8ea7af912f86e05e22f1e8d0d0b90c8b9fc90d7", GitTreeState:"clean", BuildDate:"2016-08-05T01:09:47Z", GoVersion:"go1.6", Compiler:"gc", Platform:"linux/amd64"}
Environment: Linux ubuntu192.168.14.100 4.4.0-36-generic #55-Ubuntu SMP Thu Aug 11 18:01:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
This is part of the log from running strace on the kubectl delete daemonset
command. Is there some way to get a stack trace from k8s, like the Docker daemon?
stat("/root/.kube/config", {st_mode=S_IFREG|0644, st_size=332, ...}) = 0
openat(AT_FDCWD, "/root/.kube/config", O_RDONLY|O_CLOEXEC) = 5
fstat(5, {st_mode=S_IFREG|0644, st_size=332, ...}) = 0
read(5, "apiVersion: v1\nclusters:\n- clust"..., 844) = 332
read(5, "", 512) = 0
close(5) = 0
stat("/var/run/secrets/kubernetes.io/serviceaccount/token", 0xc820466518) = -1 ENOENT (No such file or directory)
stat("/var/run/secrets/kubernetes.io/serviceaccount/token", 0xc8204665e8) = -1 ENOENT (No such file or directory)
futex(0xc8200e0908, FUTEX_WAKE, 1) = 1
socket(PF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 5
setsockopt(5, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
connect(5, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("192.168.14.100")}, 16) = -1 EINPROGRESS (Operation now in progress)
epoll_ctl(4, EPOLL_CTL_ADD, 5, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=1007984784, u64=140308999610512}}) = 0
getsockopt(5, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
getsockname(5, {sa_family=AF_INET, sin_port=htons(33804), sin_addr=inet_addr("192.168.14.100")}, [16]) = 0
getpeername(5, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("192.168.14.100")}, [16]) = 0
setsockopt(5, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(5, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
setsockopt(5, SOL_TCP, TCP_KEEPINTVL, [30], 4) = 0
setsockopt(5, SOL_TCP, TCP_KEEPIDLE, [30], 4) = 0
futex(0xc8200e0908, FUTEX_WAKE, 1) = 1
read(5, 0xc82029c000, 4096) = -1 EAGAIN (Resource temporarily unavailable)
write(5, "GET /api HTTP/1.1\r\nHost: 192.168"..., 163) = 163
futex(0xc8200e0908, FUTEX_WAKE, 1) = 1
futex(0x2965d08, FUTEX_WAIT, 0, NULL) = 0
futex(0x2965d08, FUTEX_WAIT, 0, NULL) = 0
futex(0x2965d08, FUTEX_WAIT, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x2965d08, FUTEX_WAIT, 0, NULL) = 0
epoll_wait(4, [], 128, 0) = 0
futex(0x2964fc0, FUTEX_WAKE, 1) = 0
futex(0x2964f10, FUTEX_WAKE, 1) = 1
futex(0x2965d08, FUTEX_WAIT, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x2964fe8, FUTEX_WAKE, 1) = 0
futex(0x2964f10, FUTEX_WAKE, 1) = 1
futex(0xc8200e0908, FUTEX_WAKE, 1) = 1
futex(0x2965d08, FUTEX_WAIT, 0, NULL) = 0
futex(0x2965d08, FUTEX_WAIT, 0, NULL) = 0
sched_yield() = 0
futex(0x2965d08, FUTEX_WAIT, 0, NULL) = 0
futex(0x2964fe8, FUTEX_WAKE, 1) = 0
futex(0x2964f10, FUTEX_WAKE, 1) = 1
futex(0x2964fc0, FUTEX_WAKE, 1) = 1
futex(0x2964f10, FUTEX_WAKE, 1) = 1
futex(0x2965d08, FUTEX_WAIT, 0, NULL) = 0
sched_yield() = 0
futex(0x2965d08, FUTEX_WAIT, 0, NULL) = 0
select(0, NULL, NULL, NULL, {0, 100}) = 0 (Timeout)
futex(0x2965d08, FUTEX_WAIT, 0, NULL) = 0
sched_yield() = 0
futex(0x2965d08, FUTEX_WAIT, 0, NULL) = 0
futex(0x2964fe8, FUTEX_WAKE, 1) = 1
futex(0x2965d08, FUTEX_WAIT, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
sched_yield() = 0
sched_yield() = 0
futex(0x2965d08, FUTEX_WAIT, 0, NULL) = 0
futex(0x2965d08, FUTEX_WAIT, 0, NULLerror: timed out waiting for the condition <unfinished ...>
+++ exited with 1 +++
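On the question of getting a stack trace: kubectl is a Go binary, and the Go runtime dumps all goroutine stacks to stderr when the process receives SIGQUIT. A minimal sketch (the `pgrep` pattern is an assumption; adjust for your setup):

```shell
# Terminal 1: run the hanging command
kubectl delete ds newrelic-agent

# Terminal 2: send SIGQUIT to the kubectl process.
# The Go runtime prints every goroutine's stack trace
# to stderr and then exits the process.
kill -QUIT "$(pgrep -n kubectl)"
```

This shows where the client is blocked (e.g. waiting on the API server), which is more informative than the futex noise in the strace output above.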
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Comments: 37 (22 by maintainers)
Seeing this problem with 1.6.1.
I created the calico-node ds, it was misconfigured, so I deleted it. All the pods were terminated, but the ds won’t go away. I’ve tried --now and --force, but kubectl won’t delete the ds. The node selector has been updated to randomUUID=randomUUID.
edit: Fixed it with
kubectl delete ds --force --now --cascade=false
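For reference, a non-cascading delete like the one above removes only the DaemonSet object and orphans its pods, so they have to be cleaned up separately. A sketch, where the `k8s-app=calico-node` label is an assumption (check yours with `kubectl get pods --show-labels`):

```shell
# Delete only the DaemonSet object, leaving its pods orphaned
kubectl delete ds calico-node --cascade=false

# Then remove the leftover pods by label
# (label is an assumption for this example)
kubectl delete pods -l k8s-app=calico-node --grace-period=0 --force
```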
I hit this issue. In my case, when I deleted the daemonset and it hung, some of the pods were in Terminating
status. I’m not sure whether all of the reporters hit the same cause as me, but here are steps to reproduce and a workaround (in my case).
1. Stop the service on one of the nodes so it becomes NotReady
2. Deploy the daemonset and confirm one pod does not reach Running
status
3. Delete the daemonset and observe the issue
4. Confirm the pod running on the NotReady
host is still Terminating
workaround
1. delete the pod with --grace-period=0
2. delete the daemonset
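The two workaround steps above might look like this (the pod name suffix is a placeholder; the daemonset name is taken from this thread):

```shell
# 1. Force-delete the pod stuck in Terminating on the NotReady node
#    (pod name is a placeholder; find it with `kubectl get pods -o wide`)
kubectl delete pod newrelic-agent-xxxxx --grace-period=0 --force

# 2. With the stuck pod gone, the daemonset deletion can complete
kubectl delete ds newrelic-agent
```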
@vnandha kubectl delete
deletes resources cascadingly by default. A DaemonSet cannot finish cascading deletion until all of its pods are gone (which is expected). You can resolve this either by forcefully deleting that pod and then deleting the DaemonSet cascadingly, or by deleting the DaemonSet with --cascade=false
and then deleting its pods manually.
I’m unable to reproduce it, so I’m just brainstorming some possible causes.
If DaemonSet deletion times out and…
- .spec.template.spec.nodeSelector
isn’t updated to randomUUID=randomUUID: it’s something to do with kubectl delete
- some pods are still Running
: it’s something to do with the DaemonSet controller
- some pods are still Terminating
: it’s something to do with kubelet
- .status.currentNumberScheduled + .status.numberMisscheduled
isn’t 0
: it’s something to do with the DaemonSet controller
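To see which of the cases above applies, the fields in question can be inspected directly with JSONPath output (the daemonset name here is the one from this thread; substitute yours):

```shell
# Pod-template node selector: did it get rewritten to randomUUID=randomUUID?
kubectl get ds newrelic-agent \
  -o jsonpath='{.spec.template.spec.nodeSelector}{"\n"}'

# Scheduled/misscheduled counts: deletion cannot finish until both are 0
kubectl get ds newrelic-agent \
  -o jsonpath='{.status.currentNumberScheduled} {.status.numberMisscheduled}{"\n"}'

# Any daemonset pods still Running or Terminating, and on which nodes?
kubectl get pods -o wide | grep newrelic-agent
```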