kubernetes: docker kill hangs, pod stuck in terminating

I had a pod stuck in Terminating. It looks similar to some other issues, but I didn’t check for an exact dupe.

Logged into the node and debugged a little (container cde46198ade6):

beeps@e2e-test-beeps-minion-cfl3:~$ sudo docker kill cde46198ade6
... (hung for about 5 minutes)
^C
beeps@e2e-test-beeps-minion-cfl3:~$ sudo docker ps
CONTAINER ID        IMAGE                                                                  COMMAND                  CREATED             STATUS              PORTS               NAMES
cde46198ade6        erkules/galera:basic                                                   "/entrypoint.sh --def"   7 minutes ago       Up 7 minutes                            k8s_mysql.47396615_mysql-2_e2e-tests-petset-hy8ki_e8e98ddf-172d-11e6-b810-42010af00002_36392735

beeps@e2e-test-beeps-minion-cfl3:~$ sudo docker kill cde46198ade6
Error response from daemon: Cannot kill container cde46198ade6: [2] Container does not exist: container destroyed
Error: failed to kill containers: [cde46198ade6]

beeps@e2e-test-beeps-minion-cfl3:~$ sudo docker ps
CONTAINER ID        IMAGE                                                                  COMMAND                  CREATED             STATUS              PORTS               NAMES
cde46198ade6        erkules/galera:basic                                                   "/entrypoint.sh --def"   7 minutes ago       Up 7 minutes   

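Exec still works after the kill failed, although ps inside the container shows only the exec’d shell and ps itself, not the entrypoint: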

beeps@e2e-test-beeps-minion-cfl3:~$ sudo docker exec -it cde46198ade6 /bin/bash
root@mysql-2:/# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  1.5  0.0  18148  3096 ?        Ss   04:17   0:00 /bin/bash
root        15  0.0  0.0  15572  2176 ?        R+   04:17   0:00 ps aux

beeps@e2e-test-beeps-minion-cfl3:~$ sudo docker kill cde46198ade6
Error response from daemon: Cannot kill container cde46198ade6: [2] Container does not exist: container destroyed
Error: failed to kill containers: [cde46198ade6]

Inspect still shows "Running": true:

beeps@e2e-test-beeps-minion-cfl3:~$ sudo docker inspect cde46198ade6
[
{
    "Id": "cde46198ade62617e84cc59987669d4e674e83475259680d1952fa60ff90565c",
    "Created": "2016-05-11T04:07:54.627038551Z",
    "Path": "/entrypoint.sh",
    "Args": [
        "--defaults-file=/etc/mysql/my-galera.cnf",
        "--user=root"
    ],
    "State": {
        "Status": "running",
        "Running": true,
        "Paused": false,
        "Restarting": false,
        "OOMKilled": false,
        "Dead": false,
        "Pid": 27514,
        "ExitCode": 0,
        "Error": "",
        "StartedAt": "2016-05-11T04:07:54.839755198Z",
        "FinishedAt": "0001-01-01T00:00:00Z"
    },
    "Image": "7108a4321e9900675ba193af33555d0354ab66fc72ff592ae2acd38191db488a",
    "ResolvConfPath": "/var/lib/docker/containers/e3ddae18879c2d6723dd960fecaf32633c726e283a747a5171822622b0ca5236/resolv.conf",
    "HostnamePath": "/var/lib/docker/containers/e3ddae18879c2d6723dd960fecaf32633c726e283a747a5171822622b0ca5236/hostname",
    "HostsPath": "/var/lib/kubelet/pods/e8e98ddf-172d-11e6-b810-42010af00002/etc-hosts",
    "LogPath": "/var/lib/docker/containers/cde46198ade62617e84cc59987669d4e674e83475259680d1952fa60ff90565c/cde46198ade62617e84cc59987669d4e674e83475259680d1952fa60ff90565c-json.log",
    "Name": "/k8s_mysql.47396615_mysql-2_e2e-tests-petset-hy8ki_e8e98ddf-172d-11e6-b810-42010af00002_36392735",
    "RestartCount": 0,
    "Driver": "aufs",
    "ExecDriver": "native-0.2",
    "MountLabel": "",
    "ProcessLabel": "",
    "AppArmorProfile": "",
    "ExecIDs": null,
    "HostConfig": {
        "Binds": [
            "/var/lib/kubelet/pods/e8e98ddf-172d-11e6-b810-42010af00002/volumes/kubernetes.io~gce-pd/pv-gce-qf9p5:/var/lib/",
            "/var/lib/kubelet/pods/e8e98ddf-172d-11e6-b810-42010af00002/volumes/kubernetes.io~empty-dir/config:/etc/mysql",
            "/var/lib/kubelet/pods/e8e98ddf-172d-11e6-b810-42010af00002/volumes/kubernetes.io~secret/default-token-ksbzc:/var/run/secrets/kubernetes.io/serviceaccount:ro",
            "/var/lib/kubelet/pods/e8e98ddf-172d-11e6-b810-42010af00002/etc-hosts:/etc/hosts",
            "/var/lib/kubelet/pods/e8e98ddf-172d-11e6-b810-42010af00002/containers/mysql/36392735:/dev/termination-log"
        ],
        "ContainerIDFile": "",
        "LxcConf": null,
        "Memory": 0,
        "MemoryReservation": 0,
        "MemorySwap": -1,
        "KernelMemory": 0,
        "CpuShares": 2,
        "CpuPeriod": 0,
        "CpusetCpus": "",
        "CpusetMems": "",
        "CpuQuota": 0,
        "BlkioWeight": 0,
        "OomKillDisable": false,
        "MemorySwappiness": null,
        "Privileged": false,
        "PortBindings": null,
        "Links": null,
        "PublishAllPorts": false,
        "Dns": null,
        "DnsOptions": null,
        "DnsSearch": null,
        "ExtraHosts": null,
        "VolumesFrom": null,
        "Devices": null,
        "NetworkMode": "container:e3ddae18879c2d6723dd960fecaf32633c726e283a747a5171822622b0ca5236",
        "IpcMode": "container:e3ddae18879c2d6723dd960fecaf32633c726e283a747a5171822622b0ca5236",
        "PidMode": "",
        "UTSMode": "",
        "CapAdd": null,
        "CapDrop": null,
        "GroupAdd": null,
        "RestartPolicy": {
            "Name": "",
            "MaximumRetryCount": 0
        },
        "SecurityOpt": null,
        "ReadonlyRootfs": false,
        "Ulimits": null,
        "LogConfig": {
            "Type": "json-file",
            "Config": {}
        },
        "CgroupParent": "/",
        "ConsoleSize": [
            0,
            0
        ],
        "VolumeDriver": ""
    },
    "GraphDriver": {
        "Name": "aufs",
        "Data": null
    },
    "Mounts": [
        {
            "Source": "/var/lib/kubelet/pods/e8e98ddf-172d-11e6-b810-42010af00002/volumes/kubernetes.io~secret/default-token-ksbzc",
            "Destination": "/var/run/secrets/kubernetes.io/serviceaccount",
            "Mode": "ro",
            "RW": false
        },
        {
            "Source": "/var/lib/kubelet/pods/e8e98ddf-172d-11e6-b810-42010af00002/etc-hosts",
            "Destination": "/etc/hosts",
            "Mode": "",
            "RW": true
        },
        {
            "Source": "/var/lib/kubelet/pods/e8e98ddf-172d-11e6-b810-42010af00002/containers/mysql/36392735",
            "Destination": "/dev/termination-log",
            "Mode": "",
            "RW": true
        },
        {
            "Source": "/var/lib/kubelet/pods/e8e98ddf-172d-11e6-b810-42010af00002/volumes/kubernetes.io~gce-pd/pv-gce-qf9p5",
            "Destination": "/var/lib",
            "Mode": "",
            "RW": true
        },
        {
            "Source": "/var/lib/kubelet/pods/e8e98ddf-172d-11e6-b810-42010af00002/volumes/kubernetes.io~empty-dir/config",
            "Destination": "/etc/mysql",
            "Mode": "",
            "RW": true
        }
    ],
    "Config": {
        "Hostname": "mysql-2",
        "Domainname": "",
        "User": "",
        "AttachStdin": false,
        "AttachStdout": false,
        "AttachStderr": false,
        "ExposedPorts": {
            "3306/tcp": {}
        },
        "Tty": false,
        "OpenStdin": false,
        "StdinOnce": false,
        "Env": [
            "KUBERNETES_PORT_443_TCP_PROTO=tcp",
            "KUBERNETES_PORT_443_TCP_PORT=443",
            "KUBERNETES_PORT_443_TCP_ADDR=10.0.0.1",
            "KUBERNETES_SERVICE_HOST=10.0.0.1",
            "KUBERNETES_SERVICE_PORT=443",
            "KUBERNETES_SERVICE_PORT_HTTPS=443",
            "KUBERNETES_PORT=tcp://10.0.0.1:443",
            "KUBERNETES_PORT_443_TCP=tcp://10.0.0.1:443",
            "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
            "VERSION=20160303",
            "DEBIAN_FRONTEND=noninteractive"
        ],
        "Cmd": [
            "--defaults-file=/etc/mysql/my-galera.cnf",
            "--user=root"
        ],
        "Image": "erkules/galera:basic",
        "Volumes": null,
        "WorkingDir": "",
        "Entrypoint": [
            "/entrypoint.sh"
        ],
        "OnBuild": null,
        "Labels": {
            "io.kubernetes.container.hash": "47396615",
            "io.kubernetes.container.name": "mysql",
            "io.kubernetes.container.restartCount": "0",
            "io.kubernetes.container.terminationMessagePath": "/dev/termination-log",
            "io.kubernetes.pod.name": "mysql-2",
            "io.kubernetes.pod.namespace": "e2e-tests-petset-hy8ki",
            "io.kubernetes.pod.terminationGracePeriod": "30",
            "io.kubernetes.pod.uid": "e8e98ddf-172d-11e6-b810-42010af00002"
        }
    },
    "NetworkSettings": {
        "Bridge": "",
        "SandboxID": "",
        "HairpinMode": false,
        "LinkLocalIPv6Address": "",
        "LinkLocalIPv6PrefixLen": 0,
        "Ports": null,
        "SandboxKey": "",
        "SecondaryIPAddresses": null,
        "SecondaryIPv6Addresses": null,
        "EndpointID": "",
        "Gateway": "",
        "GlobalIPv6Address": "",
        "GlobalIPv6PrefixLen": 0,
        "IPAddress": "",
        "IPPrefixLen": 0,
        "IPv6Gateway": "",
        "MacAddress": "",
        "Networks": null
    }
}
]

But PID 27514 (State.Pid from inspect) isn’t around on the host:

beeps@e2e-test-beeps-minion-cfl3:~$ ps aux | grep 27514
beeps    31717  0.0  0.0   7852  1948 pts/1    S+   04:20   0:00 grep 27514
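Grepping ps output for a number can match unrelated lines (here it only matched the grep itself). A more direct check is to send signal 0 to the PID, which performs the existence and permission checks without delivering a signal. A minimal Python sketch, assuming it runs on the node itself (State.Pid is a host-namespace PID); the function name pid_exists is hypothetical:

```python
import errno
import os
import sys


def pid_exists(pid: int) -> bool:
    """Return True if a process with this host PID currently exists."""
    if pid <= 0:
        # 0 and negative values address process groups, not one process.
        return False
    try:
        # Signal 0 performs only the existence and permission checks.
        os.kill(pid, 0)
    except OSError as err:
        if err.errno == errno.ESRCH:
            # No such process: the PID that Docker reports is stale.
            return False
        if err.errno == errno.EPERM:
            # The process exists but is owned by another user (e.g. root).
            return True
        raise
    return True


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        print(arg, "exists" if pid_exists(int(arg)) else "gone")
```

The EPERM branch matters here because container processes usually run as root, so an unprivileged user like beeps gets "permission denied" rather than "no such process" for a PID that is still alive.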

And the pod remains in Terminating:

21:16:48-beeps~/goproj/src/k8s.io/kubernetes] (petset_e2e)$ kn get po
NAME      READY     STATUS        RESTARTS   AGE
mysql-2   0/1       Terminating   0          9m
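kubectl reports Terminating for a pod whose metadata.deletionTimestamp is set. To spot every pod that has been stuck like this across a cluster, one option is to filter the output of kubectl get pods --all-namespaces -o json. A rough Python sketch; the helper name and the 5-minute threshold are hypothetical, and it assumes the whole-second RFC 3339 timestamp form (e.g. 2016-05-11T04:17:00Z) that the API server normally emits:

```python
from __future__ import annotations

import json
import subprocess
from datetime import datetime, timedelta, timezone


def stuck_terminating(pod_list: dict, older_than: timedelta, now: datetime) -> list[str]:
    """Return "namespace/name" for pods whose deletionTimestamp passed more than older_than ago."""
    stuck = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        ts = meta.get("deletionTimestamp")
        if not ts:
            # No deletion requested, so the pod is not Terminating.
            continue
        # For a graceful delete this is roughly "request time + grace period",
        # i.e. the point by which the pod was expected to be gone.
        deadline = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
        if now - deadline > older_than:
            stuck.append(f"{meta.get('namespace', '')}/{meta['name']}")
    return stuck


if __name__ == "__main__":
    out = subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    now = datetime.now(timezone.utc)
    for name in stuck_terminating(json.loads(out), timedelta(minutes=5), now):
        print(name)
```

Taking now as a parameter keeps the filter a pure function, so it can be checked against a saved JSON dump without a live cluster.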

This is on Docker 1.9.1; maybe it’s fixed in a newer release?

beeps@e2e-test-beeps-minion-cfl3:~$ docker version
Client:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.3
 Git commit:   a34a1d5
 Built:        Fri Nov 20 17:56:04 UTC 2015
 OS/Arch:      linux/amd64
Cannot connect to the Docker daemon. Is the docker daemon running on this host?
beeps@e2e-test-beeps-minion-cfl3:~$ sudo docker version
Client:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.3
 Git commit:   a34a1d5
 Built:        Fri Nov 20 17:56:04 UTC 2015
 OS/Arch:      linux/amd64

Server:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.3
 Git commit:   a34a1d5
 Built:        Fri Nov 20 17:56:04 UTC 2015
 OS/Arch:      linux/amd64

@kubernetes/goog-node

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 8
  • Comments: 47 (28 by maintainers)

Most upvoted comments

I’m still hitting this on k8s v1.9.6 with Docker 17.03.2-ce (build f5ec1e2), based on the kops AMI.

@dchen1107 This is kind of a critical issue, since it doesn’t resolve itself and pods get stuck in Terminating and ContainerCreating. Can we please bump the priority?

/cc @kubernetes/sig-node

Can this please get reopened?

Has anyone tried removing the finalizers?

kubectl patch pod <pod> -p '{"metadata":{"finalizers":null}}'
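The patch sets metadata.finalizers to null, which strips any finalizers from the pod object. It only helps when finalizers are what is holding the deletion, and it changes nothing on the node, so a hung container is still there afterwards. Typing the JSON by hand is error-prone (curly quotes pasted from a web page break it), so here is a small Python sketch that builds the command with json.dumps and shell-quotes it; the function name is hypothetical:

```python
from __future__ import annotations

import json
import shlex


def clear_finalizers_cmd(pod: str, namespace: str = "default") -> list[str]:
    """Build the kubectl argv that sets metadata.finalizers to null on a pod."""
    # json.dumps always emits straight ASCII quotes and a literal null.
    patch = json.dumps({"metadata": {"finalizers": None}}, separators=(",", ":"))
    return ["kubectl", "patch", "pod", pod, "--namespace", namespace, "-p", patch]


if __name__ == "__main__":
    # Print a copy-pasteable, correctly quoted shell command.
    print(shlex.join(clear_finalizers_cmd("mysql-2", "e2e-tests-petset-hy8ki")))
```

Running it prints kubectl patch pod mysql-2 --namespace e2e-tests-petset-hy8ki -p '{"metadata":{"finalizers":null}}'.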

I’m still hitting this on k8s v1.8.6 with Docker 1.13.1 (build 092cba3):

Feb 23 05:29:31 ip-10-1-57-70 kubelet[30214]: , failed to "KillPodSandbox" for "2423f1b6-0293-11e8-91ef-1259591cb356" with KillPodSandboxError: "rpc error: code = Unknown desc = Error response from daemon: Cannot stop container 72af6e5d6cceb319873fca55bc5964642a0f2cfed04cca01dc1480e660328f10: Cannot kill container 72af6e5d6cceb319873fca55bc5964642a0f2cfed04cca01dc1480e660328f10: rpc error: code = 14 desc = grpc: the connection is unavailable"
Feb 23 05:29:31 ip-10-1-57-70 kubelet[30214]: ]
Feb 23 05:29:31 ip-10-1-57-70 kubelet[30214]: E0223 05:29:31.973871   30214 docker_sandbox.go:240] Failed to stop sandbox "71905e02c6e37f5e07c0d4b46ade531e50bf2577fdb5ed254b98f5f83886da6e": Error response from daemon: Cannot stop container 71905e02c6e37f5e07c0d4b46ade531e50bf2577fdb5ed254b98f5f83886da6e: Cannot kill container 71905e02c6e37f5e07c0d4b46ade531e50bf2577fdb5ed254b98f5f83886da6e: rpc error: code = 14 desc = grpc: the connection is unavailable
Feb 23 05:29:31 ip-10-1-57-70 kubelet[30214]: E0223 05:29:31.974087   30214 remote_runtime.go:115] StopPodSandbox "71905e02c6e37f5e07c0d4b46ade531e50bf2577fdb5ed254b98f5f83886da6e" from runtime service failed: rpc error: code = Unknown desc = Error response from daemon: Cannot stop container 71905e02c6e37f5e07c0d4b46ade531e50bf2577fdb5ed254b98f5f83886da6e: Cannot kill container 71905e02c6e37f5e07c0d4b46ade531e50bf2577fdb5ed254b98f5f83886da6e: rpc error: code = 14 desc = grpc: the connection is unavailable

The Docker runtime is up; docker images and docker ps work fine.

Restarting the docker service fixes it.

/remove-lifecycle rotten /lifecycle frozen

Just FYI, if the above solutions don’t work, instead of restarting the Docker service you can just kill the container process of the stuck pod, i.e.,

  1. go to the corresponding node
  2. use docker ps to find the container ID for the corresponding pod
  3. run ps aux|grep $container_id
  4. kill the process kill -9 $process_id
  5. remove the pod from k8s kubectl delete pod $pod_name --grace-period=0 --force --namespace $namespace
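Grepping ps aux for the container ID may match a helper process whose command line contains the ID (such as the containerd shim) rather than the container’s own main process. docker inspect --format '{{.State.Pid}}' reports the host PID of the main process directly, although, as in the original report, that PID can already be stale. A hedged Python sketch of steps 2 to 5 as a dry run that only prints the commands; the function names are hypothetical, and the kill normally needs root:

```python
from __future__ import annotations

import os
import shlex
import subprocess
import sys


def container_pid(container_id: str, run=subprocess.run) -> int:
    """Return the host PID Docker records for the container's main process (0 if stopped)."""
    result = run(
        ["docker", "inspect", "--format", "{{.State.Pid}}", container_id],
        check=True, capture_output=True, text=True,
    )
    return int(result.stdout.strip())


def cleanup_cmds(pid: int, pod: str, namespace: str) -> list[list[str]]:
    """Commands for steps 4 and 5: SIGKILL the process if it still exists, then force-delete the pod."""
    cmds = []
    # Docker may report a stale PID, so only kill it if /proc still has it.
    if pid > 0 and os.path.isdir(f"/proc/{pid}"):
        cmds.append(["kill", "-9", str(pid)])
    cmds.append(["kubectl", "delete", "pod", pod, "--grace-period=0", "--force",
                 "--namespace", namespace])
    return cmds


if __name__ == "__main__":
    container_id, pod, namespace = sys.argv[1:4]
    # Dry run: print each command instead of executing it.
    for cmd in cleanup_cmds(container_pid(container_id), pod, namespace):
        print(shlex.join(cmd))
```

Injecting run keeps container_pid testable without a Docker daemon. Printing rather than executing leaves the final decision to kill a process with the operator.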

Yes, this issue still reproduces consistently on Kubernetes 1.10 with Docker 17.03.2-ce. I used the sysbench tool to create 1 GB files in the container, and then it went stale. A normal pod delete didn’t work; then I tried "--grace-period 0" and the pod still stayed in "Terminating". I strongly suspect this is an I/O issue.

Pod:

# kubectl get pods -n myname
NAME                                          READY     STATUS        RESTARTS   AGE
benchmark-app-1535722045                      1/1       Terminating   0          3d

Log:

pod_workers.go:186] Error syncing pod 995fb967-ad21-11e8-8837-a81e847d8f7c ("benchmark-app-1535722045_myname(995fb967-ad21-11e8-8837-a81e847d8f7c)"), skipping: error killing pod: failed to "KillContainer" for "benchmark-app-1535722045" with KillContainerError: "rpc error: code = Unknown desc = operation timeout: context deadline exceeded"

Here are some very interesting stats for that container; every field is blank:

# docker stats 87bb32150c0e

CONTAINER           CPU %               MEM USAGE / LIMIT   MEM %               NET I/O             BLOCK I/O           PIDS
87bb32150c0e        --                  -- / --             --                  --                  --                  --

We are experiencing this bug on Kubernetes 1.17.8

Can this issue please be reopened?

  Warning  FailedCreatePodSandBox  86s (x42 over 92m)  kubelet, ip-12345.some-aws-region.compute.internal  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create a sandbox for pod "some-job-in-production-1601071200-bg4s9": operation timeout: context deadline exceeded

-p '{"metadata":{"finalizers":null}}'

You are a life saver.

Hi, I’ve experienced the same issue when attempting to delete mysql containers with k8s v1.9.3.

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T20:17:28Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.6", GitCommit:"9f8ebd171479bec0ada837d7ee641dec2f8c6dd1", GitTreeState:"clean", BuildDate:"2018-03-21T15:13:31Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

I started the cluster using kops and I’m running the Debian Stretch AMI debian-stretch-hvm-x86_64-gp2-2018-06-13-59294 (ami-810b35e4). Disclaimer: this is not the AMI supported by kops (they’re still using Jessie); I’m not sure whether this is related to the issue.

I’ve tried kubectl delete --now --force and they’re still there. Even removing the container with docker rm -f didn’t solve it.

mysql                        0/2       Terminating   4          1d
mysql2                       0/2       Terminating   7          23h

The issue seems pretty consistent, as I got it on both mysql containers I tried to run.

I didn’t find anything helpful in the kubelet logs either (I grepped for mysql and filtered for error messages, and got nothing).

I’m facing the same error on k8s 1.17.7, Docker version 19.03.4, build 9013bf583a.

Warning FailedKillPod 10s kubelet, ip-1-2-3-4.region.compute.internal error killing pod: failed to "KillContainer" for "pod-name" with KillContainerError: "rpc error: code = Unknown desc = operation timeout: context deadline exceeded"