kubernetes: kubectl drain error message confusing when used with an operator

/kind bug

What happened: Scenario:

  • You run an etcd cluster with etcd-operator
  • You drain a node that has an etcd-operator-managed pod on it.

You get a message like:

$ kubectl drain $NODE
pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet (use --force to override): example-etcd-cluster-0002

This error message is not useful, given that Operators are popular and recommended.
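
For reference, the override the message points to looks roughly like this (a sketch; it assumes $NODE is set as in the repro below, and note that in newer kubectl releases --delete-local-data was renamed --delete-emptydir-data):

$ kubectl drain $NODE --force --ignore-daemonsets --delete-local-data

With --force, drain simply deletes the pods that have no recognized controller; the operator is not consulted, so this is a blunt workaround rather than a fix.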

What you expected to happen: No error message.

How to reproduce it (as minimally and precisely as possible):

# PREREQ: have a running Kubernetes cluster.

# Install the CoreOS etcd-operator as follows:
mkdir coreos
cd coreos
git clone https://github.com/coreos/etcd-operator
cd etcd-operator/
example/rbac/create_role.sh
kubectl create -f example/deployment.yaml
kubectl get customresourcedefinitions

# Make an etcd cluster
kubectl create -f example/example-etcd-cluster.yaml
kubectl get pods -L etcd_cluster -l etcd_cluster=example-etcd-cluster
kubectl get etcdcluster example-etcd-cluster -o json | jq .status.members
# Wait until some members are ready
watch "kubectl get etcdcluster example-etcd-cluster -o json | jq .status.members"

# Find a node that has an etcd pod on it. Call it $NODE.
# in example, NODE="gke-cluster-2-default-pool-4904abf8-mt8t"
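# One way to see which node each etcd pod is scheduled on (a sketch; -o wide adds a NODE column):
kubectl get pods -l etcd_cluster=example-etcd-cluster -o wide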
$ kubectl drain $NODE
node "gke-cluster-2-default-pool-4904abf8-mt8t" already cordoned
error: pods with local storage (use --delete-local-data to override): example-etcd-cluster-0004; pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet (use --force to override): example-etcd-cluster-0004, fun2, kube-proxy-gke-cluster-2-default-pool-4904abf8-mt8t; DaemonSet-managed pods (use --ignore-daemonsets to ignore): fluentd-gcp-v2.0.9-bc4jm

The offending message is "pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet (use --force to override)".

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.0", GitCommit:"d3ada0119e776222f11ec7945e6d860061339aad", GitTreeState:"clean", BuildDate:"2017-06-30T09:51:01Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"} Server Version: version.Info{Major:"1", Minor:"7+", GitVersion:"v1.7.8-gke.0", GitCommit:"a7061d4b09b53ab4099e3b5ca3e80fb172e1b018", GitTreeState:"clean", BuildDate:"2017-10-10T18:48:45Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: GKE
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 20
  • Comments: 28 (10 by maintainers)

Most upvoted comments

/remove-lifecycle stale

The manual workaround is fine, but it would be better if drain at least provided an option to force-delete the operator-controlled pods.
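
A sketch of one such manual workaround, assuming the node and pod names from the repro above (the operator should recreate the deleted member on another node):

$ kubectl cordon $NODE
$ kubectl delete pod example-etcd-cluster-0004   # etcd-operator reconciles and replaces the member
$ kubectl drain $NODE --ignore-daemonsets --delete-local-data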

+1

For anyone who encounters this same problem, a potential workaround is to just delete the node that it's having a problem draining, then run the rolling update again, i.e.:

failed to drain node "node-name": error draining node: Unknown controller kind "bad-controller"
$ kubectl delete node "node-name"

This worked for my situation, at least.

This appears to be fixed in kubectl 1.10 and beyond.