kubernetes: kubectl drain error message confusing when used with an operator
/kind bug
What happened: Scenario:
- You run an etcd cluster with etcd-operator.
- You drain a node that has one of the operator-managed etcd pods on it.
You get a message like:
$ kubectl drain $NODE
pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet (use --force to override): example-etcd-cluster-0002
This error message is not useful, given that operators are popular and recommended: the pod is managed by a controller, just not one of the built-in kinds that drain checks for.
What you expected to happen: No error message.
How to reproduce it (as minimally and precisely as possible):
# PREREQ: have a Cluster.
# Install the CoreOS etcd-operator as follows:
mkdir coreos
cd coreos
git clone https://github.com/coreos/etcd-operator
cd etcd-operator/
example/rbac/create_role.sh
kubectl create -f example/deployment.yaml
kubectl get customresourcedefinitions
# Make an etcd cluster
kubectl create -f example/example-etcd-cluster.yaml
kubectl get pods -L etcd_cluster -l etcd_cluster=example-etcd-cluster
kubectl get etcdcluster example-etcd-cluster -o json | jq .status.members
# Wait until the etcd members are ready
watch "kubectl get etcdcluster example-etcd-cluster -o json | jq .status.members"
# Find a node that has an etcd pod on it. Call it $NODE.
# In this example, NODE="gke-cluster-2-default-pool-4904abf8-mt8t"
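# One way to see which node each member pod landed on (my own suggestion,
# not part of the original steps): -o wide adds a NODE column.
kubectl get pods -l etcd_cluster=example-etcd-cluster -o wide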
$ kubectl drain $NODE
node "gke-cluster-2-default-pool-4904abf8-mt8t" already cordoned
error: pods with local storage (use --delete-local-data to override): example-etcd-cluster-0004; pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet (use --force to override): example-etcd-cluster-0004, fun2, kube-proxy-gke-cluster-2-default-pool-4904abf8-mt8t; DaemonSet-managed pods (use --ignore-daemonsets to ignore): fluentd-gcp-v2.0.9-bc4jm
The offending message is "pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet (use --force to override)".
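For the record, the drain can be pushed through manually (a workaround sketch of my own, not part of the report; --force bypasses the unmanaged-pod check, --delete-local-data accepts losing the pod's local data, and --ignore-daemonsets skips the DaemonSet pods):
$ kubectl drain $NODE --force --delete-local-data --ignore-daemonsets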
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version): Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.0", GitCommit:"d3ada0119e776222f11ec7945e6d860061339aad", GitTreeState:"clean", BuildDate:"2017-06-30T09:51:01Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"} Server Version: version.Info{Major:"1", Minor:"7+", GitVersion:"v1.7.8-gke.0", GitCommit:"a7061d4b09b53ab4099e3b5ca3e80fb172e1b018", GitTreeState:"clean", BuildDate:"2017-10-10T18:48:45Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration: GKE
- OS (e.g. from /etc/os-release):
- Kernel (e.g. uname -a):
- Install tools:
- Others:
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Reactions: 20
- Comments: 28 (10 by maintainers)
Commits related to this issue
- Add a sleep if kubectl drain errors kubectl drain can fail if the node is running an operator managed resource, see: https://github.com/kubernetes/kubernetes/issues/57049 If this happens we sho... — committed to utilitywarehouse/kube-aws-updater by george-angel 6 years ago
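The commit above retries after a pause; a minimal sketch of that idea (my own, not the actual kube-aws-updater code) would be:
until kubectl drain "$NODE" --ignore-daemonsets --delete-local-data; do
  sleep 30
done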
/remove-lifecycle stale
the manual workaround is fine, but it would be better if drain at least provided an option to force delete the operator-controlled pods
+1
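To spell out what I mean by the manual workaround (a sketch only; the pod name is taken from the repro above, and this assumes the operator recreates deleted members on another node): delete the stuck operator-managed pod yourself, wait for the operator to bring it back elsewhere, then retry the drain.
$ kubectl delete pod example-etcd-cluster-0004
$ kubectl drain $NODE --ignore-daemonsets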
For anyone who encounters this same problem, a potential workaround is to just delete the node that it’s having a problem draining, then run the rolling-update again, i.e.:
failed to drain node "node-name": error draining node: Unknown controller kind "bad-controller"
$ kubectl delete node "node-name"
This worked for my situation, at least.
This appears to be fixed in kubectl 1.10 and beyond.