[1.9] kube-controller-manager fails to start with --cloud-provider=aws
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
/sig cloud-provider
What happened: Adding --cloud-provider=aws to /etc/kubernetes/manifests/kube-controller-manager.yaml makes kube-controller-manager fail to start; the pod ends up in CrashLoopBackOff.
kube-controller-manager.yaml (excerpt):

```yaml
kind: Pod
metadata:
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-controller-manager
    - --leader-elect=true
    - --use-service-account-credentials=true
    - --controllers=*,bootstrapsigner,tokencleaner
    - --root-ca-file=/etc/kubernetes/pki/ca.crt
    - --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
    - --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
    - --address=127.0.0.1
    - --kubeconfig=/etc/kubernetes/controller-manager.conf
    - --service-account-private-key-file=/etc/kubernetes/pki/sa.key
    - --allocate-node-cidrs=true
    - --cluster-cidr=10.244.0.0/16
    - --node-cidr-mask-size=24
    - --cloud-provider=aws   # <----- without this line, no issue
```
Pod state:

```
NAMESPACE     NAME                                                                  READY   STATUS             RESTARTS   AGE
kube-system   kube-controller-manager-ip-172-31-4-117.us-west-1.compute.internal   0/1     CrashLoopBackOff   2          34s
```
kubelet journalctl output:

```
Dec 31 07:13:58 ip-172-31-4-117.us-west-1.compute.internal kubelet[2055]: I1231 07:13:58.985855 2055 kuberuntime_manager.go:514] Container {Name:kube-controller-manager Image:gcr.io/google_containers/kube-controller-manager-amd64:v1.9.0 Command:[kube-controller-manager --leader-elect=true --use-service-account-credentials=true --controllers=*,bootstrapsigner,tokencleaner --root-ca-file=/etc/kubernetes/pki/ca.crt --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt --cluster-signing-key-file=/etc/kubernetes/pki/ca.key --address=127.0.0.1 --kubeconfig=/etc/kubernetes/controller-manager.conf --service-account-private-key-file=/etc/kubernetes/pki/sa.key --allocate-node-cidrs=true --cluster-cidr=10.244.0.0/16 --node-cidr-mask-size=24 --cloud-provider=aws] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:200 scale:-3} d:{Dec:<nil>} s:200m Format:DecimalSI}]} VolumeMounts:[{Name:ca-certs ReadOnly:true MountPath:/etc/ssl/certs SubPath: MountPropagation:<nil>} {Name:kubeconfig ReadOnly:true MountPath:/etc/kubernetes/controller-manager.conf SubPath: MountPropagation:<nil>} {Name:flexvolume-dir ReadOnly:false MountPath:/usr/libexec/kubernetes/kubelet-plugins/volume/exec SubPath: MountPropagation:<nil>} {Name:ca-certs-etc-pki ReadOnly:true MountPath:/etc/pki SubPath: MountPropagation:<nil>} {Name:k8s-certs ReadOnly:true MountPath:/etc/kubernetes/pki SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/healthz,Port:10252,Host:127.0.0.1,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:15,TimeoutSeconds:15,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:8,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Dec 31 07:13:58 ip-172-31-4-117.us-west-1.compute.internal kubelet[2055]: I1231 07:13:58.985954 2055 kuberuntime_manager.go:758] checking backoff for container "kube-controller-manager" in pod "kube-controller-manager-ip-172-31-4-117.us-west-1.compute.internal_kube-system(9b9fc44cf056d9e6af795cbe507c7e4f)"
Dec 31 07:13:58 ip-172-31-4-117.us-west-1.compute.internal kubelet[2055]: I1231 07:13:58.986076 2055 kuberuntime_manager.go:768] Back-off 10s restarting failed container=kube-controller-manager pod=kube-controller-manager-ip-172-31-4-117.us-west-1.compute.internal_kube-system(9b9fc44cf056d9e6af795cbe507c7e4f)
Dec 31 07:13:58 ip-172-31-4-117.us-west-1.compute.internal kubelet[2055]: E1231 07:13:58.986105 2055 pod_workers.go:186] Error syncing pod 9b9fc44cf056d9e6af795cbe507c7e4f ("kube-controller-manager-ip-172-31-4-117.us-west-1.compute.internal_kube-system(9b9fc44cf056d9e6af795cbe507c7e4f)"), skipping: failed to "StartContainer" for "kube-controller-manager" with CrashLoopBackOff: "Back-off 10s restarting failed container=kube-controller-manager pod=kube-controller-manager-ip-172-31-4-117.us-west-1.compute.internal_kube-system(9b9fc44cf056d9e6af795cbe507c7e4f)"
```

What you expected to happen:
According to the kube-controller-manager reference documentation (https://kubernetes.io/docs/reference/generated/kube-controller-manager), --cloud-provider=aws can be specified and the controller manager should start.
The flag was added in the first place because the controller manager kept logging the error below:
```
$ journalctl -b -f CONTAINER_ID=$(docker ps | grep k8s_kube-controller-manager | awk '{ print $1 }')
Dec 31 04:54:47 ip-172-31-4-117.us-west-1.compute.internal dockerd-current[1136]: E1231 04:54:47.447916 1 reconciler.go:292] attacherDetacher.AttachVolume failed to start for volume "pv-ebs" (UniqueName: "kubernetes.io/aws-ebs/vol-0d275986ce24f4304") from node "ip-172-31-4-61.us-west-1.compute.internal" : AttachVolume.NewAttacher failed for volume "pv-ebs" (UniqueName: "kubernetes.io/aws-ebs/vol-0d275986ce24f4304") from node "ip-172-31-4-61.us-west-1.compute.internal" : Failed to get AWS Cloud Provider. GetCloudProvider returned <nil> instead
```
How to reproduce it (as minimally and precisely as possible): Add the following flag to the command list in /etc/kubernetes/manifests/kube-controller-manager.yaml:

```yaml
- --cloud-provider=aws
```
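To see why the container actually exits (the kubelet log above only shows the restart backoff, not the failure reason), pull the controller manager's own log output. These are standard kubectl/Docker commands; the pod name below is the one from this report:

```sh
# Log of the previous (crashed) container instance, via the API server:
kubectl -n kube-system logs --previous \
  kube-controller-manager-ip-172-31-4-117.us-west-1.compute.internal

# Or read the exited container's log directly through Docker on the master:
docker logs $(docker ps -a | grep k8s_kube-controller-manager | awk '{ print $1 }' | head -n 1)
```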
Anything else we need to know?:
Not being able to specify --cloud-provider could be the cause of the issue "K8S unable to mount AWS EBS as a persistent volume for a pod".
/etc/systemd/system/kubelet.service.d/10-kubeadm.conf already has --cloud-provider=aws:
```ini
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true"
Environment="KUBELET_NETWORK_ARGS=--network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin"
Environment="KUBELET_DNS_ARGS=--cluster-dns=10.96.0.10 --cluster-domain=cluster.local"
Environment="KUBELET_AUTHZ_ARGS=--authorization-mode=Webhook --client-ca-file=/etc/kubernetes/pki/ca.crt"
Environment="KUBELET_CADVISOR_ARGS=--cadvisor-port=0"
Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=systemd --runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice"
Environment="KUBELET_CERTIFICATE_ARGS=--rotate-certificates=true --cert-dir=/var/lib/kubelet/pki"
Environment="KUBELET_EXTRA_ARGS=--cloud-provider=aws"
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_NETWORK_ARGS $KUBELET_DNS_ARGS $KUBELET_AUTHZ_ARGS $KUBELET_CADVISOR_ARGS $KUBELET_CGROUP_ARGS $KUBELET_CERTIFICATE_ARGS $KUBELET_EXTRA_ARGS
```
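A side note for anyone editing this drop-in by hand rather than through kubeadm: systemd only picks up the change after a daemon reload and a kubelet restart (plain systemd behavior, nothing specific to this issue):

```sh
systemctl daemon-reload
systemctl restart kubelet
```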
Environment:
- Kubernetes version (use kubectl version):

```
$ kubectl version -o json
{
  "clientVersion": {
    "major": "1",
    "minor": "9",
    "gitVersion": "v1.9.0",
    "gitCommit": "925c127ec6b946659ad0fd596fa959be43f0cc05",
    "gitTreeState": "clean",
    "buildDate": "2017-12-15T21:07:38Z",
    "goVersion": "go1.9.2",
    "compiler": "gc",
    "platform": "linux/amd64"
  },
  "serverVersion": {
    "major": "1",
    "minor": "9",
    "gitVersion": "v1.9.0",
    "gitCommit": "925c127ec6b946659ad0fd596fa959be43f0cc05",
    "gitTreeState": "clean",
    "buildDate": "2017-12-15T20:55:30Z",
    "goVersion": "go1.9.2",
    "compiler": "gc",
    "platform": "linux/amd64"
  }
}
```
- Cloud provider or hardware configuration: AWS
- OS (from /etc/centos-release): CentOS Linux release 7.4.1708 (Core)
- Kernel (uname -a): Linux ip-172-31-4-117.us-west-1.compute.internal 3.10.0-693.11.1.el7.x86_64 #1 SMP Mon Dec 4 23:52:40 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
- Install tools: kubeadm and Ansible
- Others: Pods work fine as long as the AWS cloud provider feature is not used.
About this issue
- State: closed
- Created 7 years ago
- Comments: 27 (13 by maintainers)
Commits related to this issue
- https://github.com/kubernetes/kubernetes/issues/57718 To be able to use AWS cloud provider. — committed to oonisim/kubernetes-installation by deleted user 6 years ago
Solution
Thanks @dims for guiding me the right way.
After looking into the tickets and reading the comments by @kingdonb, I found the documentation below. Is this on kubernetes.io, and if not, why not?
K8S AWS Cloud Provider Notes
Steps
1. Tag the EC2 instances and security groups with the cluster name.
2. Run kubeadm init --config <kubeadm config> so that kubeadm passes the cloud provider to the control-plane components (see the sketch after this list).
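A sketch of what those two steps can look like. The KubernetesCluster tag key and the v1alpha1 MasterConfiguration schema below match what the 1.9-era in-tree AWS provider and kubeadm accepted, while the instance IDs, security group IDs, and cluster name are placeholders:

```sh
# Tag every instance and security group belonging to the cluster with a
# consistent cluster name so the AWS provider can discover them.
aws ec2 create-tags \
  --resources i-0123456789abcdef0 sg-0123456789abcdef0 \
  --tags "Key=KubernetesCluster,Value=<cluster name>"
```

```yaml
# kubeadm.conf: with this, kubeadm itself renders --cloud-provider=aws into
# the apiserver and controller-manager manifests instead of us editing them.
apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
cloudProvider: aws
```

Then run kubeadm init --config kubeadm.conf.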
Result
Hi @dims, I have read your comments. Still, the point is why the --cloud-provider option does not work even though it is clearly documented in the kube-controller-manager reference.
If the option does not work, we need to clarify whether that is expected. If it is expected, a documentation fix should be raised; if it is not, we need to clarify why it fails.
If this issue still needs to be closed, I suppose all the related issues like the ones below would need to be closed too, making it clear that the --cloud-provider option is not meant to be set directly but must be given to kubeadm when kubeadm is used.
Please give a clarification of the mechanism behind why passing the --cloud-provider option to kube-controller-manager does not work, although it is clearly documented.
A possible workaround is to re-run kubeadm init after kubeadm reset. However, that is also problematic and needs manual clean-up of CNI files, network interfaces, etc., which would otherwise prevent kubeadm init from running.
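For reference, a rough sketch of that reset-and-reinit path. The exact clean-up depends on the CNI plugin in use; the interface names below assume flannel, as suggested by the 10.244.0.0/16 pod CIDR above:

```sh
kubeadm reset
rm -rf /etc/cni/net.d                 # stale CNI configs can break the next init
ip link delete cni0 2>/dev/null       # leftover bridge, if present
ip link delete flannel.1 2>/dev/null  # leftover VXLAN device, if present
kubeadm init --config kubeadm.conf
```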
Initially, setting --cloud-provider=aws in the systemd unit file for the kubelet stopped the error in the kubelet journalctl output, but it did not stop the errors in kube-controller-manager. If we could simply pass --cloud-provider=aws to kube-controller-manager, it would be a good way out for all of us who run into these --cloud-provider issues. The API server also takes the option with no issue, so why does kube-controller-manager fail, and what is specific about kube-controller-manager?
For everyone reading this issue: the following comment helped me a lot!
https://github.com/kubernetes/kubernetes/issues/53538#issuecomment-345942305
Hi @dims. I am running into a similar issue. Just a couple of questions/confusions about your and @oonisim's provided solutions:
1. Is it really necessary to tag the resources? (Just asking to make things simpler.) If so, where can I find the <cluster name>?
2. Does this config file need to be created, or is it already located somewhere? If so, where?
3. Can you please specify exactly what the content of the cloud-config for AWS should be?
4. Is there a way to change the cloud-provider of the existing cluster without doing kubeadm init again and specifying the kubeadm.conf? What I understand right now is that we need to reset the cluster and then run kubeadm init again with the kubeadm.conf to change the cloud-provider. Please correct me if I am wrong.
Thanks
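On question 3: the in-tree AWS provider reads an INI-style (gcfg) file with a [Global] section. A minimal, illustrative sketch follows; the field names come from the provider's config struct, but most of them are optional because the provider autodetects the zone and VPC from EC2 instance metadata, so treat the values as placeholders:

```ini
# /etc/kubernetes/cloud.conf (illustrative; fields optional in most setups)
[Global]
Zone = us-west-1b
KubernetesClusterTag = <cluster name>
```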
Hi @dims, why is this closed? The original issue of not being able to specify --cloud-provider=aws in /etc/kubernetes/manifests/kube-controller-manager.yaml is still there. Please clarify whether it is the expected behaviour, or otherwise please reopen this to clarify why it does not work.
The installation guide "Using kubeadm to Create a Cluster" does not mention the cloud provider at all, even though one needs to specify --cloud-provider, or a configuration file that includes the cloud provider via --config. It would not be uncommon for people to notice only later that cloud provider configuration is required and then try to reconfigure the API server, kubelet, and controller manager.
In "kubeadm join unknown flag: --cloud-provider" it says:
If adding --cloud-provider=aws in the config file is the way for the kubelet, then it should be the same for the API server and the controller manager.
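As a quick sanity check after any of the approaches above, one can grep the generated manifests to see which control-plane components actually carry the flag (assuming the default kubeadm manifest path):

```sh
grep -- --cloud-provider /etc/kubernetes/manifests/*.yaml
```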