kops: Problem Deploying Autoscaler with v1.5.1
Using Kops v1.5.0-beta2, if I deploy the Cluster Autoscaler as described here on AWS, it appears to fail. Here’s exactly what I ran:
CLOUD_PROVIDER=aws
IMAGE=gcr.io/google_containers/cluster-autoscaler:v0.4.0
MIN_NODES=3
MAX_NODES=24
AWS_REGION=us-east-1
GROUP_NAME="k8s-worker"
SSL_CERT_PATH="/etc/ssl/certs/ca-certificates.crt" # (/etc/ssl/certs for gce)
addon=cluster-autoscaler.yml
wget -O ${addon} https://raw.githubusercontent.com/kubernetes/kops/master/addons/cluster-autoscaler/v1.4.0.yaml
sed -i -e "s@{{CLOUD_PROVIDER}}@${CLOUD_PROVIDER}@g" "${addon}"
sed -i -e "s@{{IMAGE}}@${IMAGE}@g" "${addon}"
sed -i -e "s@{{MIN_NODES}}@${MIN_NODES}@g" "${addon}"
sed -i -e "s@{{MAX_NODES}}@${MAX_NODES}@g" "${addon}"
sed -i -e "s@{{GROUP_NAME}}@${GROUP_NAME}@g" "${addon}"
sed -i -e "s@{{AWS_REGION}}@${AWS_REGION}@g" "${addon}"
sed -i -e "s@{{SSL_CERT_PATH}}@${SSL_CERT_PATH}@g" "${addon}"
kubectl apply -f ${addon}
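Before applying, it can help to confirm that every `{{...}}` placeholder in the template was actually substituted by the sed calls. A minimal sketch, assuming the rendered manifest is the `${addon}` file from the script above; the function name `check_placeholders` is hypothetical and not part of kops:

```shell
# Hypothetical helper: fail if any {{PLACEHOLDER}} is left in the rendered manifest.
check_placeholders() {
  # grep -n prints each offending line with its line number;
  # exit status 0 means at least one placeholder was found.
  if grep -n '{{[A-Z_]*}}' "$1"; then
    echo "unreplaced placeholders remain in $1" >&2
    return 1
  fi
  return 0
}
```

It could be chained in front of the apply step, for example `check_placeholders "${addon}" && kubectl apply -f "${addon}"`.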
Here is the log from the pod itself:
2017-02-06T21:53:09.516651243Z I0206 21:53:09.516516 1 cluster_autoscaler.go:353] Cluster Autoscaler 0.4.0
2017-02-06T21:53:09.833039609Z E0206 21:53:09.832856 1 event.go:257] Could not construct reference to: '&api.Endpoints{TypeMeta:unversioned.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:api.ObjectMeta{Name:"cluster-autoscaler", GenerateName:"", Namespace:"kube-system", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:unversioned.Time{Time:time.Time{sec:0, nsec:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*unversioned.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]api.OwnerReference(nil), Finalizers:[]string(nil)}, Subsets:[]api.EndpointSubset(nil)}' due to: 'selfLink was empty, can't make reference'. Will not report event: 'Normal' '%v became leader' 'cluster-autoscaler-362589257-qvjps'
2017-02-06T21:53:09.833083521Z I0206 21:53:09.832940 1 leaderelection.go:215] sucessfully acquired lease kube-system/cluster-autoscaler
2017-02-06T21:55:10.236022480Z E0206 21:55:10.235891 1 aws_manager.go:81] Error while regenerating Asg cache: RequestError: send request failed
2017-02-06T21:55:10.236074713Z caused by: Post https://autoscaling.us-east-1.amazonaws.com/: dial tcp: i/o timeout
2017-02-06T21:57:10.654286671Z W0206 21:57:10.654164 1 cluster_autoscaler.go:202] Cluster is not ready for autoscaling: Error while looking for ASG for instance {Name:i-04dab8e1abd2eeadf}, error: RequestError: send request failed
2017-02-06T21:57:10.654433665Z caused by: Post https://autoscaling.us-east-1.amazonaws.com/: dial tcp: i/o timeout
2017-02-06T21:59:20.945923291Z W0206 21:59:20.945820 1 cluster_autoscaler.go:202] Cluster is not ready for autoscaling: Error while looking for ASG for instance {Name:i-07cb35848b7303a8c}, error: RequestError: send request failed
2017-02-06T21:59:20.945959787Z caused by: Post https://autoscaling.us-east-1.amazonaws.com/: dial tcp: i/o timeout
2017-02-06T22:01:31.312714356Z W0206 22:01:31.312578 1 cluster_autoscaler.go:202] Cluster is not ready for autoscaling: Error while looking for ASG for instance {Name:i-045dfa57161b3f669}, error: RequestError: send request failed
2017-02-06T22:01:31.312762096Z caused by: Post https://autoscaling.us-east-1.amazonaws.com/: dial tcp: i/o timeout
2017-02-06T22:03:41.648172881Z W0206 22:03:41.648044 1 cluster_autoscaler.go:202] Cluster is not ready for autoscaling: Error while looking for ASG for instance {Name:i-0922d3fd192fa3708}, error: RequestError: send request failed
2017-02-06T22:03:41.648207274Z caused by: Post https://autoscaling.us-east-1.amazonaws.com/: dial tcp: i/o timeout
2017-02-06T22:05:51.955455693Z W0206 22:05:51.955355 1 cluster_autoscaler.go:202] Cluster is not ready for autoscaling: Error while looking for ASG for instance {Name:i-0922d3fd192fa3708}, error: RequestError: send request failed
2017-02-06T22:05:51.955490454Z caused by: Post https://autoscaling.us-east-1.amazonaws.com/: dial tcp: i/o timeout
2017-02-06T22:08:02.268974568Z W0206 22:08:02.268861 1 cluster_autoscaler.go:202] Cluster is not ready for autoscaling: Error while looking for ASG for instance {Name:i-0922d3fd192fa3708}, error: RequestError: send request failed
2017-02-06T22:08:02.269019618Z caused by: Post https://autoscaling.us-east-1.amazonaws.com/: dial tcp: lookup autoscaling.us-east-1.amazonaws.com on 100.64.0.10:53: dial udp 100.64.0.10:53: i/o timeout
2017-02-06T22:10:12.709737594Z W0206 22:10:12.709623 1 cluster_autoscaler.go:202] Cluster is not ready for autoscaling: Error while looking for ASG for instance {Name:i-07cb35848b7303a8c}, error: RequestError: send request failed
2017-02-06T22:10:12.709795552Z caused by: Post https://autoscaling.us-east-1.amazonaws.com/: dial tcp: i/o timeout
2017-02-06T22:12:22.947293286Z W0206 22:12:22.947191 1 cluster_autoscaler.go:202] Cluster is not ready for autoscaling: Error while looking for ASG for instance {Name:i-04dab8e1abd2eeadf}, error: RequestError: send request failed
2017-02-06T22:12:22.947324194Z caused by: Post https://autoscaling.us-east-1.amazonaws.com/: dial tcp: i/o timeout
2017-02-06T22:14:33.216621196Z W0206 22:14:33.216505 1 cluster_autoscaler.go:202] Cluster is not ready for autoscaling: Error while looking for ASG for instance {Name:i-07cb35848b7303a8c}, error: RequestError: send request failed
2017-02-06T22:14:33.216661881Z caused by: Post https://autoscaling.us-east-1.amazonaws.com/: dial tcp: i/o timeout
About this issue
- State: closed
- Created 7 years ago
- Comments: 27 (10 by maintainers)
Figured this out: the cause was that no kube-dns pod was running on the master node. To get one running there, I had to add the master toleration to the kube-dns deployment (the same as for the cluster-autoscaler deployment above). Once kube-dns was running on the master, the autoscaler was able to use it to get ASG info from AWS and scale up from 0 nodes.
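A rough way to check for this situation is to see where the kube-dns pods are scheduled and whether the AWS endpoint resolves from inside the cluster. This is only a sketch: it assumes kubectl is already pointed at the cluster and that the kube-dns pods carry the `k8s-app=kube-dns` label. The function names, the `dns-test` pod name, and the busybox image are illustrative choices.

```shell
# Assumption: kubectl is configured for the cluster. KUBECTL can be overridden for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# Show which nodes the kube-dns pods are scheduled on (-o wide adds the NODE column).
# Assumption: the pods are labelled k8s-app=kube-dns.
show_dns_pods() {
  "$KUBECTL" get pods --namespace kube-system -l k8s-app=kube-dns -o wide
}

# Resolve the AWS autoscaling endpoint for region $1 from a throwaway pod.
# The pod name dns-test and the busybox image are illustrative, not required.
check_dns() {
  "$KUBECTL" run dns-test --rm -i --restart=Never --image=busybox -- \
    nslookup "autoscaling.${1}.amazonaws.com"
}
```

For example, run `show_dns_pods` and then `check_dns us-east-1`. The test pod runs wherever the scheduler places it, so it only exercises DNS from that node, and it will stay pending if no schedulable node exists.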
The problem might also be intermittent, or it might not be triggered until there’s a scaling event.
In a cluster that had not previously reported any errors, I intentionally deployed an exorbitant number of replicas to trigger a scaling event, but it failed while trying to scale up.
@pluttrell On a quick review, this looks like a problem with your DNS.
@yissachar Thanks for the suggestion. I deleted the old deployment and recreated it, but this time using the name of my Nodes ASG, as follows:
But I still see the same problem:
Mine turned out to be caused by specifying the AZ (us-east-1a) instead of the region (us-east-1). The URL in the error showed my mistake, but I overlooked it.
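As the log lines above show, the region value ends up in the endpoint host name, so a quick sanity check on `AWS_REGION` before rendering the template might catch this mistake. A minimal sketch, assuming standard region names that end in a digit (us-east-1) and AZ names that end in a zone letter (us-east-1a); the helper name `is_region` is hypothetical:

```shell
# Hypothetical helper: accept a region such as us-east-1, reject an AZ such as us-east-1a.
is_region() {
  case "$1" in
    *-[0-9]) return 0 ;;  # ends in "-<digit>": looks like a region
    *)       return 1 ;;  # anything else, e.g. a trailing zone letter
  esac
}

AWS_REGION=us-east-1
if ! is_region "$AWS_REGION"; then
  echo "AWS_REGION=$AWS_REGION does not look like a region (is it an AZ?)" >&2
fi
```

This is a pattern check only. It does not confirm that the region exists.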
@KyleU example https://github.com/kubernetes/kops/blob/master/upup/models/cloudup/resources/addons/external-dns.addons.k8s.io/pre-k8s-1.6.yaml.template#L27
I get this error in my autoscaler log:
What does this error mean? It’s trying to connect to some unknown host, 100.64.0.10:53.
Finally figured it out. Following the template at https://github.com/kubernetes/contrib/blob/master/cluster-autoscaler/cloudprovider/aws/README.md#1-asg-setup-min-1-max-10-asg-name-k8s-worker-asg-1 works, but the template at https://github.com/kubernetes/kops/tree/master/addons/cluster-autoscaler doesn’t.
Correcting for indentation, the diff is (working version is ‘<’):
I don’t know which of those is the crucial difference.