autoscaler: CA failed to load Instance Type list unless configured with hostNetworking
Which component are you using?: cluster-autoscaler
What version of the component are you using?: Helm chart 9.10.8, cluster-autoscaler v1.21.1
Component version:
What k8s version are you using (kubectl version)?:
v1.21
kubectl version Output
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:38:26Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"darwin/arm64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.2-eks-06eac09", GitCommit:"5f6d83fe4cb7febb5f4f4e39b3b2b64ebbbe3e97", GitTreeState:"clean", BuildDate:"2021-09-13T14:20:15Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
What environment is this in?: AWS EKS
What did you expect to happen?: The autoscaler loads the instance type list normally and keeps running.
What happened instead?: It keeps entering CrashLoopBackOff and exits with error code 255.
How to reproduce it (as minimally and precisely as possible):
Set the environment variable:
AWS_REGION: ap-northeast-3
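(For reference, a minimal sketch of the chart values this maps to; the keys cloudProvider, awsRegion and autoDiscovery.clusterName are assumed from the standard cluster-autoscaler chart, and the cluster name is a placeholder:)
cloudProvider: aws
awsRegion: ap-northeast-3        # assumed to be rendered into the AWS_REGION env var by the chart
autoDiscovery:
  clusterName: my-eks-cluster    # placeholder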
Anything else we need to know?:
Part of logs:
I1112 07:23:25.974866 1 main.go:391] Cluster Autoscaler 1.21.1
I1112 07:23:25.996783 1 leaderelection.go:243] attempting to acquire leader lease kube-system/cluster-autoscaler...
I1112 07:23:26.016572 1 leaderelection.go:253] successfully acquired lease kube-system/cluster-autoscaler
I1112 07:23:26.016842 1 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"Lease", Namespace:"kube-system", Name:"cluster-autoscaler", UID:"04f7e024-313b-4cd3-9e47-1bd8ab89d128", APIVersion:"coordination.k8s.io/v1", ResourceVersion:"14162", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' hub-c-a-aws-cluster-autoscaler-fdb7d96d4-b9rg9 became leader
I1112 07:23:26.019206 1 reflector.go:219] Starting reflector *v1.Pod (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:188
I1112 07:23:26.019328 1 reflector.go:255] Listing and watching *v1.Pod from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:188
I1112 07:23:26.020108 1 reflector.go:219] Starting reflector *v1.DaemonSet (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:320
I1112 07:23:26.020220 1 reflector.go:255] Listing and watching *v1.DaemonSet from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:320
I1112 07:23:26.020557 1 reflector.go:219] Starting reflector *v1.ReplicationController (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:329
I1112 07:23:26.020573 1 reflector.go:255] Listing and watching *v1.ReplicationController from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:329
I1112 07:23:26.020868 1 reflector.go:219] Starting reflector *v1.Job (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:338
I1112 07:23:26.020883 1 reflector.go:255] Listing and watching *v1.Job from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:338
I1112 07:23:26.021148 1 reflector.go:219] Starting reflector *v1.Pod (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:212
I1112 07:23:26.021242 1 reflector.go:255] Listing and watching *v1.Pod from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:212
I1112 07:23:26.021155 1 reflector.go:219] Starting reflector *v1.ReplicaSet (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:347
I1112 07:23:26.021494 1 reflector.go:255] Listing and watching *v1.ReplicaSet from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:347
I1112 07:23:26.021216 1 reflector.go:219] Starting reflector *v1.StatefulSet (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:356
I1112 07:23:26.021667 1 reflector.go:255] Listing and watching *v1.StatefulSet from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:356
I1112 07:23:26.021267 1 reflector.go:219] Starting reflector *v1.Node (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:246
I1112 07:23:26.021770 1 reflector.go:255] Listing and watching *v1.Node from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:246
I1112 07:23:26.021279 1 reflector.go:219] Starting reflector *v1beta1.PodDisruptionBudget (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:309
I1112 07:23:26.021938 1 reflector.go:255] Listing and watching *v1beta1.PodDisruptionBudget from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:309
I1112 07:23:26.021232 1 reflector.go:219] Starting reflector *v1.Node (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:246
I1112 07:23:26.022155 1 reflector.go:255] Listing and watching *v1.Node from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:246
W1112 07:23:26.040478 1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W1112 07:23:26.061120 1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
I1112 07:23:26.067058 1 cloud_provider_builder.go:29] Building aws cloud provider.
F1112 07:23:26.067164 1 aws_cloud_provider.go:365] Failed to generate AWS EC2 Instance Types: unable to load EC2 Instance Type list
goroutine 61 [running]:
k8s.io/klog/v2.stacks(0xc0000c2001, 0xc0009fe000, 0x8a, 0xee)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/klog/v2/klog.go:1021 +0xb8
k8s.io/klog/v2.(*loggingT).output(0x629d5a0, 0xc000000003, 0x0, 0x0, 0xc00004c230, 0x61ad5f1, 0x15, 0x16d, 0x0)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/klog/v2/klog.go:970 +0x1a3
k8s.io/klog/v2.(*loggingT).printf(0x629d5a0, 0xc000000003, 0x0, 0x0, 0x0, 0x0, 0x3e68953, 0x2d, 0xc001044900, 0x1, ...)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/klog/v2/klog.go:751 +0x18b
k8s.io/klog/v2.Fatalf(...)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/klog/v2/klog.go:1509
k8s.io/autoscaler/cluster-autoscaler/cloudprovider/aws.BuildAWS(0x3fe0000000000000, 0x3fe0000000000000, 0x8bb2c97000, 0x1176592e000, 0xa, 0x0, 0x4e200, 0x0, 0x186a0000000000, 0x0, ...)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/aws/aws_cloud_provider.go:365 +0x290
k8s.io/autoscaler/cluster-autoscaler/cloudprovider/builder.buildCloudProvider(0x3fe0000000000000, 0x3fe0000000000000, 0x8bb2c97000, 0x1176592e000, 0xa, 0x0, 0x4e200, 0x0, 0x186a0000000000, 0x0, ...)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/builder/builder_all.go:69 +0x18f
k8s.io/autoscaler/cluster-autoscaler/cloudprovider/builder.NewCloudProvider(0x3fe0000000000000, 0x3fe0000000000000, 0x8bb2c97000, 0x1176592e000, 0xa, 0x0, 0x4e200, 0x0, 0x186a0000000000, 0x0, ...)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/builder/cloud_provider_builder.go:45 +0x1e6
k8s.io/autoscaler/cluster-autoscaler/core.initializeDefaultOptions(0xc0010076e0, 0x4530301, 0x8)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/autoscaler.go:101 +0x2fd
k8s.io/autoscaler/cluster-autoscaler/core.NewAutoscaler(0x3fe0000000000000, 0x3fe0000000000000, 0x8bb2c97000, 0x1176592e000, 0xa, 0x0, 0x4e200, 0x0, 0x186a0000000000, 0x0, ...)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/autoscaler.go:65 +0x43
main.buildAutoscaler(0x972073, 0xc000634f50, 0x457dc20, 0xc00039d500)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:337 +0x368
main.run(0xc00007efa0)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:343 +0x39
main.main.func2(0x453c8a0, 0xc0000c9b00)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:447 +0x2a
created by k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:207 +0x113
goroutine 1 [select]:
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000e77c00, 0x44cea80, 0xc000311620, 0xc0000c9b01, 0xc000056c00)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:167 +0x13f
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0008bfc00, 0x77359400, 0x0, 0xc0000c9b01, 0xc000056c00)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.Until(...)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90
k8s.io/client-go/tools/leaderelection.(*LeaderElector).renew(0xc0001bf320, 0x453c8a0, 0xc0000c9b40)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:263 +0x107
k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run(0xc0001bf320, 0x453c8a0, 0xc0000c9b00)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:208 +0x13b
k8s.io/client-go/tools/leaderelection.RunOrDie(0x453c8e0, 0xc0000ae008, 0x4571bc0, 0xc00092eb40, 0x37e11d600, 0x2540be400, 0x77359400, 0xc00069d8e0, 0x3f40d28, 0x0, ...)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:222 +0x96
main.main()
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:438 +0x829
goroutine 18 [chan receive]:
k8s.io/klog/v2.(*loggingT).flushDaemon(0x629d5a0)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/klog/v2/klog.go:1164 +0x8b
created by k8s.io/klog/v2.init.0
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/klog/v2/klog.go:418 +0xdd
goroutine 48 [runnable]:
sync.runtime_SemacquireMutex(0xc0000a1a44, 0xc000966c00, 0x1)
/usr/local/go/src/runtime/sema.go:71 +0x47
sync.(*Mutex).lockSlow(0xc0000a1a40)
/usr/local/go/src/sync/mutex.go:138 +0xfc
sync.(*Mutex).Lock(...)
/usr/local/go/src/sync/mutex.go:81
sync.(*Map).Load(0xc0000a1a40, 0x339f5a0, 0xc000966d38, 0xc000c442f8, 0x5a9fc18f48a93701, 0x5a0000000040c8f4)
/usr/local/go/src/sync/map.go:106 +0x2c4
github.com/modern-go/reflect2.(*frozenConfig).Type2(0xc00009d180, 0x45acfa0, 0xc000e3a540, 0x3711f40, 0xc000966f00)
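As the title says, the crash stopped once the pod was configured with host networking. A minimal sketch of that pod-spec change, using standard Kubernetes fields (the exact chart settings used to wire this in are omitted):
spec:
  template:
    spec:
      hostNetwork: true                    # pod shares the node's network namespace
      dnsPolicy: ClusterFirstWithHostNet   # keeps cluster DNS working together with hostNetwork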
What’s the solution? I’m facing the same issue with EKS 1.24: the cluster is public, yet the CA times out while trying to access the public STS endpoint.
In my case, the cluster-autoscaler pod fails to reach the public AWS STS service endpoint via its public IP:
My EKS cluster is private, with a VPC interface endpoint for STS configured, like this:
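(Roughly, a generic CloudFormation-style sketch of such an STS interface endpoint; the region and all IDs below are placeholders, not the actual configuration:)
StsEndpoint:
  Type: AWS::EC2::VPCEndpoint
  Properties:
    VpcEndpointType: Interface
    ServiceName: com.amazonaws.REGION.sts   # substitute the cluster's region
    PrivateDnsEnabled: true                 # so the default sts hostname resolves to the endpoint
    VpcId: vpc-0123456789abcdef0            # placeholder
    SubnetIds:
      - subnet-0123456789abcdef0            # placeholder
    SecurityGroupIds:
      - sg-0123456789abcdef0                # placeholder; must allow 443 from the nodes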
We have the same issue in the Ireland (eu-west-1) region.
Which component are you using?: cluster-autoscaler
What version of the component are you using?: cluster-autoscaler v1.21.1
Component version:
What k8s version are you using (kubectl version)?: v1.21
kubectl version Output
Client Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.2-13+d2965f0db10712", GitCommit:"d2965f0db1071203c6f5bc662c2827c71fc8b20d", GitTreeState:"clean", BuildDate:"2021-06-26T01:02:11Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.2-eks-0389ca3", GitCommit:"8a4e27b9d88142bbdd21b997b532eb6d493df6d2", GitTreeState:"clean", BuildDate:"2021-07-31T01:34:46Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
What happened instead?: It keeps entering CrashLoopBackOff:
kube-system pod/cluster-autoscaler-79475c6789-tnljd 0/1 CrashLoopBackOff 9
Logs:
W1129 1 aws_util.go:84] Error fetching https://pricing.us-east-1.amazonaws.com/offers/v1.0/aws/AmazonEC2/current/eu-west-1/index.json skipping... Get "https://pricing.us-east-1.amazonaws.com/offers/v1.0/aws/AmazonEC2/current/eu-west-1/index.json": dial tcp: i/o timeout
F1129 aws_cloud_provider.go:365] Failed to generate AWS EC2 Instance Types: unable to load EC2 Instance Type list
goroutine 32
Troubleshooting: If I add --aws-use-static-instance-list=true to CA, it runs for a while,
kube-system pod/cluster-autoscaler-cc975695c-rwlzv 1/1 Running 2 5m3s
but crashes again later with this log:
E1129 17:59:44.241301 1 aws_manager.go:265] Failed to regenerate ASG cache: cannot autodiscover ASGs: RequestError: send request failed caused by: Post "https://autoscaling.eu-west-1.amazonaws.com/": dial tcp: i/o timeout
F1129 17:59:44.241348 1 aws_cloud_provider.go:389] Failed to create AWS Manager: cannot autodiscover ASGs: RequestError: send request failed caused by: Post "https://autoscaling.eu-west-1.amazonaws.com/": dial tcp: i/o timeout
goroutine 71 [running]:
Yeah, I think that is a reasonable change, although I’m not sure it solves the specific issue: in my case, falling back to that static list still resulted in fatal crashing, because it attempted to access resources outside the cluster elsewhere.
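For anyone trying the same workaround, here is a sketch of how the flag from the logs above can be passed, assuming the chart’s extraArgs map (rendered into --key=value container flags). Note that ASG autodiscovery still calls the regional autoscaling endpoint, so the pod still needs a route to it (NAT, proxy, or an autoscaling interface endpoint).
extraArgs:
  aws-use-static-instance-list: true   # skips fetching the pricing index over the internet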
What I might propose is an obvious check (since external access seems to be a requirement of the cluster autoscaler here; I’m not sure whether it is AWS-specific or not) that the pod the cluster autoscaler runs in can reach resources outside the cluster (e.g. the internet), and if it can’t, fail with an explicit message that is less cryptic than the ones noted above.
E.g.
… Hope that makes sense 😃
To add some more context: when I was attempting to debug the issue I had, seeing messages about ‘timeout’ left me unsure whether the context deadline was being hit because of latency, because the endpoint data was so big that it timed out anyway, or because the timeout was permission-related and the client kept retrying until the context deadline was exceeded. (It’s not a normal assumption that your thing in the cloud can’t reach the cloud 😃 )
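One way to get an explicit, early signal today (and to tell “no route at all” apart from “slow or oversized response”) is a user-side probe added as an init container. A rough sketch; the image, URL and timeout are arbitrary illustrative choices, not something the chart provides:
initContainers:
  - name: egress-preflight             # fails the pod early, with a readable message, if egress is broken
    image: curlimages/curl:8.5.0       # arbitrary image/tag choice
    command:
      - sh
      - -c
      - |
        # Probe one of the endpoints the autoscaler needs; the 10s timeout is arbitrary.
        curl -sS --max-time 10 -o /dev/null https://pricing.us-east-1.amazonaws.com/ \
          || { echo "cluster-autoscaler preflight: cannot reach AWS endpoints from this pod" >&2; exit 1; }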