kops: nodeup error downloading assets from storage.googleapis.com when using official Ubuntu 20.04 AMI
1. What kops version are you running?
Version 1.18.2 (git-84495481e4)
2. What Kubernetes version are you running?
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.13", GitCommit:"30d651da517185653e34e7ab99a792be6a3d9495", GitTreeState:"clean", BuildDate:"2020-10-15T01:06:31Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.13", GitCommit:"30d651da517185653e34e7ab99a792be6a3d9495", GitTreeState:"clean", BuildDate:"2020-10-15T00:59:17Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
3. What cloud provider are you using? AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
kops rolling-update cluster --name my-cluster-name --yes
5. What happened after the commands executed? The new node had not joined the cluster after more than 20 minutes.
kops logs
...
I1110 16:21:43.388467 23599 instancegroups.go:143] waiting for 15s after detaching instance
I1110 16:21:58.391501 23599 instancegroups.go:383] Validating the cluster.
I1110 16:22:01.241446 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": InstanceGroup "nodes-a1" did not have enough nodes 0 vs 1.
I1110 16:22:34.057496 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": machine "i-0de5271375ff81ce1" has not yet joined cluster.
I1110 16:23:07.378220 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": machine "i-0de5271375ff81ce1" has not yet joined cluster.
I1110 16:23:40.039431 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": machine "i-0de5271375ff81ce1" has not yet joined cluster.
I1110 16:24:12.913123 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": machine "i-0de5271375ff81ce1" has not yet joined cluster.
I1110 16:24:45.776340 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": machine "i-0de5271375ff81ce1" has not yet joined cluster.
I1110 16:25:18.334645 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": machine "i-0de5271375ff81ce1" has not yet joined cluster.
I1110 16:25:50.946073 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": machine "i-0de5271375ff81ce1" has not yet joined cluster.
I1110 16:26:24.604930 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": machine "i-0de5271375ff81ce1" has not yet joined cluster.
I1110 16:26:57.899771 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": machine "i-0de5271375ff81ce1" has not yet joined cluster.
I1110 16:27:30.567355 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": machine "i-0de5271375ff81ce1" has not yet joined cluster.
I1110 16:28:03.045796 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": machine "i-0de5271375ff81ce1" has not yet joined cluster.
I1110 16:28:35.805567 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": machine "i-0de5271375ff81ce1" has not yet joined cluster.
I1110 16:29:08.260794 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": machine "i-0de5271375ff81ce1" has not yet joined cluster.
I1110 16:29:41.070586 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": machine "i-0de5271375ff81ce1" has not yet joined cluster.
I1110 16:30:13.450487 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": machine "i-0de5271375ff81ce1" has not yet joined cluster.
I1110 16:30:46.739787 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": machine "i-0de5271375ff81ce1" has not yet joined cluster.
I1110 16:31:20.317658 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": machine "i-0de5271375ff81ce1" has not yet joined cluster.
I1110 16:31:52.790707 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": machine "i-0de5271375ff81ce1" has not yet joined cluster.
I1110 16:32:25.907071 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": machine "i-0de5271375ff81ce1" has not yet joined cluster.
I1110 16:32:58.928122 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": machine "i-0de5271375ff81ce1" has not yet joined cluster.
I1110 16:33:32.108136 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": machine "i-0de5271375ff81ce1" has not yet joined cluster.
I1110 16:34:04.983422 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": machine "i-0de5271375ff81ce1" has not yet joined cluster.
I1110 16:34:37.503496 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": machine "i-0de5271375ff81ce1" has not yet joined cluster.
I1110 16:35:10.819474 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": machine "i-0de5271375ff81ce1" has not yet joined cluster.
I1110 16:35:43.688290 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": machine "i-0de5271375ff81ce1" has not yet joined cluster.
I1110 16:36:16.256246 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": machine "i-0de5271375ff81ce1" has not yet joined cluster.
I1110 16:36:48.916137 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": machine "i-0de5271375ff81ce1" has not yet joined cluster.
I1110 16:37:21.481365 23599 instancegroups.go:437] Cluster did not pass validation within deadline: machine "i-0de5271375ff81ce1" has not yet joined cluster.
E1110 16:37:21.481387 23599 instancegroups.go:388] Cluster did not validate within 15m0s
I1110 16:37:21.481393 23599 instancegroups.go:383] Validating the cluster.
I1110 16:37:23.467696 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": machine "i-0de5271375ff81ce1" has not yet joined cluster.
I1110 16:37:55.861082 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": machine "i-0de5271375ff81ce1" has not yet joined cluster.
I1110 16:38:28.627294 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": machine "i-0de5271375ff81ce1" has not yet joined cluster.
I1110 16:39:01.600674 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": machine "i-0de5271375ff81ce1" has not yet joined cluster.
I1110 16:39:34.421850 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": machine "i-0de5271375ff81ce1" has not yet joined cluster.
I1110 16:40:06.954583 23599 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": machine "i-0de5271375ff81ce1" has not yet joined cluster.
...
kops-configuration logs
-- Logs begin at Tue 2020-11-10 05:30:22 UTC, end at Tue 2020-11-10 08:38:17 UTC. --
Nov 10 08:22:48 ip-aa-bb-cc-dd systemd[1]: Starting Run kops bootstrap (nodeup)...
-- Subject: A start job for unit kops-configuration.service has begun execution
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A start job for unit kops-configuration.service has begun execution.
--
-- The job identifier is 532.
Nov 10 08:22:48 ip-aa-bb-cc-dd nodeup[1094]: nodeup version 1.18.2 (git-84495481e4)
Nov 10 08:22:48 ip-aa-bb-cc-dd nodeup[1094]: I1110 08:22:48.890020 1094 http.go:78] Downloading "https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet"
Nov 10 08:24:48 ip-aa-bb-cc-dd nodeup[1094]: W1110 08:24:48.890432 1094 assetstore.go:203] error downloading url "https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet": error downloading HTTP content from "https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet": net/http: request canceled (Client.Timeout exceeded while reading body)
Nov 10 08:24:48 ip-aa-bb-cc-dd nodeup[1094]: W1110 08:24:48.891951 1094 main.go:138] got error running nodeup (will retry in 30s): error adding asset "e71c3ce50f93abc2735ba601781355d86a49aec992e8cb235a369254c304fa7d@https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet": error downloading HTTP content from "https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet": net/http: request canceled (Client.Timeout exceeded while reading body)
Nov 10 08:25:19 ip-aa-bb-cc-dd nodeup[1094]: I1110 08:25:19.983102 1094 files.go:103] Hash did not match for "/var/cache/nodeup/sha256:e71c3ce50f93abc2735ba601781355d86a49aec992e8cb235a369254c304fa7d_https___storage_googleapis_com_kubernetes-release_release_v1_17_13_bin_linux_amd64_kubelet": actual=sha256:a82068288cb43cf86d958233912a320bf093b61a04eeb20549daec776fc5a947 vs expected=sha256:e71c3ce50f93abc2735ba601781355d86a49aec992e8cb235a369254c304fa7d
Nov 10 08:25:19 ip-aa-bb-cc-dd nodeup[1094]: I1110 08:25:19.988730 1094 http.go:78] Downloading "https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet"
Nov 10 08:27:20 ip-aa-bb-cc-dd nodeup[1094]: W1110 08:27:20.270909 1094 assetstore.go:203] error downloading url "https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet": error downloading HTTP content from "https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet": net/http: request canceled (Client.Timeout exceeded while reading body)
Nov 10 08:27:20 ip-aa-bb-cc-dd nodeup[1094]: W1110 08:27:20.270973 1094 main.go:138] got error running nodeup (will retry in 30s): error adding asset "e71c3ce50f93abc2735ba601781355d86a49aec992e8cb235a369254c304fa7d@https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet": error downloading HTTP content from "https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet": net/http: request canceled (Client.Timeout exceeded while reading body)
Nov 10 08:27:50 ip-aa-bb-cc-dd nodeup[1094]: I1110 08:27:50.448024 1094 files.go:103] Hash did not match for "/var/cache/nodeup/sha256:e71c3ce50f93abc2735ba601781355d86a49aec992e8cb235a369254c304fa7d_https___storage_googleapis_com_kubernetes-release_release_v1_17_13_bin_linux_amd64_kubelet": actual=sha256:8b4c7ca1689d5484dce846e08f296a03200c05f92e8a2de438d61da570a38d9d vs expected=sha256:e71c3ce50f93abc2735ba601781355d86a49aec992e8cb235a369254c304fa7d
Nov 10 08:27:50 ip-aa-bb-cc-dd nodeup[1094]: I1110 08:27:50.453848 1094 http.go:78] Downloading "https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet"
Nov 10 08:29:50 ip-aa-bb-cc-dd nodeup[1094]: W1110 08:29:50.454465 1094 assetstore.go:203] error downloading url "https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet": error downloading HTTP content from "https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet": net/http: request canceled (Client.Timeout exceeded while reading body)
Nov 10 08:29:50 ip-aa-bb-cc-dd nodeup[1094]: W1110 08:29:50.455273 1094 main.go:138] got error running nodeup (will retry in 30s): error adding asset "e71c3ce50f93abc2735ba601781355d86a49aec992e8cb235a369254c304fa7d@https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet": error downloading HTTP content from "https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet": net/http: request canceled (Client.Timeout exceeded while reading body)
Nov 10 08:30:20 ip-aa-bb-cc-dd nodeup[1094]: I1110 08:30:20.633689 1094 files.go:103] Hash did not match for "/var/cache/nodeup/sha256:e71c3ce50f93abc2735ba601781355d86a49aec992e8cb235a369254c304fa7d_https___storage_googleapis_com_kubernetes-release_release_v1_17_13_bin_linux_amd64_kubelet": actual=sha256:d0ba48d28a01ee3821db53313bd6318dcb2667f329f6d4c7cfc51a8b1b4dbd96 vs expected=sha256:e71c3ce50f93abc2735ba601781355d86a49aec992e8cb235a369254c304fa7d
Nov 10 08:30:20 ip-aa-bb-cc-dd nodeup[1094]: I1110 08:30:20.640534 1094 http.go:78] Downloading "https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet"
Nov 10 08:32:20 ip-aa-bb-cc-dd nodeup[1094]: W1110 08:32:20.656351 1094 assetstore.go:203] error downloading url "https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet": error downloading HTTP content from "https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet": net/http: request canceled (Client.Timeout exceeded while reading body)
Nov 10 08:32:20 ip-aa-bb-cc-dd nodeup[1094]: W1110 08:32:20.657007 1094 main.go:138] got error running nodeup (will retry in 30s): error adding asset "e71c3ce50f93abc2735ba601781355d86a49aec992e8cb235a369254c304fa7d@https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet": error downloading HTTP content from "https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet": net/http: request canceled (Client.Timeout exceeded while reading body)
Nov 10 08:32:50 ip-aa-bb-cc-dd nodeup[1094]: I1110 08:32:50.834808 1094 files.go:103] Hash did not match for "/var/cache/nodeup/sha256:e71c3ce50f93abc2735ba601781355d86a49aec992e8cb235a369254c304fa7d_https___storage_googleapis_com_kubernetes-release_release_v1_17_13_bin_linux_amd64_kubelet": actual=sha256:d71116f32aefc3047cb88144d67ae11807abb992e679cfce1c5266812f67d82f vs expected=sha256:e71c3ce50f93abc2735ba601781355d86a49aec992e8cb235a369254c304fa7d
Nov 10 08:32:50 ip-aa-bb-cc-dd nodeup[1094]: I1110 08:32:50.840658 1094 http.go:78] Downloading "https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet"
Nov 10 08:34:50 ip-aa-bb-cc-dd nodeup[1094]: W1110 08:34:50.852161 1094 assetstore.go:203] error downloading url "https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet": error downloading HTTP content from "https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet": net/http: request canceled (Client.Timeout exceeded while reading body)
Nov 10 08:34:50 ip-aa-bb-cc-dd nodeup[1094]: W1110 08:34:50.852237 1094 main.go:138] got error running nodeup (will retry in 30s): error adding asset "e71c3ce50f93abc2735ba601781355d86a49aec992e8cb235a369254c304fa7d@https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet": error downloading HTTP content from "https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet": net/http: request canceled (Client.Timeout exceeded while reading body)
Nov 10 08:35:21 ip-aa-bb-cc-dd nodeup[1094]: I1110 08:35:21.025586 1094 files.go:103] Hash did not match for "/var/cache/nodeup/sha256:e71c3ce50f93abc2735ba601781355d86a49aec992e8cb235a369254c304fa7d_https___storage_googleapis_com_kubernetes-release_release_v1_17_13_bin_linux_amd64_kubelet": actual=sha256:e0835088000e8d677a33eb825d2343960f602c83de1eeb5639ef99786f1b931e vs expected=sha256:e71c3ce50f93abc2735ba601781355d86a49aec992e8cb235a369254c304fa7d
Nov 10 08:35:21 ip-aa-bb-cc-dd nodeup[1094]: I1110 08:35:21.031075 1094 http.go:78] Downloading "https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet"
Nov 10 08:37:21 ip-aa-bb-cc-dd nodeup[1094]: W1110 08:37:21.046954 1094 assetstore.go:203] error downloading url "https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet": error downloading HTTP content from "https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet": net/http: request canceled (Client.Timeout exceeded while reading body)
Nov 10 08:37:21 ip-aa-bb-cc-dd nodeup[1094]: W1110 08:37:21.047019 1094 main.go:138] got error running nodeup (will retry in 30s): error adding asset "e71c3ce50f93abc2735ba601781355d86a49aec992e8cb235a369254c304fa7d@https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet": error downloading HTTP content from "https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet": net/http: request canceled (Client.Timeout exceeded while reading body)
Nov 10 08:37:51 ip-aa-bb-cc-dd nodeup[1094]: I1110 08:37:51.226316 1094 files.go:103] Hash did not match for "/var/cache/nodeup/sha256:e71c3ce50f93abc2735ba601781355d86a49aec992e8cb235a369254c304fa7d_https___storage_googleapis_com_kubernetes-release_release_v1_17_13_bin_linux_amd64_kubelet": actual=sha256:02b9b011a59df81232d038791e227c3961c37787d878b7b46f4a06f2bb088efe vs expected=sha256:e71c3ce50f93abc2735ba601781355d86a49aec992e8cb235a369254c304fa7d
Nov 10 08:37:51 ip-aa-bb-cc-dd nodeup[1094]: I1110 08:37:51.232073 1094 http.go:78] Downloading "https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet"
...
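A possible reading of the repeated "Hash did not match" lines: each attempt is cut off by the timeout, the partial file stays in the cache, and its SHA-256 naturally differs from the expected one. The sketch below only illustrates that effect with stand-in files; the file names and sizes are made up, and it does not reproduce nodeup's actual caching logic.

```shell
#!/bin/sh
# Illustration only: a truncated copy of a file never matches the full file's SHA-256.
tmp=$(mktemp -d)

# Stand-in for the complete asset (1 MiB of zeros; the real kubelet is about 107M).
head -c 1048576 /dev/zero > "$tmp/full"
# Stand-in for a download that was cancelled partway through.
head -c 400000 "$tmp/full" > "$tmp/partial"

expected=$(sha256sum "$tmp/full" | cut -d' ' -f1)
actual=$(sha256sum "$tmp/partial" | cut -d' ' -f1)

if [ "$actual" != "$expected" ]; then
  echo "Hash did not match: actual=sha256:$actual vs expected=sha256:$expected"
fi

rm -rf "$tmp"
```

If the amount of data received before each timeout varies, the "actual" hash would also change from attempt to attempt, which matches what the log shows.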
6. What did you expect to happen? The new node should join the cluster within a few minutes.
7. Please provide your cluster manifest
kops cluster & ig manifests
---
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: my-cluster-name.k8s.local
spec:
  additionalNetworkCIDRs:
  - my-network-cidr
  api:
    loadBalancer:
      type: Internal
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://my-k8s-state/my-cluster-name.k8s.local
  etcdClusters:
  - etcdMembers:
    - encryptedVolume: true
      instanceGroup: master-a1
      name: master-a1
    name: main
  - etcdMembers:
    - encryptedVolume: true
      instanceGroup: master-a1
      name: master-a1
    name: events
  externalPolicies:
    master:
    - arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM
    node:
    - arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeAPIServer:
    disableBasicAuth: true
    eventTTL: 3h0m0s
  kubeControllerManager:
    horizontalPodAutoscalerUseRestClients: true
  kubeDNS:
    coreDNSImage: k8s.gcr.io/coredns:1.7.0
    externalCoreFile: |-
      # forward AWS records to Amazon DNS server within VPC
      amazonaws.com:53 {
          errors
          prometheus :9153
          forward . 169.254.169.253
          cache 30
      }
      # forward external records to public DNS servers
      # public DNS providers: Google > Cloudflare > Quad9 (IBM)
      com net org io {
          errors
          prometheus :9153
          forward . 8.8.8.8 1.1.1.1 9.9.9.9 {
              policy sequential
          }
          cache 30
      }
      .:53 {
          log . "* - {type} {class} {name} {proto} {size} {rcode} {rsize} {duration} > {remote}"
          errors
          health
          autopath @kubernetes
          kubernetes cluster.local. in-addr.arpa ip6.arpa {
              pods verified
              endpoint_pod_names
              fallthrough in-addr.arpa ip6.arpa
          }
          prometheus :9153
          forward . /etc/resolv.conf
          loop
          cache 30
          loadbalance
          reload
          ready
      }
    nodeLocalDNS:
      enabled: true
    provider: CoreDNS
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
    enforceNodeAllocatable: pods
    kubeReserved:
      cpu: 100m
      ephemeral-storage: 1Gi
      memory: 200Mi
    systemReserved:
      cpu: 100m
      ephemeral-storage: 1Gi
      memory: 200Mi
  kubernetesApiAccess:
  - 10.0.0.0/8
  kubernetesVersion: 1.17.13
  networkCIDR: my-network-cidr
  networkID: my-vpc-id
  networking:
    calico:
      crossSubnet: true
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 10.0.0.0/8
  subnets:
  - id: subnet-000001
    name: subnet-a1
    type: Private
    zone: us-west-2a
  - id: subnet-000002
    name: subnet-b1
    type: Private
    zone: us-west-2b
  - id: subnet-000003
    name: subnet-c1
    type: Private
    zone: us-west-2c
  topology:
    dns:
      type: Public
    masters: private
    nodes: private
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: my-cluster-name.k8s.local
  name: master-a1
spec:
  associatePublicIp: false
  cloudLabels:
    k8s.io/cluster-autoscaler/disabled: ""
    k8s.io/cluster-autoscaler/my-cluster-name.k8s.local: owned
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20201014
  machineType: t3a.large
  maxPrice: "0.04"
  maxSize: 1
  minSize: 1
  mixedInstancesPolicy:
    instances:
    - t3a.large
    - t3.large
    onDemandAboveBase: 0
    onDemandBase: 0
    spotAllocationStrategy: capacity-optimized
  nodeLabels:
    kops.k8s.io/instancegroup: master-a1
  role: Master
  rootVolumeOptimization: true
  subnets:
  - subnet-a1
  suspendProcesses:
  - AZRebalance
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: my-cluster-name.k8s.local
  name: nodes-a1
spec:
  associatePublicIp: false
  cloudLabels:
    k8s.io/cluster-autoscaler/enabled: ""
    k8s.io/cluster-autoscaler/my-cluster-name.k8s.local: owned
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20201014
  machineType: t3a.large
  maxPrice: "0.04"
  maxSize: 3
  minSize: 1
  mixedInstancesPolicy:
    instances:
    - t3a.large
    - t3.large
    onDemandAboveBase: 0
    onDemandBase: 0
    spotAllocationStrategy: capacity-optimized
  nodeLabels:
    kops.k8s.io/instancegroup: nodes-a1
  role: Node
  rootVolumeOptimization: true
  subnets:
  - subnet-a1
  suspendProcesses:
  - AZRebalance
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: my-cluster-name.k8s.local
  name: nodes-b1
spec:
  associatePublicIp: false
  cloudLabels:
    k8s.io/cluster-autoscaler/enabled: ""
    k8s.io/cluster-autoscaler/my-cluster-name.k8s.local: owned
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20201014
  machineType: t3a.large
  maxPrice: "0.04"
  maxSize: 3
  minSize: 1
  mixedInstancesPolicy:
    instances:
    - t3a.large
    - t3.large
    onDemandAboveBase: 0
    onDemandBase: 0
    spotAllocationStrategy: capacity-optimized
  nodeLabels:
    kops.k8s.io/instancegroup: nodes-b1
  role: Node
  rootVolumeOptimization: true
  subnets:
  - subnet-b1
  suspendProcesses:
  - AZRebalance
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: my-cluster-name.k8s.local
  name: nodes-c1
spec:
  associatePublicIp: false
  cloudLabels:
    k8s.io/cluster-autoscaler/enabled: ""
    k8s.io/cluster-autoscaler/my-cluster-name.k8s.local: owned
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20201014
  machineType: t3a.large
  maxPrice: "0.04"
  maxSize: 3
  minSize: 1
  mixedInstancesPolicy:
    instances:
    - t3a.large
    - t3.large
    onDemandAboveBase: 0
    onDemandBase: 0
    spotAllocationStrategy: capacity-optimized
  nodeLabels:
    kops.k8s.io/instancegroup: nodes-c1
  role: Node
  rootVolumeOptimization: true
  subnets:
  - subnet-c1
  suspendProcesses:
  - AZRebalance
8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.
I think it might be a network performance issue with Ubuntu 20.04 on AWS, or a nodeup issue, rather than something related to the kops command-line tool.
9. Anything else do we need to know?
I don't have this issue when using the official Debian Stretch AMI kope.io/k8s-1.17-debian-stretch-amd64-hvm-ebs-2020-07-20.
UPDATE
I just SSHed into that node and ran wget on the file that nodeup is trying to download; it took almost 5 minutes.
$ wget https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet
--2020-11-10 09:19:23-- https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet
Resolving storage.googleapis.com (storage.googleapis.com)... 172.217.27.144, 172.217.160.112, 216.58.200.48, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|172.217.27.144|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 111712216 (107M) [application/octet-stream]
Saving to: ‘kubelet’
kubelet 100%[=============================================>] 106.54M 386KB/s in 4m 50s
Other nodes (not rolling updated yet, using Debian Stretch AMI) within the same VPC don’t have this issue:
$ wget https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet
--2020-11-10 09:29:15-- https://storage.googleapis.com/kubernetes-release/release/v1.17.13/bin/linux/amd64/kubelet
Resolving storage.googleapis.com (storage.googleapis.com)... 172.217.24.16, 216.58.200.48, 172.217.160.80, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|172.217.24.16|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 111712216 (107M) [application/octet-stream]
Saving to: ‘kubelet’
kubelet 100%[=============================================>] 106.54M 29.5MB/s in 4.6s
2020-11-10 09:29:21 (23.3 MB/s) - ‘kubelet’ saved [111712216/111712216]
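As a rough back-of-the-envelope check of why nodeup never gets past the kubelet download on the slow node, the sketch below compares the time needed at the observed rate with the apparent timeout. The 120-second figure is inferred from the gap between the "Downloading" and "Client.Timeout exceeded" log lines, and wget's "KB/s" is assumed to mean KiB/s; neither is confirmed from the nodeup source.

```shell
#!/bin/sh
# Numbers taken from the wget output and the nodeup log timestamps in this report.
size_bytes=111712216     # "Length" reported by wget
rate_kib_per_s=386       # throughput seen on the Ubuntu 20.04 node (assumed KiB/s)
timeout_s=120            # inferred from log timestamps, e.g. 08:22:48 -> 08:24:48

# Whole seconds needed to fetch the file at that rate.
needed_s=$(( size_bytes / (rate_kib_per_s * 1024) ))
# Bytes that can arrive before the timeout cancels the request.
fetched_bytes=$(( rate_kib_per_s * 1024 * timeout_s ))

echo "seconds needed at ${rate_kib_per_s} KiB/s: ${needed_s}"
echo "bytes fetched before timeout: ${fetched_bytes} of ${size_bytes}"
if [ "$needed_s" -gt "$timeout_s" ]; then
  echo "download cannot finish within ${timeout_s}s, so every retry fails the same way"
fi
```

At the Debian node's rate the same file arrives in a few seconds, so the timeout is never reached there.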
About this issue
- State: closed
- Created 4 years ago
- Comments: 22 (15 by maintainers)
Commits related to this issue
- Set the tcp_rmem sysctl in bootstrap script This ensures that we're using our settings for downloading nodeup itself and any assets that nodeup downloads. This is a workaround for reported problems ... (committed to justinsb/kops, hakman/kops and DOboznyi/kops by justinsb 3 years ago)
Although of course I’ve realized that these TCP settings are applied by nodeup, so don’t apply to the download of nodeup itself. I don’t think we have a dependency configured so that we download only after applying the sysctls, so I would guess it won’t apply to the download of kubelet etc.
A potential workaround could therefore be to explicitly set net.ipv4.tcp_rmem in the bootstrap script, and then let nodeup set it again anyway.
Would still appreciate any guidance on whether the values we’re setting make sense, but this feels like a fairly easy and safe fix!
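A minimal sketch of what that workaround might look like in a bootstrap script, assuming the script runs as root before anything is downloaded. The helper name and the buffer sizes are placeholders for illustration and are not taken from this thread or from the kops source; the command is printed rather than executed so it can be reviewed first.

```shell
#!/bin/sh
# tcp_rmem takes three values in bytes: "min default max".
# Placeholder values only; substitute whatever nodeup is configured to apply.
TCP_RMEM="${TCP_RMEM:-4096 87380 16777216}"

# Hypothetical helper: build the sysctl command for a given tcp_rmem triple.
# "|| true" keeps the bootstrap going if the sysctl cannot be set.
set_tcp_rmem_cmd() {
  printf "sysctl -w net.ipv4.tcp_rmem='%s' || true\n" "$1"
}

cmd=$(set_tcp_rmem_cmd "$TCP_RMEM")
echo "$cmd"

# In a real bootstrap script this would run before the nodeup download:
#   eval "$cmd"
```

Running `sysctl net.ipv4.tcp_rmem` on an affected node before and after nodeup starts would show whether the setting was in effect during the downloads.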