kops: kops ami might have a problem with k8s 1.8 (kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-02-08)
https://kubernetes.slack.com/archives/C3QUFP0QM/p1521739991000074
Summary: Updating the AMI to kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-02-08 on the Kubernetes 1.8 masters and nodes seems to cause them to fail the AWS EC2 instance reachability check and never become healthy. AWS restarts them repeatedly.
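For context, a minimal sketch of how an AMI bump like this is usually rolled out with kops (the instance group name comes from the manifest below; only `kops rolling update cluster --yes` is confirmed by the report itself):

```sh
# Point the instance group(s) at the new image, e.g. for one master group
# (repeat for the other master and node groups).
kops edit ig master-us-east-1a --name <cluster-name>
#   set spec.image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-02-08

# Apply the config change, then roll the instances onto the new AMI.
kops update cluster --name <cluster-name> --yes
kops rolling-update cluster --name <cluster-name> --yes
```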
- What kops version are you running? The command `kops version` will display this information.
  Version 1.8.1 (git-94ef202)
- What Kubernetes version are you running? `kubectl version` will print the version if a cluster is running or provide the Kubernetes version specified as a `kops` flag.
--- kubernetes/kops ‹master› » kubectl version
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.10", GitCommit:"044cd262c40234014f01b40ed7b9d09adbafe9b1", GitTreeState:"clean", BuildDate:"2018-03-19T17:51:28Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.10", GitCommit:"044cd262c40234014f01b40ed7b9d09adbafe9b1", GitTreeState:"clean", BuildDate:"2018-03-19T17:44:09Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
- What cloud provider are you using?
  aws
- What commands did you run? What is the simplest way to reproduce this issue?
  kops rolling update cluster --yes
- What happened after the commands executed?
  The instance reachability check fails and the instances are restarted many times. This happened when changing only the AMI (kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-01-14 -> kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-02-08), and also when changing both the AMI and the Kubernetes version from 1.8.8 to 1.8.10.
- What did you expect to happen?
  The master to come back up.
- Please provide your cluster manifest. Execute `kops get --name my.example.com -o yaml` to display your cluster manifest. You may want to remove your cluster name and other sensitive information.
apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: 2017-02-09T23:47:31Z
  name: <redacted>
spec:
  additionalPolicies:
    node: |
      [
        {
          "Effect": "Allow",
          "Action": ["ec2:AttachVolume"],
          "Resource": ["*"]
        },
        {
          "Effect": "Allow",
          "Action": ["ec2:DetachVolume"],
          "Resource": ["*"]
        }
      ]
  api:
    dns: {}
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: <redacted>
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-us-east-1a
      name: a
    - instanceGroup: master-us-east-1c
      name: c
    - instanceGroup: master-us-east-1d
      name: d
    name: main
  - etcdMembers:
    - instanceGroup: master-us-east-1a
      name: a
    - instanceGroup: master-us-east-1c
      name: c
    - instanceGroup: master-us-east-1d
      name: d
    name: events
  iam:
    legacy: false
  kubernetesApiAccess:
  - <redacted>
  kubernetesVersion: 1.8.10
  masterInternalName: <redacted>
  masterPublicName: <redacted>
  networkCIDR: 10.101.0.0/16
  networking:
    kubenet: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - <redacted>
  subnets:
  - cidr: 10.101.32.0/19
    name: us-east-1a
    type: Public
    zone: us-east-1a
  - cidr: 10.101.64.0/19
    name: us-east-1c
    type: Public
    zone: us-east-1c
  - cidr: 10.101.96.0/19
    name: us-east-1d
    type: Public
    zone: us-east-1d
  - cidr: 10.101.128.0/19
    name: us-east-1e
    type: Public
    zone: us-east-1e
  topology:
    dns:
      type: Public
    masters: public
    nodes: public
- Please run the commands with most verbose logging by adding the `-v 10` flag. Paste the logs into this report, or in a gist and provide the gist link here.
- Anything else do we need to know?
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Reactions: 1
- Comments: 18 (11 by maintainers)
I ran into this problem as well. I have the problem when using m3.large, but not when using m3.medium.
I see the following crash when I look at the instance system log in AWS: https://gist.github.com/wendorf/91f5a2c77c3cdc277e48c2c22fc0b46b
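For anyone hitting this, the failing status check and the crash in the system log can be pulled with the AWS CLI; a small sketch (the instance ID is a placeholder):

```sh
# Show the EC2 status checks for the instance (the instance reachability
# check reported as failing lives here).
aws ec2 describe-instance-status --instance-ids i-0123456789abcdef0

# Dump the instance system log, i.e. the same console output where the
# kernel crash linked in the gist above shows up.
aws ec2 get-console-output --instance-id i-0123456789abcdef0 --output text
```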
We were facing the same issue with r3.large instance types. The issue is fixed by upgrading the kernel in the image.
Image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-02-08
Error in the EC2 system log.
The steps below fix the issue, and the instance status check passes after that.
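The steps themselves were not captured in this excerpt. As a rough, assumption-labelled sketch: on a Debian Jessie based image, a newer kernel would typically be pulled from jessie-backports and the instance rebooted, along these lines (the mirror URL and package name are assumptions, not the commenter's exact steps):

```sh
# Hypothetical sketch only: install a newer kernel from jessie-backports and reboot.
# (jessie-backports has since moved to archive.debian.org; adjust the mirror as needed.)
echo "deb http://archive.debian.org/debian jessie-backports main" | \
  sudo tee /etc/apt/sources.list.d/jessie-backports.list

# The archived repo's Release file is expired, so validity checking must be relaxed.
sudo apt-get -o Acquire::Check-Valid-Until=false update
sudo apt-get install -y -t jessie-backports linux-image-amd64
sudo reboot
```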
@dmcnaught @chrislovecnm @justinsb This issue seems critical and should be fixed soon, as it affects the latest version of the publicly recommended AMI, which is picked up by default in every Kubernetes installation done with kops. Would love to fix this.