kops: using the latest kops version (1.27), I am unable to validate a production-grade cluster. Is this a bug? Why is it behaving like this? With an older version I am able to validate and deploy the same production-grade cluster.

/kind bug

1. What kops version are you running? The command kops version will display this information.
   “1.27” (latest version)

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running, or provide the Kubernetes version specified as a kops flag.
   Something like “1.27.30” (as reported by kubectl version)

3. What cloud provider are you using?
   “AWS Cloud”

4. What commands did you run? What is the simplest way to reproduce this issue?
   kops validate cluster, with a 10m wait.

After creating the cluster, I run kops validate cluster with a 10-minute wait.

After 10 minutes, it shows the following:

    Validation Failed
    W0801 23:42:48.805760 1677 validate_cluster.go:232] (will retry): cluster not yet healthy
    INSTANCE GROUPS
    NAME                      ROLE          MACHINETYPE  MIN  MAX  SUBNETS
    control-plane-us-east-1a  ControlPlane  t3.medium    1    1    us-east-1a
    nodes-us-east-1a          Node          t3.medium    1    1    us-east-1a
    nodes-us-east-1b          Node          t3.medium    1    1    us-east-1b
    nodes-us-east-1c          Node          t3.medium    1    1    us-east-1c

    NODE STATUS
    NAME  ROLE  READY

    VALIDATION ERRORS
    KIND  NAME       MESSAGE
    dns   apiserver  Validation Failed

    The dns-controller Kubernetes deployment has not updated the Kubernetes cluster’s API DNS entry to the correct IP address. The API DNS IP address is the placeholder address that kops creates: 203.0.113.123. Please wait about 5-10 minutes for a control plane node to start, dns-controller to launch, and DNS to propagate. The protokube container and dns-controller deployment logs may contain more diagnostic information. Etcd and the API DNS entries must be updated for a kops Kubernetes cluster to start.

    Validation Failed
    W0801 23:42:58.808700 1677 validate_cluster.go:232] (will retry): cluster not yet healthy
    Error: validation failed: wait time exceeded during validation
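The validation message says the API DNS record still holds the kops placeholder address 203.0.113.123. To check that by hand, here is a minimal Python sketch, assuming the API endpoint is the usual api.<cluster-name> record and that it resolves from the machine you run it on. The names resolve_ipv4, is_placeholder and KOPS_PLACEHOLDER_IP are hypothetical helpers for illustration, not part of kops.

```python
import socket

# Hypothetical constant: the placeholder IP that kops puts in the API DNS
# record until dns-controller replaces it (value taken from the message above).
KOPS_PLACEHOLDER_IP = "203.0.113.123"


def resolve_ipv4(hostname):
    """Return the sorted IPv4 addresses for hostname, or [] if it does not resolve."""
    try:
        _, _, addresses = socket.gethostbyname_ex(hostname)
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        return []
    return sorted(set(addresses))


def is_placeholder(addresses):
    """True if the record resolves, but only to the kops placeholder address."""
    return bool(addresses) and all(a == KOPS_PLACEHOLDER_IP for a in addresses)
```

For example, is_placeholder(resolve_ipv4("api.<cluster-name>")) returning True would suggest dns-controller has not updated the record yet. It does not tell you why; the protokube and dns-controller logs are still where the cause shows up.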

You may want to remove your cluster name and other sensitive information.

8. Please run the commands with the most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else we need to know?

About this issue

  • State: closed
  • Created a year ago
  • Comments: 15 (4 by maintainers)

Most upvoted comments

Figure out why etcd-manager isn’t starting up etcd.

OK, thanks.

I did some aimless log surfing and turned up this in /var/log/syslog:

Aug  2 18:50:14 i-006e5d87da353b3ff containerd[2780]: time="2023-08-02T18:50:14.775473195Z" level=error msg="PullImage \"registry.k8s.io/etcd:3.5.4-0@sha256:6f72b851544986cb0921b53ea655ec04c36131248f16d4ad110cb3ca0c369dc1\" failed" error="failed to pull and unpack image \"registry.k8s.io/etcd@sha256:6f72b851544986cb0921b53ea655ec04c36131248f16d4ad110cb3ca0c369dc1\": failed to copy: write /var/lib/containerd/io.containerd.content.v1.content/ingest/1458b08dd70e854901bd0359e2ec1cd64ed09072381617d237558b9f67584438/data: no space left on device"
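To search for this kind of failure without reading the whole syslog, a minimal Python sketch could look like the following. It assumes the containerd log format shown in the line above (a msg="PullImage \"...\" failed" field followed by the error text); find_disk_full_pulls is a hypothetical helper, and the regex is only tuned to this one line shape.

```python
import re

# Matches containerd's 'msg="PullImage \"<image>\" failed"' field and
# captures the image reference between the escaped quotes.
PULL_FAILED = re.compile(r'msg="PullImage \\"(?P<image>[^"\\]+)\\" failed"')
DISK_FULL = "no space left on device"


def find_disk_full_pulls(log_lines):
    """Return image references whose pull failed because the disk was full."""
    images = []
    for line in log_lines:
        match = PULL_FAILED.search(line)
        # Only report pulls whose error text mentions a full disk.
        if match and DISK_FULL in line:
            images.append(match.group("image"))
    return images
```

Passing the lines of /var/log/syslog from the control-plane node would list the affected image references, such as the etcd image above.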

Sure enough, the control plane was out of disk space. I recreated the cluster with --master-volume-size set to 12 instead of 8 (GB), and now kops validate cluster succeeds.
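To check for the same condition on a running control-plane node, a small Python sketch using the standard library's shutil.disk_usage can report free space. The default path /var/lib/containerd comes from the log line above; free_gib and has_headroom are hypothetical helpers, and the min_free_gib threshold is a value you would pick yourself, since nothing here says how much space kops 1.27 actually needs.

```python
import shutil

GIB = 1024 ** 3


def free_gib(path="/var/lib/containerd"):
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / GIB


def has_headroom(path, min_free_gib):
    """True if at least `min_free_gib` GiB are free on the filesystem holding `path`."""
    return free_gib(path) >= min_free_gib
```

This only reports the symptom; the fix described here was still to give the control plane a larger volume.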

@Neelamnaidu, maybe you need to increase the disk space available on the control plane? It seems like kops 1.27.0 requires more than previous versions did.

Yes, thank you. I will try again after increasing the disk space.