kubernetes: does Kubernetes 1.9.2 not support CRD resource types in --etcd-servers-overrides?

[environment] Kubernetes 1.9.2, with kube-apiserver started with --etcd-servers-overrides=isolate.harmonycloud.cn/hleases#http://10.10.101.176:2379

crd.yaml:

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: hleases.isolate.harmonycloud.cn
spec:
  group: isolate.harmonycloud.cn
  version: v1alpha1
  names:
    kind: HLease
    plural: hleases
  scope: Namespaced
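For context, the kube-apiserver flag value follows the format group/resource#servers (multiple overrides comma-separated, multiple server URLs semicolon-separated). A minimal sketch of how this would be set in a kube-apiserver static pod manifest follows; the image tag, pod layout, and the default --etcd-servers value are placeholders, not part of the original report:

# Sketch of a kube-apiserver static pod; only the flags relevant to this
# report are shown, everything else is a placeholder.
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - name: kube-apiserver
    image: k8s.gcr.io/kube-apiserver:v1.9.2   # placeholder image
    command:
    - kube-apiserver
    # default etcd backend for all resources
    - --etcd-servers=http://127.0.0.1:2379
    # per-resource override, format group/resource#servers;
    # this issue reports that it is ignored for CRD-defined resources
    - --etcd-servers-overrides=isolate.harmonycloud.cn/hleases#http://10.10.101.176:2379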

[expected results] Requests related to hleases objects are redirected to the etcd at 10.10.101.176:2379

[actual results] Requests related to hleases objects are not redirected to 10.10.101.176:2379

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 38 (26 by maintainers)

Most upvoted comments

+1, same scenario: we want to use CRDs with a custom etcd. As the admins of the cluster, at setup time we want to install some of our CRDs and have their data separated out into a different etcd. This would mostly be done at cluster setup time; once set, it would be static for the cluster and would not change.

Thanks for the detailed report @matte21. I’ve cross posted it to https://github.com/coreos/etcd-operator/issues/2163. Let’s discuss it further there.

I used to deploy a 3-member etcd v3.3.11 cluster on a Kubernetes 1.16 IKS cluster. The etcd cluster was used to store CRs defined via API aggregation, and was managed via CoreOS's etcd operator v0.9.4.

I had successfully deployed the etcd cluster and accessed the API objects stored in it countless times on the very same IKS cluster, but one day, after deploying it, the weird behavior began. More precisely, the first member would always start up successfully, but the second member encountered an error and got stuck in a failed state without being replaced. Its logs terminated with:

2020-02-10 22:23:40.466759 W | etcdserver: could not get cluster response from https://the-etcd-cluster-qdwdv2ctn7.the-etcd-cluster.example-com.svc:2380: Get https://the-etcd-cluster-qdwdv2ctn7.the-etcd-cluster.example-com.svc:2380/members: EOF
2020-02-10 22:23:40.467349 C | etcdmain: cannot fetch cluster info from peer urls: could not retrieve cluster information from the given urls

where the-etcd-cluster-qdwdv2ctn7 is the first member of the etcd cluster. The third member's pod was never created. This behavior happened around 80% of the time. In the remaining 20% the cluster came up and operated correctly; the first and second members were up and running immediately, while the third member failed to start, got replaced (too quickly for me to intercept its logs), and then started correctly. Both the 2nd and 3rd members' logs contained messages like:

2020-02-11 14:11:03.413970 I | embed: rejected connection from "172.30.171.153:47446" (error "tls: "172.30.171.153" does not match any of DNSNames ["*.the-etcd-cluster.example-com.svc" "*.the-etcd-cluster.example-com.svc.cluster.local"]", ServerName "the-etcd-cluster-nqkqzxqgzt.the-etcd-cluster.example-com.svc", IPAddresses [], DNSNames ["*.the-etcd-cluster.example-com.svc" "*.the-etcd-cluster.example-com.svc.cluster.local"])
I | embed: rejected connection from "172.30.171.153:47458" (error "tls: "172.30.171.153" does not match any of DNSNames ["*.the-etcd-cluster.example-com.svc" "*.the-etcd-cluster.example-com.svc.cluster.local"]", ServerName "the-etcd-cluster-nqkqzxqgzt.the-etcd-cluster.example-com.svc", IPAddresses [], DNSNames ["*.the-etcd-cluster.example-com.svc" "*.the-etcd-cluster.example-com.svc.cluster.local"])

I observed the issue only when TLS was enabled; I did many trials without TLS and the cluster started and operated perfectly fine. I also tried many deployments with TLS on but with different etcd versions: I observed the issue with v3.2.0 and v3.4.3, but never with v3.1.15, v3.1.16, or v3.1.20.

After some experiments that did not solve the issue, I deleted my IKS cluster and got a new one with Kubernetes 1.15. I have never seen the issue again since then.

For completeness:

  • manifest of the etcd cluster: https://github.com/MikeSpreitzer/kube-examples/blob/add-kos/staging/kos/deploy/etcd-cluster/48-etcd.yaml
  • manifest of the etcd operator: https://github.com/MikeSpreitzer/kube-examples/blob/add-kos/staging/kos/deploy/etcd-operator/47-eo-theop.yaml
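As a heavily condensed sketch, the object those manifests revolve around is an etcd-operator EtcdCluster resource along the lines below; the names, namespace, and secret references here are placeholders, so refer to the linked manifests for the actual values:

apiVersion: etcd.database.coreos.com/v1beta2
kind: EtcdCluster
metadata:
  name: the-etcd-cluster        # placeholder name, matching the log excerpts above
  namespace: example-com        # placeholder namespace
spec:
  size: 3                       # 3-member cluster, as described above
  version: "3.3.11"
  TLS:
    static:
      member:
        peerSecret: etcd-peer-tls       # placeholder secret holding peer certs
        serverSecret: etcd-server-tls   # placeholder secret holding serving certs
      operatorSecret: etcd-client-tls   # placeholder secret holding client certs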

Etcd 3.2 and higher seem to be unreliable regarding startup — there is flakiness in the TLS session establishment.

Do we have an issue open for this already?

See also https://github.com/kubernetes/kubernetes/pull/82580#issuecomment-535499216

Static CLI options (--etcd-servers-overrides) that pair with dynamically API-specified options (dynamic custom resource API types) are pretty strange, so it's not clear this is the right way to accomplish parceling out custom resources to different etcd instances.
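For contrast, the path that does put custom API data in its own etcd today is API aggregation, as in the IKS setup described above: the aggregated API server is pointed at its own etcd via its --etcd-servers flag and is wired into the cluster with an APIService object. A rough sketch, where the service name and namespace are placeholders and the group is reused from the original report:

apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1alpha1.isolate.harmonycloud.cn
spec:
  group: isolate.harmonycloud.cn
  version: v1alpha1
  groupPriorityMinimum: 1000
  versionPriority: 15
  service:
    name: isolate-apiserver     # placeholder Service in front of the aggregated API server
    namespace: isolate-system   # placeholder namespace
  # sketch only: in a real deployment set caBundle to the CA that signed the
  # aggregated server's serving certificate instead of skipping verification
  insecureSkipTLSVerify: true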