kubeadm: kubeadm should not re-use bind-address for api-server while using experimental-control-plane

Is this a BUG REPORT or FEATURE REQUEST?

BUG


I am trying to set up HA. After the apiVersion of the YAML passed to kubeadm moved to v1beta1 (from v1alpha2), I had to adjust and use the --experimental-control-plane flag, and it's certainly experimental enough: the first node comes up fine, but when joining the 2nd node with the new flag, kubeadm generates a kube-apiserver manifest with the same bind-address parameter as the first API server, which results in seemingly unrelated errors in the log file:

F0111 13:57:18.372186       1 controller.go:147] Unable to perform initial IP allocation check: unable to refresh the service IP block: Get https://10.12.100.131:6443/api/v1/services: x509: certificate signed by unknown authority

Manually editing /etc/kubernetes/manifests/kube-apiserver.yaml with an in-place sed on the relevant bind-address line fixes it and allows the control plane to come up successfully.
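
For reference, here is roughly what the relevant fragment of /etc/kubernetes/manifests/kube-apiserver.yaml looks like after the manual fix on the 2nd node (trimmed; the .132 address is a placeholder for that node's own IP):

apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    - --advertise-address=10.12.100.132   # this node's own IP (placeholder)
    - --bind-address=10.12.100.132        # was 10.12.100.131, copied over from node 1
    - --secure-port=6443
    # ... remaining flags left exactly as kubeadm generated them ...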

So this seems like a bug in the way kubeadm sets up the kube-apiserver manifest on the non-primary nodes, in that it re-uses the bind-address from the first node.

P.S. I'm specifying a bind-address in my YAML because I don't want the API server to bind on 0.0.0.0 (all interfaces): since I'm running an nginx on a VIP (set up by keepalived) on each node, I want the API server to bind only to a specific IP.

Versions

kubeadm version (use kubeadm version): v1.13.2

kubeadm version: &version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.2", GitCommit:"cff46ab41ff0bb44d8584413b598ad8360ec1def", GitTreeState:"clean", BuildDate:"2019-01-10T23:33:30Z", GoVersion:"go1.11.4", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Kubernetes version: v1.13.2
  • Cloud provider or hardware configuration: CentOS 7 KVM guest on a CentOS 7 bare-metal hypervisor
  • OS (e.g. from /etc/os-release): CentOS 7
  • Kernel (e.g. uname -a): 3.10.0-957.1.3.el7.x86_64 (latest centos 7.6 updates applied)
  • Others: software load balancer self-hosted on master nodes

What happened?

The API server on the 2nd and 3rd nodes goes into a crash loop (CrashLoopBackOff) after repeated certificate errors.

What you expected to happen?

The 2nd (and 3rd) node joins the cluster with a functioning API server.

How to reproduce it (as minimally and precisely as possible)?

Use the following config on node 1 and run kubeadm init as per the official docs; the join side for the other control-plane nodes is sketched after the config.

apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: stable
apiServer:
  certSANs:
  - api-k8s-lab # hostname of load-balancer entry
  extraArgs:
    bind-address: 10.12.100.131 # ip of first node only
controllerManager:
  extraArgs:
    address: 0.0.0.0
controlPlaneEndpoint: api-k8s-lab:6443
networking:
  serviceSubnet: 10.12.12.0/23
scheduler:
  extraArgs:
    address: 0.0.0.0
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
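
On the 2nd and 3rd nodes I then ran kubeadm join ... --experimental-control-plane. Expressed as a config file, the join side looks roughly like this (a sketch; the token, CA hash and addresses are placeholders, not my real values):

apiVersion: kubeadm.k8s.io/v1beta1
kind: JoinConfiguration
discovery:
  bootstrapToken:
    apiServerEndpoint: api-k8s-lab:6443   # the load-balanced endpoint
    token: abcdef.0123456789abcdef        # placeholder
    caCertHashes:
    - sha256:0000000000000000000000000000000000000000000000000000000000000000   # placeholder
controlPlane:
  localAPIEndpoint:
    advertiseAddress: 10.12.100.132       # this node's own IP (placeholder)
    bindPort: 6443

Note that apiServer.extraArgs (and therefore bind-address) only exists in ClusterConfiguration, which is shared by the whole cluster, so there is no per-node place here to override it.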

Anything else we need to know?

I'm using keepalived on each master node to keep a VIP on one of them, and also have nginx on each master node listening on the VIP. Both of these run in Docker containers.

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 2
  • Comments: 24 (12 by maintainers)

Most upvoted comments

I overlooked internal traffic and the other mechanisms used to serve it. Regarding external traffic: if we have to work around my 0.0.0.0:6443 already-in-use condition by changing port numbers, it seems preferable to change the bind port of the API servers themselves. That way at least the well-known port 6443 is maintained, at least externally.

So would something like this:

localAPIEndpoint:
  bindPort: 16443

accomplish this? That is, as opposed to advertising the API service on a different port rather than the well-known 6443.
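
For reference, roughly how I would expect that to be laid out in v1beta1 (a sketch based on my reading of the API, using my lab's names; nginx on the VIP would keep exposing the well-known 6443 externally):

apiVersion: kubeadm.k8s.io/v1beta1
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 10.12.100.131   # this node's own IP
  bindPort: 16443                   # the API server itself listens here instead of 6443
---
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
controlPlaneEndpoint: api-k8s-lab:6443   # the external, load-balanced endpoint stays on 6443

with the equivalent controlPlane.localAPIEndpoint.bindPort on the join side for the other masters.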

For discussion's sake, is customizing the bind-address something that is perhaps undesirable? The default of 0.0.0.0 means the API server also listens on 127.0.0.1, but it would technically also listen on every other interface's address (including container-owned transient interfaces) if the API server were to restart after other containers had started up. That seems like it could lead to problems down the line, though the impact is debatable, I suppose. Or does it somehow help that the API server listens on the loopback interface? Internally the kernel knows from the routing table that to reach its own address it uses loopback anyway, but is anything expecting or benefiting from the API server also listening on loopback?


In any case, while we're discussing the API server, can any of you offer general best-practice ideas for a typical bare-metal deployment of 3 API servers in an HA config? Specifically for non-cloud setups where no hardware load balancer is available on-prem. To recap: I've turned to using nginx to load-balance between the 3 API servers, with nginx listening only on a VIP that floats between the 3 masters (managed by keepalived), and asking the API server to bind not to 0.0.0.0 but only to the host's main interface IP.

This work-around worked fine with k8s <= 1.11, back when kubeadm used a MasterConfiguration (v1alpha2) object, since when bootstrapping the cluster I created a separate kubeadm.yaml on each master host, letting me customize the bind-address per API server. But now, as of (I think) v1.12, the YAML apiVersion went to v1beta1 (with support for the earlier versions seemingly withdrawn for bootstrapping newer versions of k8s), so there is now only one ClusterConfiguration for the whole cluster. And so I've discovered that when specifying a bind-address with a specific IP, the same value is set on every API server, causing the 2nd and 3rd to not start (they were listening on an IP that wasn't available).
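
For illustration, roughly what one of those per-master files looked like back then (from memory, so treat the exact fields as approximate; each master got its own copy with its own bind-address):

apiVersion: kubeadm.k8s.io/v1alpha2
kind: MasterConfiguration
kubernetesVersion: v1.11.6   # placeholder
api:
  controlPlaneEndpoint: api-k8s-lab:6443
apiServerExtraArgs:
  bind-address: 10.12.100.131   # node 1's own IP; .132 and .133 on the other masters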

This is the reason this issue was opened in the first place: kubeadm doesn't currently handle this edge case properly. I thought this was a bug in the way kubeadm was behaving, but you explained this might be how it's designed.

So I'm now re-evaluating whether my initial approach (nginx/keepalived) really follows best practices. Personally, I don't like the idea of having to alter port numbers to work around the already-in-use issue, so I'm wondering what other ideas people might have?

You've now got me considering eliminating nginx entirely (but keeping keepalived to move the VIP as needed). The impact, however, is that all requests would be directed at only one API server. From what I've read, API servers can run active/active, so this side effect of eliminating nginx (i.e. having no load balancing in front of them) means a single API server would take all the load, which doesn't seem ideal either.

What are other people with on-prem clusters doing?