kubeadm: kubeadm upgrade fails when hostname != node name and when kubeadm config is used

This is a follow-up to https://github.com/kubernetes/kubeadm/issues/1757

What keywords did you search in kubeadm issues before filing this one?

upgrade kubeadm hostname

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version (use kubeadm version): kubeadm version: &version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.0", GitCommit:"2bd9643cee5b3b3a5ecbd3af49d09018f0773c77", GitTreeState:"clean", BuildDate:"2019-09-18T14:34:01Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:13:54Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:50Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: bare metal
  • OS (e.g. from /etc/os-release): ubuntu:bionic
  • Kernel (e.g. uname -a): Linux hq-srv11 4.15.0-64-generic #73-Ubuntu SMP Thu Sep 12 13:16:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Others:

What happened?

Running kubeadm upgrade apply v1.16.0 --config=/path/to/config.yaml --dry-run ended up in an infinite loop of

[dryrun] Resource name: "hq-srv11"
[dryrun] The GET request didn't yield any result, the API Server returned a NotFound error.
[dryrun] Would perform action GET on resource "nodes" in API group "core/v1"
[dryrun] Resource name: "hq-srv11"
[dryrun] The GET request didn't yield any result, the API Server returned a NotFound error.
[dryrun] Would perform action GET on resource "nodes" in API group "core/v1"
[dryrun] Resource name: "hq-srv11"
[dryrun] The GET request didn't yield any result, the API Server returned a NotFound error.
[dryrun] Would perform action GET on resource "nodes" in API group "core/v1"
[dryrun] Resource name: "hq-srv11"
[dryrun] The GET request didn't yield any result, the API Server returned a NotFound error.
[dryrun] Would perform action GET on resource "nodes" in API group "core/v1"
[dryrun] Resource name: "hq-srv11"
[dryrun] The GET request didn't yield any result, the API Server returned a NotFound error.

and the same with more verbose output:

I0924 20:42:22.238048   16521 patchnode.go:30] [patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "hq-srv11" as an annotation
[dryrun] Would perform action GET on resource "nodes" in API group "core/v1"
[dryrun] Resource name: "hq-srv11"
I0924 20:42:22.738465   16521 round_trippers.go:423] curl -k -v -XGET  -H "Accept: application/json" -H "User-Agent: kubeadm/v1.16.0 (linux/amd64) kubernetes/2bd9643" 'https://10.50.8.1:6443/api/v1/nodes/hq-srv11'
I0924 20:42:22.742984   16521 round_trippers.go:443] GET https://10.50.8.1:6443/api/v1/nodes/hq-srv11 404 Not Found in 4 milliseconds
I0924 20:42:22.743053   16521 round_trippers.go:449] Response Headers:
I0924 20:42:22.743110   16521 round_trippers.go:452]     Content-Type: application/json
I0924 20:42:22.743126   16521 round_trippers.go:452]     Content-Length: 186
I0924 20:42:22.743142   16521 round_trippers.go:452]     Date: Tue, 24 Sep 2019 20:42:22 GMT
I0924 20:42:22.743183   16521 request.go:968] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"nodes \"hq-srv11\" not found","reason":"NotFound","details":{"name":"hq-srv11","kind":"nodes"},"code":404}
[dryrun] The GET request didn't yield any result, the API Server returned a NotFound error.
[dryrun] Would perform action GET on resource "nodes" in API group "core/v1"
[dryrun] Resource name: "hq-srv11"

I traced back to see where that value comes from and found the source of the problem:

func SetJoinDynamicDefaults(cfg *kubeadmapi.JoinConfiguration) error {
	addControlPlaneTaint := false
	if cfg.ControlPlane != nil {
		addControlPlaneTaint = true
	}
	if err := SetNodeRegistrationDynamicDefaults(&cfg.NodeRegistration, addControlPlaneTaint); err != nil {
		return err
	}

	return SetJoinControlPlaneDefaults(cfg.ControlPlane)
}

// SetNodeRegistrationDynamicDefaults checks and sets configuration values for the NodeRegistration object
func SetNodeRegistrationDynamicDefaults(cfg *kubeadmapi.NodeRegistrationOptions, ControlPlaneTaint bool) error {
	var err error
	cfg.Name, err = kubeadmutil.GetHostname(cfg.Name)
	if err != nil {
		return err
	}
	// ... (remainder of the function omitted)
}

// GetHostname returns OS's hostname if 'hostnameOverride' is empty; otherwise, return 'hostnameOverride'
// NOTE: This function copied from pkg/util/node package to avoid external kubeadm dependency
func GetHostname(hostnameOverride string) (string, error) {
	hostName := hostnameOverride
	if len(hostName) == 0 {
		nodeName, err := os.Hostname()
		if err != nil {
			return "", errors.Wrap(err, "couldn't determine hostname")
		}
		hostName = nodeName
	}
	// ... (remainder of the function omitted)
}

As you can see, unless the node name is specified explicitly, os.Hostname() is used, and the hostname of this machine is hq-srv11:

# hostname
hq-srv11
# hostname -f
hq-srv11.<redacted-org-domain-name>

while the nodes in the cluster are registered under an explicitly set FQDN:

# kubectl get nodes
NAME                                      STATUS   ROLES    AGE   VERSION
hq-srv11.<redacted-org-domain-name>          Ready    master   63d   v1.15.3
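
To make the mismatch concrete, here is a minimal sketch of the defaulting rule described above. It is not the kubeadm code: resolveNodeName and its hostnameFn parameter are hypothetical stand-ins (the injected function only exists so the behaviour can be shown without depending on the local machine), and hq-srv11.example.org is a placeholder for the redacted FQDN.

```go
package main

import (
	"fmt"
	"os"
)

// resolveNodeName is a simplified stand-in for the defaulting shown in the
// excerpt above: an explicit override wins, otherwise the OS hostname is used.
// hostnameFn is a hypothetical injection point that does not exist in kubeadm.
func resolveNodeName(override string, hostnameFn func() (string, error)) (string, error) {
	if len(override) > 0 {
		return override, nil
	}
	name, err := hostnameFn()
	if err != nil {
		return "", fmt.Errorf("couldn't determine hostname: %w", err)
	}
	return name, nil
}

func main() {
	// Stand-in for os.Hostname() on the affected machine.
	shortHostname := func() (string, error) { return "hq-srv11", nil }
	// Placeholder for the FQDN under which the node is registered.
	registered := "hq-srv11.example.org"

	// No override in the config: the short hostname is used and does not
	// match the registered node name, so a GET on that node returns NotFound.
	defaulted, _ := resolveNodeName("", shortHostname)
	fmt.Printf("defaulted=%q matches registered: %v\n", defaulted, defaulted == registered)

	// Explicit override (nodeRegistration.name in the config): names match.
	explicit, _ := resolveNodeName(registered, shortHostname)
	fmt.Printf("explicit=%q matches registered: %v\n", explicit, explicit == registered)

	// The same rule applied to the real hostname of the current machine.
	if local, err := resolveNodeName("", os.Hostname); err == nil {
		fmt.Printf("local default would be %q\n", local)
	}
}
```

The first comparison is false and the second is true, which is the whole problem in miniature: without an explicit name, the default and the registered name diverge.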

What you expected to happen?

I believe the node name should be obtained from the API, or at least cross-checked against what’s in the API, since the hostname does not necessarily match the node name.
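
One way to picture "correlated with what’s in the API" is the sketch below. It is purely illustrative and not something kubeadm does: matchNodeName is a hypothetical helper, the list of registered names is assumed to have been fetched from the API already, and the short-name-prefix heuristic is my own assumption about what a reasonable correlation could look like.

```go
package main

import (
	"fmt"
	"strings"
)

// matchNodeName is a hypothetical helper (not part of kubeadm). Given the
// local hostname and the node names registered in the API, it returns an
// exact match if there is one, otherwise the single registered name whose
// first DNS label equals the hostname. Anything ambiguous is an error.
func matchNodeName(hostname string, registered []string) (string, error) {
	var candidates []string
	for _, name := range registered {
		if name == hostname {
			return name, nil // exact match wins outright
		}
		if strings.HasPrefix(name, hostname+".") {
			candidates = append(candidates, name)
		}
	}
	switch len(candidates) {
	case 1:
		return candidates[0], nil
	case 0:
		return "", fmt.Errorf("no registered node matches hostname %q", hostname)
	default:
		return "", fmt.Errorf("hostname %q is ambiguous: %v", hostname, candidates)
	}
}

func main() {
	// Placeholder node names standing in for the output of kubectl get nodes.
	nodes := []string{"hq-srv11.example.org", "hq-srv12.example.org"}

	name, err := matchNodeName("hq-srv11", nodes)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("resolved node name:", name)
}
```

Failing loudly on zero or multiple candidates seems preferable to guessing, since picking the wrong node during an upgrade would be worse than stopping.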

How to reproduce it (as minimally and precisely as possible)?

Initialise a cluster on an older version with a node that has a non-default name (kubeadm init --node-name=foo) and a kubeadm config, then upgrade using the kubeadm config again.

Anything else we need to know?

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 25 (20 by maintainers)

Most upvoted comments

nowhere in the upgrade documentation does it state whether it should be specified or not

exactly, that is because we don’t want users to use it.

the --config flag was added to upgrade to allow reconfiguration of the existing cluster, which is now supported using the kubeadm kustomize feature (see the changelog for 1.16). yet, reconfiguring the cluster using this flag is not recommended.

If it’s not the case I think it may need a bit of clarification in the documentation, right?

i agree. this needs a line or two in this document: https://github.com/kubernetes/website/blob/master/content/en/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade.md

/kind documentation
/assign

ok, so i did some investigation here.

@zerkms your suggestion to use certificates to fetch the node name is already used actually, but only when the user is not providing a config file and the configuration is fetched from the cluster. see https://github.com/kubernetes/kubernetes/blob/2e6b073a3f800654ec217e763fcb97412308a9db/cmd/kubeadm/app/util/config/cluster.go#L113

this is so because the dynamic defaulting of node name from certificates happens only for nodes that have the kubelet config and certificates present already and a configuration is fetched from the cluster. if you pass a configuration file kubeadm will default the node name to your hostname. this is by design.

dynamically defaulting your node name to a value from the kubelet and certificates when already passing --config to apply is an option, but i don’t think we should do this.

the explicit flag that @SataQiu added is a workaround for your use case. there is a similar flag for CRI socket. but i’m personally not in favor of adding more flags.

your existing workaround is to have such a config:

apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
nodeRegistration:
  name: the.fqdn.here
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
....

my question for you @zerkms is why are you passing --config to apply? this acts like reconfiguration and while kubeadm supports it, it should not be done in the first place. if your config is missing important information it will be defaulted with dynamic values, such as the host name of the node.

we need a way to automatically associate node and node name configured within the cluster.

isn’t it available in the kubelet certificate?

Yeah, that might be one way. We can extract the host name from the certificate. But I’m not quite sure. @zerkms @neolit123

Thanks @zerkms I have reproduced the problem through the following steps:

# kubeadm init --node-name=foo
# kubeadm config view > config.yaml
# kubeadm upgrade apply v1.16.0  --dry-run --config=config.yaml

I’m going to dig into how we can solve this problem.

i will try to reproduce this again tomorrow.