kubernetes: purell module is no longer maintained

What happened:

This is a proactive issue opened to explore the path forward with the purell module that we have been using as a dependency for kubeadm in here https://github.com/kubernetes/kubernetes/blob/d6b408f74890abaa0b5be7172714c7fe89ee7eff/cmd/kubeadm/app/preflight/checks.go#L714

This module is looking for a new maintainer.

What you expected to happen:

Discussions around the path forward, potentially eliminate the module from usage and do this operation in-house, as we use it in only one place for specific functionality.

How to reproduce it (as minimally and precisely as possible):

Ref Issue from purell repo: https://github.com/PuerkitoBio/purell/issues/33

Anything else we need to know?:

Slack Thread: https://kubernetes.slack.com/archives/C09R23FHP/p1624250397087400

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 15 (15 by maintainers)

Most upvoted comments

From a Cluster API perspective:

It’s highly unlikely from the CAPI side that we’ll specify IDNA in the kubeadm config. There is a chance on CAPV (vSphere) that a host picks up the DHCP option for the domain name suffix, and that it’s using an IDNA. E.g., in my home environment, i have DHCP option 15 set to home.internal.randomvariable.co.uk, and these end up in the etcd pods as parameters, e.g. - --initial-cluster=mgmt-control-plane-pb5tc.home.internal.randomvariable.co.uk=https://192.168.192.227:2380

Is IDNA encoding a consideration for input in etcd’s command line, or are unicode args fine?

For the public clouds, it’s even more unlikely we’ll hit it as the choice of domain name suffix is highly constricted to meet node identity requirements in the CPI.

My own take, is that I’m not wed to keeping this around at all, I haven’t seen a use case for it.

@neolit123 just an FYI, the maintainers of the go-openapi have also removed the purell dependency. 🚀

https://github.com/kubernetes/kubernetes/pull/103801#issuecomment-893776328

looks like we should tell go-openapi/jsonreference to stop using the library too.

@gkarthiks no worries. we cannot add it in 1.22 because it’s not a critical bug fix and we are in code freeze. i will comment on the PR.

/triage accepted i didn’t see any PRs for this for 1.22. looks like we can update it in 1.23.

Is IDNA encoding a consideration for input in etcd’s command line, or are unicode args fine?

unicode args should be fine, also apparently Go’s HTTP stack does IDNA encoding:

res, err := http.Get("https://example.台灣")
if err != nil {
	log.Fatal(err)
}
// https://example.台灣 is not a real website
// output: 2009/11/10 23:00:00 Get "https://example.%E5%8F%B0%E7%81%A3": dial tcp: lookup example.xn--kpry57d on 169.254.169.254:53: dial udp 169.254.169.254:53: connect: no route to host

i will update the proposed release note above.

but IMO, if one would like a domain to be encoded when passed to all the tools in the k8s stack, maybe it should be done in advance.

it feels like IDNA is a host reachability problem that should be solved pre-k8s deployment.

or leave it to the Go standard library to do the encoding. i saw comments that it might not be fully compliant yet, but i don’t know how up to date these comments are.

in kubeadm the usage of purell.NormalizeURLString does a number of things:

  1. applies Unicode normalization on the URL host according to “Unicode Normalization Form C”.
  2. applies Unicode transformation to “fold” characters to canonical width on the host.
  3. converts the host from Unicode to ASCII (IDNA encoding, but 1 and 2 and needed due to a non-conformant x/net/idna at the time?)
  4. removes duplicate “//” from the URL path.
  5. escapes the whole URL as per RFC 3986 (looks like PuerkitoBio/urlesc forks golang/net/url).

IMO, we should:

  • replace the usage of purell.NormalizeURLString with this trimmed local version:
func normalizeURLString(s string) (string, error) {
	u, err := url.Parse(s)
	if err != nil {
		return "", err
	}
	if len(u.Path) > 0 {
		u.Path = strings.ReplaceAll(u.Path, "//", "/")
	}
	return u.String(), nil
}

then add a release note to the PR:

kubeadm: external etcd endpoints passed in the ClusterConfiguration that have Unicode characters are no longer IDNA encoded (converted to Punycode). They are now just URL encoded as per Go’s implementation of RFC-3986, have duplicate “/” removed from the URL paths and passed like that directly to the kube-apiserver --etcd-servers flag. If you have etcd endpoints that have Unicode characters, it is advisable to encode them in advance with tooling that is fully IDNA compliant. If you don’t do that, the Go standard library (used in k8s and etcd) would do it for you when making requests to the endpoints.

we could import golang.org/x/*, and do the punycode conversion, but i do not understand why it should be kubeadm’s responsibility to do that. it feels like IDNA is a host reachability problem that should be solved pre-k8s deployment.

i can see this being a breaking change to users having Unicode etcd endpoints in kubeadm, but i’m hoping the same users were already aware to not rely on k8s components to manage this for them.