kubernetes: CSR API: cluster signing duration is not flexible and has unsafe defaulting

The kube-controller-manager supports a built-in set of controllers called csrsigning. These implement the default kubernetes.io/* CSR signers and are commonly used (e.g. via kubeadm). Currently, all certificates signed by these controllers have a static lifetime configured via the --cluster-signing-duration flag:

https://github.com/kubernetes/kubernetes/blob/908847c01e9640ffce2ffda5acf88d92c48a5148/cmd/kube-controller-manager/app/options/csrsigningcontroller.go#L48

When unset, this flag defaults to one year:

https://github.com/kubernetes/kubernetes/blob/908847c01e9640ffce2ffda5acf88d92c48a5148/pkg/controller/certificates/signer/config/v1alpha1/defaults.go#L44

In the Certificates KEP, we state:

Expiration/cert lifetime - minimum of CSR signer or request. Sanity of the time is the concern of the signer.

This is misleading because there is no “or request” functionality: the lifetime is simply the duration configured on the signer. The wording also hints that the built-in signers could be configured individually, but we do not expose any per-signer config, just a single --cluster-signing-duration. However, per-signer config would not be meaningful either. For example, kubernetes.io/kube-apiserver-client certs issued for CN=foo and CN=bar may not have identical duration requirements.

The type of certificate (client vs. serving) may also affect what duration is considered safe. Long-lived serving certs may be okay (they imply trust but not rights), but the same is not true for client certs (an identity against the Kube API that may have RBAC).

Certificates from the CSR API cannot be revoked once issued (this is extremely problematic for the kubernetes.io/kube-apiserver-client signer, as these certificates are valid as identities against the Kube API server). The entire underlying certificate authority must be revoked if a single certificate needs revocation. This is generally impractical and disruptive. Issuing short-lived certificates for all clients can help mitigate the problem, but not all clients are equal with regard to their ability to handle certificate rotation (and short durations are vulnerable to outages if the signing infrastructure is down). Nor are all clients equivalent in how long they should be trusted (e.g. a client whose private key is stored in an HSM can be issued a longer-lived certificate).

The built-in CSR signing controllers should be safe to use. Currently, if one wants to issue short-lived certificates based on the client making the request, one must write a custom CSR signer. This is an unreasonable burden when one wants the exact semantics of the built-in signers with a custom certificate lifetime.

There are three actors:

  1. The client requesting the CSR
  2. The approver
  3. The signer

We want to be able to use the built-in signers, so only the client and approver can “change.”

Option 1: from the client’s perspective, it could specify some form of maxTTL in the CSR .spec (and it would be immutable after creation like the rest of .spec). This is a bit strange because the client is not supposed to be in control of the lifetime of the certificate in standard CSR flows (hence why the CSR format has no such field). In terms of Kube’s CSR API, this may be okay because a dynamic admission webhook could easily enforce that .spec.maxTTL is appropriate for the current client (along with the rest of the CSR .spec based on any arbitrary policy). This would require one to write said dynamic webhook (or use something like OPA Gatekeeper) or otherwise intercept the client’s request and mutate it before CSR creation.

Option 2: from the approver’s perspective, they validate the CSR and, if everything is valid, approve it by setting .status.conditions[Approved] = true. In many cases (e.g. for the kubernetes.io/kube-apiserver-client signer) the approver must be either a human actor or custom code. Thus this seems like the most logical place to add an extra field such as .status.maxTTL, which could only be set via the certificatesigningrequests/approval sub resource. This changes approval from “dear signer, I approve this CSR, give the client a signed cert for some undefined period of time” to “dear signer, I approve this CSR, give the client a signed cert that is valid for at most this length of time.” In a sense, this removes the need for the signer to have its own certificate lifetime duration, other than some concept of a default for when the approver does not state an opinion. A theoretical v2 CSR API could require that maxTTL be set, removing the need for any default. Perhaps the KEP could be updated to:

Expiration/cert lifetime - minimum of CSR signer or request (really!). Sanity of the time is the concern of the approver.

In both API changes suggested above, enforcement of maxTTL could be handled by the API server when a certificate is set via the certificatesigningrequests/status sub resource. Thus a new API server would prevent an old controller manager from signing a certificate for longer than the specified maxTTL.

@kubernetes/sig-auth-feature-requests /kind feature /sig auth

cc @micahhausler

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 3
  • Comments: 17 (14 by maintainers)

Most upvoted comments

I think at a bare minimum that the API server could enforce through validation that the issued certificate has a lifetime no greater than the requested duration.

That wouldn’t be backwards compatible.

A way to request a TTL in spec and a way to set a recommended TTL in status when approving both seem useful to me

In cert-manager we have a duration field on our own CertificateRequest resource. Because our equivalent of signers may ignore this, we treat it as more of an advisory/request from the user rather than a hard-and-fast rule. Other parts of the project (e.g. automated renewal) exclusively use the NotBefore, NotAfter and our own renewBefore field (the amount of time before NotAfter that we begin attempting a renewal).

When we come to adopt the CSR API, we will be bringing across a similar concept either through a field or annotation (depending on the results of this discussion).

Therefore, I’d say it’d be preferable for us to allow some form of user configurable duration request (i.e. via spec), which is effectively option (1).

I think option (2), specifying a maximum TTL, may be desirable too, although I think they could be two separate things: one being the user requested duration, and the other being the actual maximum.

That said, the ultimate decision/policy comes down to the signer. When integrating with certain external CAs, it can be difficult or impossible to accurately predict the true maxTTL. For that reason, I think separating this into 3 distinct concepts is the most accurate way to represent it:

  1. The user requested TTL, which is advisory and signers can choose to ignore it.
  2. The approver advised maximum TTL, which a signer is strongly advised to honour, although again this is not required (often these things are out of our control).
  3. The signer mandated maximum TTL, which MAY take account of the user requested TTL and SHOULD take account of the approver advised maximum.

However, I certainly don’t think we should provide any guarantees about what duration a client should expect based on either a spec or status field on the CSR, and doc comments should reflect that to avoid anyone expecting otherwise.

Naturally, we already have (3). I do see immediate value in adding (1) (it also avoids cert-manager creating its own nomenclature for this). Whilst (2) sounds very desirable as a means for an ‘internal’ process to influence signer decisions, it is not something we’re looking at right now, as this sort of thing is typically configured in the signer (although it would be interesting to be able to specify maximums on a per-CSR basis).