cert-manager: CA cert in Secret not updated when self-signed CA itself gets renewed.
Describe the bug: When using a self-signed issuer to manage an internal CA, renewing the CA itself does not propagate the new CA cert to any of the Certificates issued by that CA. Once the old CA cert expires, all of the services using it fail to connect due to certificate expiration.
Expected behaviour: When a CA managed by cert-manager is renewed, all Certificates issued by that CA should have their Secret updated with the new CA cert.
Steps to reproduce the bug:
Apply the following to a cluster:
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: selfsigned-ca
spec:
  isCA: true
  commonName: selfsigned-ca
  secretName: selfsigned-ca
  # Shortest time allowed
  duration: "1h"
  privateKey:
    algorithm: ECDSA
    size: 256
  issuerRef:
    name: selfsigned
    kind: Issuer
    group: cert-manager.io
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned-ca
spec:
  ca:
    secretName: selfsigned-ca
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: selfsigned-client
spec:
  secretName: selfsigned-client
  commonName: selfsigned-client
  issuerRef:
    name: selfsigned-ca
    kind: Issuer
    group: cert-manager.io
Note that the CA fingerprint is the same in the CA Secret and in the client cert's Secret:
$ kubectl get secret selfsigned-ca -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -fingerprint -noout
SHA1 Fingerprint=7C:53:15:A9:EE:75:20:43:88:5A:5F:8C:AE:53:C0:D1:2A:77:77:A3
$ kubectl get secret selfsigned-client -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -fingerprint -noout
SHA1 Fingerprint=7C:53:15:A9:EE:75:20:43:88:5A:5F:8C:AE:53:C0:D1:2A:77:77:A3
Wait an hour for the CA certificate to be renewed…
Then check again:
$ kubectl get secret selfsigned-ca -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -fingerprint -noout
SHA1 Fingerprint=B2:48:49:D4:CC:45:F5:46:BF:B9:7D:AB:71:2C:2E:31:7E:7A:FD:59
$ kubectl get secret selfsigned-client -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -fingerprint -noout
SHA1 Fingerprint=7C:53:15:A9:EE:75:20:43:88:5A:5F:8C:AE:53:C0:D1:2A:77:77:A3
They no longer match, so the certificate bundle in the client Secret is wrong and workloads attempting to use it will see some sort of “certificate expiration” error.
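The manual comparison above can also be scripted. The following is a minimal Python sketch, assuming the two ca.crt values have already been fetched and base64-decoded into PEM strings (for example with the kubectl commands shown above). The function names are illustrative and are not part of cert-manager.

```python
import base64
import hashlib

PEM_HEADER = "-----BEGIN CERTIFICATE-----"
PEM_FOOTER = "-----END CERTIFICATE-----"


def sha1_fingerprint(pem: str) -> str:
    """Return the SHA-1 digest of the first certificate in a PEM string.

    The digest is taken over the DER bytes and formatted as colon-separated
    upper-case hex, i.e. the value part of `openssl x509 -fingerprint` output.
    """
    start = pem.index(PEM_HEADER) + len(PEM_HEADER)
    end = pem.index(PEM_FOOTER, start)
    # Strip newlines and spaces from the base64 body before decoding.
    der = base64.b64decode("".join(pem[start:end].split()))
    digest = hashlib.sha1(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def ca_bundle_is_stale(issuer_ca_pem: str, leaf_ca_pem: str) -> bool:
    """True when the ca.crt stored with a leaf cert differs from the issuer's current CA."""
    return sha1_fingerprint(issuer_ca_pem) != sha1_fingerprint(leaf_ca_pem)
```

Called with the ca.crt from selfsigned-ca and the ca.crt from selfsigned-client, `ca_bundle_is_stale` would return False before the renewal and True after it.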
Environment details:
- Kubernetes version: v1.25.5
- Cloud-provider/provisioner: N/A (bare-metal)
- cert-manager version: v1.11.0
- Install method: Helm via ArgoCD i.e.
helm template ... | kubectl apply -f -
/kind bug
About this issue
- State: open
- Created a year ago
- Reactions: 19
- Comments: 24 (7 by maintainers)
Commits related to this issue
- Change certificates expiration from 14 to 13 days The reason for this is a bug in cert manager. The Certificates we have contain the CA that they've been signed with. When a CA has been renewed, the ... — committed to giantswarm/prometheus-rules by mnitchev a year ago
- Change certificates expiration from 14 to 13 days (#758) * Change certificates expiration from 14 to 13 days The reason for this is a bug in cert manager. The Certificates we have contain the CA ... — committed to giantswarm/prometheus-rules by mnitchev a year ago
One implementation could be adding another check to the list of policies that the trigger controller evaluates before deciding whether a cert needs to be re-issued. If the Certificate is for a CA issuer, this check could look up the issuer’s CA Secret and verify that the issued certificate matches the CA. We should then also update the event handler predicate, as at the moment the trigger handler only runs on Certificate Secret events. The predicate would be crucial to ensure both that the controller does get triggered when the CA Secret changes and that it does not get triggered (and run all the policy checks, some of which are expensive) when any random cluster Secret changes.
The above approach has a problem in that it would introduce more coupling between certificates controllers and the CA issuer, which may make any future refactoring difficult and does not fit the model well.
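To make the first approach concrete, here is a rough Python sketch of the two pieces it describes: the extra policy check and the narrowed event predicate. It is not cert-manager code (which is written in Go). The `Secret` and `Certificate` classes, both function names, and the lookup table of CA Secrets are hypothetical, and the sketch assumes the issuer's CA certificate lives under tls.crt in its Secret and that only namespaced Issuers are involved.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Secret:
    namespace: str
    name: str
    data: Dict[str, bytes] = field(default_factory=dict)


@dataclass
class Certificate:
    namespace: str
    secret_name: str
    issuer_name: str


def ca_mismatch_policy(cert_secret: Secret, issuer_ca_secret: Optional[Secret]) -> bool:
    """Hypothetical extra trigger policy.

    Re-issue when the ca.crt stored next to the issued certificate no longer
    equals the certificate held in the CA issuer's Secret.
    """
    if issuer_ca_secret is None:
        # Not a CA issuer, or its Secret is missing: nothing to compare.
        return False
    return cert_secret.data.get("ca.crt") != issuer_ca_secret.data.get("tls.crt")


def certificates_to_enqueue(
    changed: Secret,
    certs: List[Certificate],
    issuer_ca_secrets: Dict[Tuple[str, str], str],
) -> List[Certificate]:
    """Predicate sketch: map one Secret event to the Certificates worth re-checking.

    issuer_ca_secrets maps (namespace, issuer name) to the name of that
    issuer's CA Secret. Unrelated Secrets produce an empty list, so the
    expensive policy checks are skipped for them.
    """
    out = []
    for cert in certs:
        if cert.namespace != changed.namespace:
            continue
        is_own_secret = cert.secret_name == changed.name
        is_issuer_ca = issuer_ca_secrets.get((cert.namespace, cert.issuer_name)) == changed.name
        if is_own_secret or is_issuer_ca:
            out.append(cert)
    return out
```

The lookup from issuer to CA Secret is exactly where the coupling to the CA issuer shows up: the certificates controller has to know how this one issuer type stores its CA.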
An alternative approach could be to add a new status field (e.g. ‘revision’) that any issuer could implement to signal to the trigger controller that certificates need to be re-issued, together with a new annotation on Certificates recording the ‘revision’ for which the Certificate was last successfully issued. With this, the CA issuer could store something like the CA cert fingerprint on its status and, when it detects that a new CA cert has been stored in the Secret, bump the revision. We would add a new check to the trigger controller to trigger a new issuance if the revision annotation on the Certificate does not match the revision field on the issuer. That would allow us to avoid adding any issuer-specific logic to the certificates controllers, and any issuer, including external issuers, could implement its own logic for what counts as a new ‘revision’. I’ve not thought about this in detail, but perhaps this approach would work.
The question of whether something like this can be safely switched on in GA for all CA issuers still stands (i.e. whether everyone wants this feature). Perhaps the CA issuer could have some new fields to allow users to specify whether the revision needs bumping or not (perhaps too hacky). In theory, the issuer could have all kinds of logic, e.g. to allow users to specify how long before expiry of the CA cert all certs need to be renewed, or to add some skew to avoid a thundering herd of renewals, but that would be lots of extra complexity. I’d be interested to see someone explore this approach in a design doc.
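A small Python sketch of the revision idea may help when weighing it. Nothing here exists in cert-manager today: the annotation key, the `IssuerStatus` fields, and all function names are made up for illustration, and the CA fingerprint is treated as an opaque string.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical annotation key; cert-manager does not define this.
REVISION_ANNOTATION = "example.invalid/issuer-revision"


@dataclass
class IssuerStatus:
    revision: int = 0
    ca_fingerprint: str = ""


@dataclass
class Certificate:
    annotations: Dict[str, str] = field(default_factory=dict)


def observe_ca(status: IssuerStatus, fingerprint: str) -> IssuerStatus:
    """Issuer side: bump the revision only when the CA fingerprint changes."""
    if fingerprint != status.ca_fingerprint:
        status.ca_fingerprint = fingerprint
        status.revision += 1
    return status


def mark_issued(cert: Certificate, status: IssuerStatus) -> None:
    """Record which issuer revision this Certificate was issued against."""
    cert.annotations[REVISION_ANNOTATION] = str(status.revision)


def needs_reissue(cert: Certificate, status: IssuerStatus) -> bool:
    """Trigger side: an issuer-agnostic comparison of the two revisions."""
    return cert.annotations.get(REVISION_ANNOTATION) != str(status.revision)
```

In this sketch `needs_reissue` never looks at the CA itself, so an external issuer could bump `revision` for its own reasons. Note that a Certificate with no annotation yet counts as out of date here, which would re-issue everything the first time the check is enabled; a real design would have to decide how to handle that case.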
The third alternative would be to have some external component that sets the Issuing condition on Certificates when it detects that the CA certificate has been updated. It would need to either store the fingerprint of the old certificate somehow or have some other means of distinguishing Secret updates caused by a CA certificate change from other updates (e.g. a user adding a new label). The benefit is that this approach could be tested out and refined before adding the code in-tree.
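The core of such an external component is telling a real CA change apart from an unrelated Secret update. Below is a minimal Python sketch of that one decision, assuming the CA certificate is stored under tls.crt and keeping the last seen digest in memory only. The `CAWatcher` class is hypothetical, and a real component would need to persist its state and then set the Issuing condition through the Kubernetes API, which is not shown.

```python
import hashlib
from typing import Dict, Tuple


class CAWatcher:
    """Remembers the last seen CA certificate per Secret and reports real changes."""

    def __init__(self) -> None:
        # (namespace, name) -> SHA-256 hex digest of the last seen tls.crt
        self._seen: Dict[Tuple[str, str], str] = {}

    def ca_changed(self, namespace: str, name: str, data: Dict[str, bytes]) -> bool:
        """Return True only if tls.crt differs from the previously seen value.

        The first observation of a Secret just records a baseline and returns
        False. Updates that leave tls.crt untouched (for example a new label)
        also return False.
        """
        digest = hashlib.sha256(data.get("tls.crt", b"")).hexdigest()
        key = (namespace, name)
        previous = self._seen.get(key)
        self._seen[key] = digest
        return previous is not None and previous != digest
```

Because the baseline lives in memory, a restart of this sketch would miss any CA change that happened while it was down, which is why the fingerprint has to be stored somewhere durable in practice.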
I’m very open to collaboration and I’d gladly review a PR, but I’m also about to go on holiday for a while and realistically I won’t personally be able to get back to doing anything until at least 2023-03-22 😦 After that, though, I’ll be working a bunch on my Kubecon talk which is related to this use case so I should be able to help a bit then!
@maelvls is currently working on open source stuff and @irbekrm is too (although Irbe is currently on holiday), so they might be able to point you in the right direction if you have questions.
I’d suggest maybe starting with a design doc PR so we can agree on what changes we might want to make. Maybe also start a discussion in #cert-manager-dev on Kubernetes slack.
I hope that’s helpful! I’ll check back in when I’m back - and thank you for showing an interest, I love this part of open source 😁
I think of it like this:
Imagine we have two different services, which we’ll call X and Y. Assume X and Y need to talk to each other using TLS and both use ca.crt for trust purposes.
They both get their certs from an issuer we’ll call A. Since they both are issued from A, they both trust A since that’s what’s in ca.crt.
Imagine that A is about to expire so we rotate it to B. We rotate X’s certificate to be issued from B, but that means it no longer trusts Y’s certificate which is still issued from A. Also, Y doesn’t trust B until it has been rotated. This is the problem we have today - there’s always downtime since we can’t guarantee that both X and Y have their certs reissued immediately.
We could change it so that the issuer keeps track of all certs it uses, and puts all of them in ca.crt. Now we rotate X’s certificate and it trusts Y’s cert, but we have a problem because Y still doesn’t know about B until its cert is rotated at least once.
Instead, we could change it so that we “pre-issue” B, so it appears in ca.crt but doesn’t get used until we choose to use it. So we do that, then issue a new cert for X and Y using A, so their ca.crt now contains both A and B. Then we re-issue X’s cert using B, and Y will trust it. No downtime, great. At this point though I think we’ve gone to more effort than it would’ve been to just use trust-manager, not to mention the many other benefits we’d get from trust-manager in this situation.
Going further, if we then realise we accidentally uploaded the private key for B on github somehow, we need to distrust B and rotate again to C.
So we need to add some way for cert-manager to know that ca.crt should contain A and C but not B… at this point, we’ve reinvented trust-manager but worse and harder to use, and this ca.crt logic needs to be re-implemented in an issuer-specific way for every issuer type.
Does that make sense? Sorry it’s so long!
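The X and Y scenario above can be checked with a toy model. This Python sketch reduces each service to the CA that signed its certificate and the set of CAs in its ca.crt; the `Service` class and `can_talk` function are illustrative simplifications, not a model of real TLS validation.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class Service:
    issued_by: str      # CA that signed this service's certificate
    trusted: Set[str]   # CAs present in this service's ca.crt


def can_talk(a: Service, b: Service) -> bool:
    """Mutual TLS works only if each side trusts the CA that signed the other's cert."""
    return b.issued_by in a.trusted and a.issued_by in b.trusted


def scenarios() -> Dict[str, bool]:
    results = {}
    # Today: ca.crt holds only the issuing CA, and X has been re-issued from B.
    results["today"] = can_talk(Service("B", {"B"}), Service("A", {"A"}))
    # Issuer puts every CA it has used into ca.crt, but Y has not been re-issued yet.
    results["all_cas"] = can_talk(Service("B", {"A", "B"}), Service("A", {"A"}))
    # B is pre-issued: both services already trust A and B before X moves to B.
    results["pre_issued"] = can_talk(Service("B", {"A", "B"}), Service("A", {"A", "B"}))
    return results
```

Only the pre-issued case lets X and Y keep talking during the rotation, which matches the reasoning above: the trust set has to reach every peer before any certificate signed by the new CA is used.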