cert-manager: Propagation check failed, wrong service used by cm-acme-http-solver
Describe the bug: The propagation check fails because the cm-acme-http-solver Ingress uses the wrong Service, so ACME cannot validate the domain or deliver the certificate.
Expected behaviour: The challenge succeeds and the certificate is delivered.
Steps to reproduce the bug: Follow https://docs.cert-manager.io/en/latest/tutorials/acme/http-validation.html , but with multiple subdomains at the same time.
Anything else we need to know?: I moved from cert-manager 0.8 to 0.11 a few weeks ago. Everything worked fine, including for newly added subdomains. For the past few days, new subdomains have failed validation.
Environment details:
- Kubernetes version : v1.13.10
- Cloud-provider : Azure AKS
- cert-manager version : v0.11.0
- Install method : helm
/kind bug
kubectl describe challenge:
Name: tls-secret-1495667673-716095195-908999738
Namespace: default
Labels: <none>
Annotations: <none>
API Version: acme.cert-manager.io/v1alpha2
Kind: Challenge
Metadata:
  Creation Timestamp: 2019-11-20T10:00:11Z
  Finalizers:
    finalizer.acme.cert-manager.io
  Generation: 1
  Owner References:
    API Version: cert-manager.io/v1alpha2
    Block Owner Deletion: true
    Controller: true
    Kind: Order
    Name: tls-secret-1495667673-716095195
    UID: 81e087b8-0b7c-11ea-95a0-7e8b3f31c3c5
  Resource Version: 10962589
  Self Link: /apis/acme.cert-manager.io/v1alpha2/namespaces/default/challenges/tls-secret-1495667673-716095195-908999738
  UID: 8a97cf77-0b7c-11ea-95a0-7e8b3f31c3c5
Spec:
  Authz URL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/1322577887
  Dns Name: domain20.convertigo.net
  Issuer Ref:
    Group: cert-manager.io
    Kind: Issuer
    Name: letsencrypt-prod
  Key: BZiefIaNveMd0bwXbjywYExT6wGHdETnJLs5D6iZOAY.zIJBhOqgURIGfuNbqfatmAXt5je_GyTDV34tQ02Xqmw
  Solver:
    Http 01:
      Ingress:
        Class: nginx
  Token: BZiefIaNveMd0bwXbjywYExT6wGHdETnJLs5D6iZOAY
  Type: http-01
  URL: https://acme-v02.api.letsencrypt.org/acme/chall-v3/1322577887/aUEJmg
  Wildcard: false
Status:
  Presented: true
  Processing: true
  Reason: Waiting for http-01 challenge propagation: wrong status code '503', expected '200'
  State: pending
Events: <none>
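For anyone who wants to reproduce the check in the Reason field by hand, here is a minimal Python sketch. It is not cert-manager's actual code. It assumes the propagation check is a plain HTTP GET on http://<Dns Name>/.well-known/acme-challenge/<Token> that expects status 200 and a body equal to Key. The names challenge_url, check_propagation and http_get are hypothetical helpers made up for this illustration.

```python
import urllib.error
import urllib.request
from urllib.parse import quote


def challenge_url(dns_name, token):
    """Build the HTTP-01 challenge URL for a domain and token."""
    return f"http://{dns_name}/.well-known/acme-challenge/{quote(token, safe='')}"


def http_get(url, timeout=10):
    """Return (status_code, body) for a GET request, including error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode(errors="replace")
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; we still want the status code (e.g. 503).
        return err.code, err.read().decode(errors="replace")


def check_propagation(dns_name, token, key, fetch=http_get):
    """Approximate the self-check (assumption: status 200 and body == key)."""
    status, body = fetch(challenge_url(dns_name, token))
    if status != 200:
        return False, f"wrong status code '{status}', expected '200'"
    if body.strip() != key:
        return False, "response body does not match the expected key"
    return True, "challenge is reachable"
```

Calling check_propagation with the Dns Name, Token and Key from the Challenge above should show whether the solver is reachable from wherever the script runs. In my case a 503 from the ingress controller would be consistent with the Ingress backend pointing to a Service that does not exist.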
I have a cm-acme-http-solver-wmpps Ingress with:
"spec": {
  "rules": [
    {
      "host": "domain20.convertigo.net",
      "http": {
        "paths": [
          {
            "path": "/.well-known/acme-challenge/BZiefIaNveMd0bwXbjywYExT6wGHdETnJLs5D6iZOAY",
            "backend": {
              "serviceName": "cm-acme-http-solver-9mddf",
              "servicePort": 8089
            }
          }
        ]
      }
    }
  ]
},
It refers to a Service cm-acme-http-solver-9mddf that doesn’t exist, but I do have a Service named cm-acme-http-solver-6c7r2. Is this normal?
Do you need any other information, or do you know a workaround?
Thanks!
About this issue
- State: closed
- Created 5 years ago
- Reactions: 5
- Comments: 29 (2 by maintainers)
I deleted the Challenge’s Ingress that pointed to the wrong Service and … a new, valid Ingress was created! The certificate is now good. I’ll leave the ticket open for a few days in case it happens again.
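To spot other Ingresses stuck in the same state, a small script can compare each Ingress backend against the Services that actually exist. This is only a rough Python sketch, assuming the legacy backend.serviceName field shown in the Ingress spec in this issue. The function name dangling_backends is made up for illustration, and the sample data just mirrors the names reported above.

```python
def dangling_backends(ingresses, service_names):
    """Return (ingress_name, host, service_name) for each backend whose Service is missing.

    `ingresses` is a list of Ingress objects as plain dicts, and
    `service_names` is the set of Service names in the same namespace.
    Assumption: backends use the legacy `serviceName` field.
    """
    missing = []
    for ingress in ingresses:
        name = ingress.get("metadata", {}).get("name", "<unknown>")
        for rule in ingress.get("spec", {}).get("rules", []):
            for path in rule.get("http", {}).get("paths", []):
                service = path.get("backend", {}).get("serviceName")
                if service is not None and service not in service_names:
                    missing.append((name, rule.get("host"), service))
    return missing


# Sample data mirroring the objects described in this issue.
solver_ingress = {
    "metadata": {"name": "cm-acme-http-solver-wmpps"},
    "spec": {
        "rules": [
            {
                "host": "domain20.convertigo.net",
                "http": {
                    "paths": [
                        {
                            "path": "/.well-known/acme-challenge/BZiefIaNveMd0bwXbjywYExT6wGHdETnJLs5D6iZOAY",
                            "backend": {
                                "serviceName": "cm-acme-http-solver-9mddf",
                                "servicePort": 8089,
                            },
                        }
                    ]
                },
            }
        ]
    },
}
existing_services = {"cm-acme-http-solver-6c7r2"}

print(dangling_backends([solver_ingress], existing_services))
```

The input can be taken from the items lists returned by kubectl get ingress -n default -o json and kubectl get service -n default -o json. Any Ingress the script reports is a candidate for kubectl delete ingress, which in my case made cert-manager create a new, valid one.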
@nicolas-albert we are experiencing the exact same issue. The Ingress created by cert-manager points to an ACME solver Service that doesn’t exist in the namespace (another one exists with a different name).
This looks like a bug.
We tend to issue multiple requests like this (for many subdomains at a time), and most times they appear to succeed, but there’s always the odd one that gets stuck like this.
@macevil I added a node so the pods could start, and this fixed it for me!
How can I apply this patch now?
I am using the following command to deploy cert-manager:
helm install cert-manager \
  --namespace cert-manager \
  --version v0.12.0 \
  jetstack/cert-manager
We are currently using the patch by temporarily editing the cert manager deployment to use https://hub.docker.com/r/oliverpowell84/cert-manager-controller/tags for the controller image @nicolas-albert
Should be fixed by https://github.com/jetstack/cert-manager/pull/2460 . I’m waiting for the v0.13 release to test it, unless there is a simple way to use this patch now.
I have the same problem, cert-manager v0.10.1, Helm v2.14.2
It’s really easy to reproduce when you have lots of domains on your certificate. Let’s Encrypt allows up to 100 alternate names, and some of our certs have more than 50. It’s trivial to reproduce this problem with that many domains.
@munnerz @nicolas-albert @schemen we’ve had this bug pop up again on the weekend. I spent some time digging, and I now have a decent hypothesis on what’s happening. I created a new issue to describe the bug here: https://github.com/jetstack/cert-manager/issues/2442 .
We upgraded to Helm v3 yesterday, and since then, we haven’t been able to reproduce this bug. I’m not sure whether it was caused by us using a pre-v3 Helm. @nicolas-albert did you install cert-manager with Helm, and if so, which version?
In summary: cert-manager v0.12.0 with Helm v3 appears to fix this issue, but we’ll keep you updated if we spot this happening again.
Hard to come up with a reproducible example because it doesn’t seem to happen every time, only sometimes.
I’ll do my best to give you as much alternative info as possible. Going to see if I can provoke it again, and record all the logs etc.
Heyo! I hit the same issue on a K3s cluster with Traefik as the Ingress controller. Once I edited the Ingress object to point to the correct Service, it resolved itself quickly. This was also on version 0.12.
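For reference, the manual edit amounts to swapping the backend Service name in the solver Ingress. Below is a hedged Python sketch of that change on an Ingress loaded as a plain dict. It assumes the legacy backend.serviceName field used by the Ingress shown in this issue (networking.k8s.io/v1 uses backend.service.name instead). The function name repoint_backend is hypothetical.

```python
import copy


def repoint_backend(ingress, wrong_service, right_service):
    """Return a copy of `ingress` with backends on `wrong_service` moved to `right_service`.

    Assumption: backends use the legacy `serviceName` field.
    The input object is left untouched.
    """
    fixed = copy.deepcopy(ingress)
    for rule in fixed.get("spec", {}).get("rules", []):
        for path in rule.get("http", {}).get("paths", []):
            backend = path.get("backend", {})
            if backend.get("serviceName") == wrong_service:
                backend["serviceName"] = right_service
    return fixed
```

The returned object could be dumped back to JSON and applied with kubectl apply -f, which has the same effect as changing the field by hand with kubectl edit ingress. Treat this as a stopgap only, since the controller may recreate or overwrite the solver Ingress.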
Bad news 😕 Are you using AKS too?
0.12 seems to be out? https://github.com/jetstack/cert-manager/releases/tag/v0.12.0
We’ve just upgraded, so I’m holding thumbs that it will sort out this issue.