cert-manager: http-01 self check failed for domain
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened: I get the message: http-01 self check failed for domain “<redacted>”
$ kubectl describe certificates website-cert
Name:         website-cert
Namespace:    default
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"certmanager.k8s.io/v1alpha1","kind":"Certificate","metadata":{"annotations":{},"name":"website-cert","namespace":"default"},"spe...
API Version:  certmanager.k8s.io/v1alpha1
Kind:         Certificate
Metadata:
  Cluster Name:
  Creation Timestamp:  2018-06-14T14:56:48Z
  Generation:          0
  Resource Version:    14514530
  Self Link:           /apis/certmanager.k8s.io/v1alpha1/namespaces/default/certificates/website-cert
  UID:                 2a172bc7-6fe3-11e8-a23d-00163e0067a2
Spec:
  Acme:
    Config:
      Domains:
        <redacted>.com
      Http 01:
        Ingress:  ingress
  Common Name:
  Dns Names:
    <redacted>.com
  Issuer Ref:
    Name:       letsencrypt-issuer-staging
  Secret Name:  website-cert
Status:
  Acme:
    Order:
      Challenges:
        Authz URL:  https://acme-staging-v02.api.letsencrypt.org/acme/authz/d4lkE7p4egv_GNHKOGkIZeNxANPhc4icVwX6ceSfvfQ
        Domain:     <redacted>.com
        Http 01:
          Ingress:  ingress
        Key:        VPf6GKhjZO3CZ4VNjlv6yjg4_7W38X5FZ78pXVJ56Bw.UYrPMOqVi1SlKjy8hYE4t6mdtpuoNxCAANIaDzkZhw0
        Token:      VPf6GKhjZO3CZ4VNjlv6yjg4_7W38X5FZ78pXVJ56Bw
        Type:       http-01
        URL:        https://acme-staging-v02.api.letsencrypt.org/acme/challenge/d4lkE7p4egv_GNHKOGkIZeNxANPhc4icVwX6ceSfvfQ/135522965
        Wildcard:   false
      URL:  https://acme-staging-v02.api.letsencrypt.org/acme/order/6285995/2040425
  Conditions:
    Last Transition Time:  2018-06-14T14:56:56Z
    Message:               http-01 self check failed for domain "<redacted>.com"
    Reason:                ValidateError
    Status:                False
    Type:                  Ready
Events:
  Type    Reason       Age  From          Message
  ----    ------       ---  ----          -------
  Normal  CreateOrder  4s   cert-manager  Created new ACME order, attempting validation...
If I look at the cert-manager controller logs, I see:
I0614 15:03:16.667525 1 controller.go:177] certificates controller: syncing item 'default/website-cert'
I0614 15:03:16.667660 1 sync.go:239] Preparing certificate default/website-cert with issuer
I0614 15:03:16.667674 1 acme.go:159] getting private key (letsencrypt-issuer-staging->tls.key) for acme issuer default/letsencrypt-issuer-staging
I0614 15:03:16.668072 1 logger.go:27] Calling GetOrder
I0614 15:03:16.876856 1 logger.go:52] Calling GetAuthorization
I0614 15:03:17.065635 1 logger.go:72] Calling HTTP01ChallengeResponse
I0614 15:03:17.065678 1 prepare.go:263] Cleaning up old/expired challenges for Certificate default/website-cert
I0614 15:03:17.065696 1 logger.go:47] Calling GetChallenge
I0614 15:03:17.266766 1 helpers.go:162] Found status change for Certificate "website-cert" condition "Ready": "False" -> "False"; setting lastTransitionTime to 2018-06-14 15:03:17.266752283 +0000 UTC m=+20046.828096097
I0614 15:03:17.266805 1 sync.go:241] Error preparing issuer for certificate default/website-cert: http-01 self check failed for domain "<redacted>.com"
E0614 15:03:17.272906 1 sync.go:168] [default/website-cert] Error getting certificate 'website-cert': secret "website-cert" not found
E0614 15:03:17.272958 1 controller.go:186] certificates controller: Re-queuing item "default/website-cert" due to error processing: http-01 self check failed for domain "<redacted>.com"
What you expected to happen: The self check to succeed
How to reproduce it (as minimally and precisely as possible): Here is my Ingress:
spec:
  tls:
    - hosts:
        - <redacted>.com
      secretName: website-cert
  rules:
    - host: <redacted>.com
      http:
        paths:
          - backend:
              servicePort: 80
              serviceName: website
            path: /
          - backend:
              servicePort: 8089
              serviceName: cm-acme-http-solver-7lvgt
            path: >-
              /.well-known/acme-challenge/VPf6GKhjZO3CZ4VNjlv6yjg4_7W38X5FZ78pXVJ56Bw
apiVersion: extensions/v1beta1
status:
  loadBalancer:
    ingress:
      - ip: {IP}
kind: Ingress
metadata:
  uid: 6c304201-6fe2-11e8-8294-00163e020142
  resourceVersion: '14515959'
  name: ingress
  creationTimestamp: '2018-06-14T14:51:30Z'
  selfLink: /apis/extensions/v1beta1/namespaces/default/ingresses/ingress
  generation: 4
  namespace: default
Here is my Issuer:
apiVersion: certmanager.k8s.io/v1alpha1
kind: Issuer
metadata:
  name: letsencrypt-issuer-staging
  namespace: default
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: <redacted>
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-issuer-staging
    http01: {}
Here is my certificate:
apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: website-cert
spec:
  secretName: website-cert
  dnsNames:
    - <redacted>.com
  acme:
    config:
      - http01:
          ingress: ingress
        domains:
          - <redacted>.com
  issuerRef:
    name: letsencrypt-issuer-staging
Anything else we need to know?: When I navigate to
http://<redacted>.com/.well-known/acme-challenge/VPf6GKhjZO3CZ4VNjlv6yjg4_7W38X5FZ78pXVJ56Bw
I get:
VPf6GKhjZO3CZ4VNjlv6yjg4_7W38X5FZ78pXVJ56Bw.UYrPMOqVi1SlKjy8hYE4t6mdtpuoNxCAANIaDzkZhw0
Also, if I look at the logs of the cm-acme-http-solver pod:
2018/06/14 17:31:58 [<redacted>.com] Validating request. basePath=/.well-known/acme-challenge, token=VPf6GKhjZO3CZ4VNjlv6yjg4_7W38X5FZ78pXVJ56Bw
2018/06/14 17:31:58 [<redacted>.com] Comparing actual host '<redacted>.com' against expected '<redacted>.com'
2018/06/14 17:31:58 [<redacted>.com] Got successful challenge request, writing key...
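For reference, the temporary solver resources cert-manager creates can be listed like this (a sketch; the solver pod name is a placeholder and will differ per challenge):
kubectl get pods,svc,ing | grep cm-acme-http-solver
kubectl logs <cm-acme-http-solver-pod-name>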
Environment:
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.1", GitCommit:"d4ab47518836c750f9949b9e0d387f20fb92260b", GitTreeState:"clean", BuildDate:"2018-04-12T14:26:04Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.7", GitCommit:"dd5e1a2978fd0b97d9b78e1564398aeea7e7fe92", GitTreeState:"clean", BuildDate:"2018-04-18T23:58:35Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration: Aliyun Container Service
- Install tools:
- Others:
I’ve been struggling for two days. It’s probably something really stupid from my side 😃
Any idea?
@stopsopa So there is actually another alternative, which makes the self-checks work even with PROXY protocol enabled.
Notice that you have to explicitly write your hostname (“example.com”) in order for the kubernetes iptables issue to be worked around. Not sure how this would work if you have multiple hostnames pointing to the same loadbalancer.
Subdomains work fine though (like www.example.com, subdomain.example.com etc.)
For the record, my problem was:
I’ve been following the Rancher HA setup guide, which suggests having a public-facing nginx load balancer. That is OK, but the problem is that their sample nginx config redirects all HTTP traffic to HTTPS. I had HTTPS enabled using their default self-signed certificate, and that was obviously stopping Let’s Encrypt from reaching the challenge URL.
So, if you bump into this, make sure your traffic either allows HTTP or has HTTPS with a trusted cert.
Caveat Emptor: The Google Cloud Load Balancer ingress is difficult to use with cert-manager http01.
I suggest you use the dns01 challenge instead (a sketch of such an Issuer follows at the end of this comment).
I don’t think you can use ingress shim
When using GCLB you MUST specify a preexisting ingress otherwise GCLB will create another load-balancer on a different IP. The self checks will fail because your loadbalancer with the correct DNS will not have the necessary rule.
The ingress will not update unless everything is perfect
Namely, if the secret does not exist already the ingress will not update the loadbalancer rules. When working with GCLB always describe the ingress first when troubleshooting. Make sure events look happy.
The process I went about from having a preexisting certificate:
Because the GCLB doesn’t change the real configuration unless everything is OK I believe you can get through a migration to cert-manager from a pre-shared cert with no impact. Especially if you follow these simplified steps:
I think @kiuka’s point is interesting to consider as well. The GCLB ingress asserts that /* points to a default backend. I manually deleted it several times from the LB, but it comes back almost instantly. I’m hoping this works despite /*.
Version Matters?
I believe I read in other issues that there are some issues with different versions of GCLB? Don’t remember where.
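A minimal sketch of what a dns01 Issuer could look like on the certmanager.k8s.io/v1alpha1 API used in this issue (the CloudDNS provider, GCP project, email, and secret names are placeholders, not from this issue):
apiVersion: certmanager.k8s.io/v1alpha1
kind: Issuer
metadata:
  name: letsencrypt-dns
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: you@example.com                     # placeholder
    privateKeySecretRef:
      name: letsencrypt-dns
    dns01:
      providers:
        - name: prod-clouddns                  # placeholder provider name
          clouddns:
            project: my-gcp-project            # placeholder GCP project
            serviceAccountSecretRef:
              name: clouddns-service-account   # placeholder secret
              key: service-account.json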
Hi all, I ran into the same issue. I’ve just published hairpin-proxy, which works around the issue, specifically for cert-manager self-checks: https://github.com/compumike/hairpin-proxy
It uses CoreDNS rewriting to intercept traffic that would be heading toward the external load balancer. It then adds a PROXY line to requests originating from within the cluster. This allows cert-manager’s self-check to pass.
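To illustrate just the DNS-rewrite half of that idea, here is a minimal Corefile sketch (the hostname, service name, and namespace are assumptions, not from this issue; hairpin-proxy automates this and additionally injects the PROXY protocol line):
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        # send in-cluster lookups of the public hostname straight to the
        # in-cluster ingress service instead of the external load balancer
        rewrite name example.com ingress-nginx-controller.ingress-nginx.svc.cluster.local
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        forward . /etc/resolv.conf
        cache 30
    }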
The problem solved itself today. I don’t know how.
Thanks for cert-manager. It’s really a great tool!
I’d try curl-ing the challenge endpoint from within your cluster (a sketch of how follows below). I had a similar problem, and in my case it was the missing NAT reflection (or split DNS) that prevented cert-manager inside my cluster from verifying that the challenge was available.
Turns out the internal services in the cluster were not able to reach things within the cluster through the external IP because I had enabled PROXY protocol on my load balancer.
When I disabled PROXY protocol, the certificates were issued almost immediately.
Which meant I could turn PROXY protocol back on.
My application requires the use of PROXY protocol in order to check the users’ IP addresses. Is there a way of fixing this without having to switch PROXY protocol on and off every 90 days to renew my certs?
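A sketch of that in-cluster check (the domain, token, and image are placeholders; substitute your own values):
# run a throwaway pod inside the cluster and fetch the challenge URL
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl --command -- \
  curl -v http://example.com/.well-known/acme-challenge/<token>
# a 200 response containing the key authorization means the solver is reachable from inside the cluster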
I had this problem; I was following a tutorial that suggested installing nginx-ingress as well as cert-manager using kubectl apply -f.
I installed everything using helm instead and things worked like a charm (a rough Helm install is sketched below).
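A rough sketch of such a Helm install (repo URLs, namespaces, and the installCRDs flag reflect current Helm 3 / jetstack chart usage, not the 2018-era syntax from when this issue was filed; adjust versions to your cluster):
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx --namespace ingress-nginx --create-namespace
helm install cert-manager jetstack/cert-manager --namespace cert-manager --create-namespace --set installCRDs=true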
What a coincidence. Just today I published https://github.com/nabsul/k8s-letsencrypt with instructions on how to manually issue certificates in your Kubernetes cluster. The hope being that I’ll only need to manually issue certs a few times until this issue is fixed.
I wish I’d seen @compumike 's solution sooner!!
Hello, in fact the problem is:
The ingress rule generated for the ACME challenge is served over HTTPS but with a bad certificate (because it has not been generated yet) -> that causes the failed challenge.
The solution is to add the nginx.ingress.kubernetes.io/ssl-redirect: "false" annotation to the generated cm-acme-http-solver ingress rule (see the sketch below).
Wait a minute and the certificate will be created.
To the developers: the solution should be to add this parameter by default in the generated ingress rule.
Jeff
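A sketch of where that annotation goes on a solver-style ingress (the ingress name, host, token, and port are placeholders, not taken from this issue):
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: cm-acme-http-solver-xxxxx        # placeholder solver ingress name
  annotations:
    # keep the challenge path reachable over plain HTTP
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
spec:
  rules:
    - host: example.com                  # placeholder host
      http:
        paths:
          - path: /.well-known/acme-challenge/<token>
            backend:
              serviceName: cm-acme-http-solver-xxxxx
              servicePort: 8089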
I encountered the same issue. Any address I choose for my app works, except a single one whose validation is blocked by the http-01 self check failed for domain error. In particular, http://foo.mydomain.com doesn’t work, but for example http://foo-app.mydomain.com works like a charm and can be validated in less than a minute.
I’m trying to figure out from the logs what could be the reason for this single subdomain failing the self-check validation.
In my case the error message was: cert-manager challenge remote error: tls: unrecognized name. I added the following annotations to my ingress (sketched below):
cert-manager.io/issue-temporary-certificate: "true"
acme.cert-manager.io/http01-edit-in-place: "true"
It worked.
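A sketch of where those annotations go, using the newer networking.k8s.io/v1 ingress API (the names, host, and issuer are placeholders):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: website                                          # placeholder
  annotations:
    cert-manager.io/issuer: letsencrypt-prod             # placeholder issuer
    cert-manager.io/issue-temporary-certificate: "true"
    acme.cert-manager.io/http01-edit-in-place: "true"
spec:
  tls:
    - hosts:
        - example.com                                    # placeholder host
      secretName: website-tls
  rules:
    - host: example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: website
                port:
                  number: 80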
I was able to fix this; the chain of issues started as follows:
I had the following annotations on my ingress controller:
nginx.ingress.kubernetes.io/use-regex: "true"
nginx.ingress.kubernetes.io/rewrite-target: /
This caused all URLs to be rewritten to /, which caused cert-manager to fail its self-check before ever communicating with Let’s Encrypt, which caused certificate generation not to start at all, and also caused DNS resolution from inside the cluster to fail.
Commenting out these 2 lines made things work.
@compumike Thanks so much!! 🥇
I encountered this problem and the issue ended up being due to the fact that I was setting loadBalancerSourceRanges on my ingress controller (see the sketch below).
This caused the self check GET request to return a “connection timed out” error.
Removing the IP restrictions allowed the certificate to be successfully granted.
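For illustration, a sketch of the kind of restriction on the ingress controller Service that can cause this (the service name, namespace, and CIDR are placeholders; the point is that self-check and ACME traffic must not be filtered out):
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller       # placeholder
  namespace: ingress-nginx             # placeholder
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: ingress-nginx
  loadBalancerSourceRanges:
    - 203.0.113.0/24                   # e.g. office IPs only; this blocks the self check
  ports:
    - name: http
      port: 80
      targetPort: 80
    - name: https
      port: 443
      targetPort: 443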
Also encountered
could not reach 'http://HOST.domain.NET/.well-known/acme-challenge/NldjKBM648vvka9A7VCSIKqqFwBCxM2DP5rIBgNr80s': wrong status code '404', expected '200'
in kubectl -n istio-system logs -f certmanager-1c1c1c1c1c1-xnxxnnxnx.
After looking at all ingresses with kubectl get ingress --all-namespaces, I realized that istio had created its own ingress to intercept the .well-known/acme-challenge/ call from letsencrypt.
This “letsencrypt cm-acme-http-solver” ingress is a temporary one, apparently there to intercept and answer the call to .well-known/acme-challenge/ - its rules configuration for matching a particular backend is identical to the original ingress needed for my service, except that the paths: section contains the very specific path matching rule; my service was initially without a path match and was probably chosen as the catch-all, preventing the acme challenge from resolving.
Not working:
Working:
(notice the very last line, path: /)
Not sure if this is just a lucky coincidence now, or if it is really needed - ymmv
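For illustration only (not the poster’s elided “Not working”/“Working” config; the host and service names are placeholders), a rule with an explicit path match looks roughly like this:
spec:
  rules:
    - host: example.com
      http:
        paths:
          - path: /                    # explicit catch-all for the app itself
            backend:
              serviceName: my-service  # placeholder
              servicePort: 80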
Just got into this error
wrong status code '404', expected '200'
This is my config:
Found this in my nginx ingress logs:
conflicting server name "my.domain" on 0.0.0.0:80, ignored
"GET /.well-known/acme-challenge/AK94LF_RCdMq_yriPKU7IlAdxPclVzNmIAxpIfEkX-c HTTP/1.1" 404 209 "http://my.domain/.well-known/acme-challenge/AK94LF_RCdMq_yriPKU7IlAdxPclVzNmIAxpIfEkX-c" "Go-http-client/1.1" "-"
Just changed the host option on my ingress rule and the issue was fixed:
After that I had to put it back in place.
@bertoost I think it would make sense to open a separate issue and post some configuration details.
I know there is a lot of chatter on this topic and wanted to share what I was seeing as well as what fixed it.
In my case, I have had an ingress successfully set up with cert-manager for two domains, mydomain.com and www.mydomain.com, running for a while without an issue.
I recently added another host/rule/backend, api.mydomain.com, so that my ingress.yaml looks like the following.
I also saw the following in the ingress logs.
Additionally (and what led me to this thread), the output of kubectl describe certificate showed there was an issue with the self check:
http-01 self check failed for domain "www.mydomain.io"
Upon trying different things, within seconds of running a command to delete the letsencrypt-prod secret, it was regenerated and now everything works:
kubectl delete secret letsencrypt-prod
I had a similar problem. For some reason auto-regeneration stopped working and I had the self-check problem, etc. What helped me was deleting the old certificates (the whole secret with its files) and the Certificate resource. With this, cert-manager managed 😉 So having a stale certificate was the problem, for an unknown reason.
kubectl delete secret myapp-tls (where the .pem resides)
kubectl delete certificate myapp-tls
Before this I changed the version from 0.4.X to 0.5.0, but the problem was immune to the version change.
Just in case it’s helpful, I had a situation where the well-known path was set for both my main ingress and the one created by cert-manager. I think what happened is that the path set for my main ingress was the chosen one, and was automatically redirecting to SSL and failing because the certificate wasn’t found.
Removing the main ingress completely and recreating seemed to resolve the issue for me.
Same issue here - the challenge pod is up and running with no logs, and cert-manager is failing the self check.
I manually deleted the secret for the TLS and it successfully generated the cert.
I have tested a pod with the same service account name to create and update a secret and it succeeded, so it’s not an RBAC issue.
Here’s my log:
Sorry for the delay. Glad this resolved itself.
FWIW, this is actually expected. Currently, if the self check fails, we update the status information with the reason (i.e. self check failed) and try again later (to allow for propagation).