cert-manager: http-01 self check failed for domain

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug

/kind feature

What happened: I get the message: http-01 self check failed for domain “<redacted>”

$ kubectl describe certificates website-cert

Name:         website-cert
Namespace:    default
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"certmanager.k8s.io/v1alpha1","kind":"Certificate","metadata":{"annotations":{},"name":"website-cert","namespace":"default"},"spe...
API Version:  certmanager.k8s.io/v1alpha1
Kind:         Certificate
Metadata:
  Cluster Name:
  Creation Timestamp:  2018-06-14T14:56:48Z
  Generation:          0
  Resource Version:    14514530
  Self Link:           /apis/certmanager.k8s.io/v1alpha1/namespaces/default/certificates/website-cert
  UID:                 2a172bc7-6fe3-11e8-a23d-00163e0067a2
Spec:
  Acme:
    Config:
      Domains:
        <redacted>.com
      Http 01:
        Ingress:  ingress
  Common Name:
  Dns Names:
    <redacted>.com
  Issuer Ref:
    Name:       letsencrypt-issuer-staging
  Secret Name:  website-cert
Status:
  Acme:
    Order:
      Challenges:
        Authz URL:  https://acme-staging-v02.api.letsencrypt.org/acme/authz/d4lkE7p4egv_GNHKOGkIZeNxANPhc4icVwX6ceSfvfQ
        Domain:     <redacted>.com
        Http 01:
          Ingress:  ingress
        Key:        VPf6GKhjZO3CZ4VNjlv6yjg4_7W38X5FZ78pXVJ56Bw.UYrPMOqVi1SlKjy8hYE4t6mdtpuoNxCAANIaDzkZhw0
        Token:      VPf6GKhjZO3CZ4VNjlv6yjg4_7W38X5FZ78pXVJ56Bw
        Type:       http-01
        URL:        https://acme-staging-v02.api.letsencrypt.org/acme/challenge/d4lkE7p4egv_GNHKOGkIZeNxANPhc4icVwX6ceSfvfQ/135522965
        Wildcard:   false
      URL:          https://acme-staging-v02.api.letsencrypt.org/acme/order/6285995/2040425
  Conditions:
    Last Transition Time:  2018-06-14T14:56:56Z
    Message:               http-01 self check failed for domain "<redacted>.com"
    Reason:                ValidateError
    Status:                False
    Type:                  Ready
Events:
  Type    Reason       Age   From          Message
  ----    ------       ----  ----          -------
  Normal  CreateOrder  4s    cert-manager  Created new ACME order, attempting validation...

If I look at the cert-manager logs:

I0614 15:03:16.667525       1 controller.go:177] certificates controller: syncing item 'default/website-cert'
I0614 15:03:16.667660       1 sync.go:239] Preparing certificate default/website-cert with issuer
I0614 15:03:16.667674       1 acme.go:159] getting private key (letsencrypt-issuer-staging->tls.key) for acme issuer default/letsencrypt-issuer-staging
I0614 15:03:16.668072       1 logger.go:27] Calling GetOrder
I0614 15:03:16.876856       1 logger.go:52] Calling GetAuthorization
I0614 15:03:17.065635       1 logger.go:72] Calling HTTP01ChallengeResponse
I0614 15:03:17.065678       1 prepare.go:263] Cleaning up old/expired challenges for Certificate default/website-cert
I0614 15:03:17.065696       1 logger.go:47] Calling GetChallenge
I0614 15:03:17.266766       1 helpers.go:162] Found status change for Certificate "website-cert" condition "Ready": "False" -> "False"; setting lastTransitionTime to 2018-06-14 15:03:17.266752283 +0000 UTC m=+20046.828096097
I0614 15:03:17.266805       1 sync.go:241] Error preparing issuer for certificate default/website-cert: http-01 self check failed for domain "<redacted>.com"
E0614 15:03:17.272906       1 sync.go:168] [default/website-cert] Error getting certificate 'website-cert': secret "website-cert" not found
E0614 15:03:17.272958       1 controller.go:186] certificates controller: Re-queuing item "default/website-cert" due to error processing: http-01 self check failed for domain "<redacted>.com"

What you expected to happen: The self check to succeed

How to reproduce it (as minimally and precisely as possible): Here is my Ingress:

spec:
  tls:
    - hosts:
        - <redacted>.com
      secretName: website-cert
  rules:
    - host: <redacted>.com
      http:
        paths:
          - backend:
              servicePort: 80
              serviceName: website
            path: /
          - backend:
              servicePort: 8089
              serviceName: cm-acme-http-solver-7lvgt
            path: >-
              /.well-known/acme-challenge/VPf6GKhjZO3CZ4VNjlv6yjg4_7W38X5FZ78pXVJ56Bw
apiVersion: extensions/v1beta1
status:
  loadBalancer:
    ingress:
      - ip: {IP}
kind: Ingress
metadata:
  uid: 6c304201-6fe2-11e8-8294-00163e020142
  resourceVersion: '14515959'
  name: ingress
  creationTimestamp: '2018-06-14T14:51:30Z'
  selfLink: /apis/extensions/v1beta1/namespaces/default/ingresses/ingress
  generation: 4
  namespace: default

Here is my Issuer:

apiVersion: certmanager.k8s.io/v1alpha1
kind: Issuer
metadata:
  name: letsencrypt-issuer-staging
  namespace: default
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: <redacted>

    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-issuer-staging
    http01: {}

Here is my certificate:

apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: website-cert
spec:
  secretName: website-cert
  dnsNames:
  - <redacted>.com
  acme:
    config:
    - http01:
        ingress: ingress
      domains:
      - <redacted>.com
  issuerRef:
    name: letsencrypt-issuer-staging

Anything else we need to know?: When I navigate to

http://<redacted>.com/.well-known/acme-challenge/VPf6GKhjZO3CZ4VNjlv6yjg4_7W38X5FZ78pXVJ56Bw

I get:

VPf6GKhjZO3CZ4VNjlv6yjg4_7W38X5FZ78pXVJ56Bw.UYrPMOqVi1SlKjy8hYE4t6mdtpuoNxCAANIaDzkZhw0

Also, if I look at the logs of the cm-acme pod:

2018/06/14 17:31:58 [<redacted>.com] Validating request. basePath=/.well-known/acme-challenge, token=VPf6GKhjZO3CZ4VNjlv6yjg4_7W38X5FZ78pXVJ56Bw
2018/06/14 17:31:58 [<redacted>.com] Comparing actual host '<redacted>.com' against expected '<redacted>.com'
2018/06/14 17:31:58 [<redacted>.com] Got successful challenge request, writing key...
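The manual check above is roughly what the self check automates: fetch the challenge URL over plain HTTP and compare the response body with the expected key. The sketch below is a hypothetical Python approximation, not cert-manager's actual code. The function name http01_self_check, the timeout, and the whitespace trimming are all assumptions. Running it from a pod inside the cluster, rather than from your own machine, is what makes the result comparable to what cert-manager sees.

```python
import urllib.error
import urllib.request


def http01_self_check(domain: str, token: str, expected_key: str,
                      timeout: float = 10.0) -> tuple[bool, str]:
    """Hypothetical approximation of an HTTP-01 self check.

    Returns (passed, reason). This is NOT cert-manager's implementation.
    """
    url = f"http://{domain}/.well-known/acme-challenge/{token}"
    try:
        # urlopen follows redirects, so an HTTP -> HTTPS redirect to a host
        # with an untrusted certificate surfaces below as a URLError.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # Error statuses (e.g. a 404 from the wrong backend) raise HTTPError,
        # which must be caught before its parent class URLError.
        return False, f"wrong status code '{err.code}', expected '200'"
    except (urllib.error.URLError, OSError) as err:
        # DNS failures, timeouts, refused connections, TLS errors.
        return False, f"could not reach '{url}': {err}"
    if status != 200:
        return False, f"wrong status code '{status}', expected '200'"
    # Assumption of this sketch: surrounding whitespace is ignored.
    if body.strip() != expected_key:
        return False, "response body does not match the expected key"
    return True, "ok"
```

For example, http01_self_check("<redacted>.com", "VPf6GKhjZO3CZ4VNjlv6yjg4_7W38X5FZ78pXVJ56Bw", "VPf6GKhjZO3CZ4VNjlv6yjg4_7W38X5FZ78pXVJ56Bw.UYrPMOqVi1SlKjy8hYE4t6mdtpuoNxCAANIaDzkZhw0") would need to return (True, "ok") when run inside the cluster.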

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.1", GitCommit:"d4ab47518836c750f9949b9e0d387f20fb92260b", GitTreeState:"clean", BuildDate:"2018-04-12T14:26:04Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.7", GitCommit:"dd5e1a2978fd0b97d9b78e1564398aeea7e7fe92", GitTreeState:"clean", BuildDate:"2018-04-18T23:58:35Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: Aliyun Container Service
  • Install tools:
  • Others:

I’ve been struggling for two days. It’s probably something really stupid from my side 😃

Any idea?

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 67 (3 by maintainers)

Most upvoted comments

@stopsopa So there is actually another alternative, which makes the self-checks work even with PROXY protocol enabled.

kubectl patch -ngitlab-managed-apps service/ingress-nginx-ingress-controller -p '{"metadata":{"annotations":{"service.beta.kubernetes.io/do-loadbalancer-hostname":"example.com"}}}'

Note that you have to explicitly set your hostname (“example.com”) to work around the Kubernetes iptables issue. I’m not sure how this would work if you have multiple hostnames pointing to the same load balancer.

Subdomains work fine though (like www.example.com, subdomain.example.com etc.)

For the record, my problem was:

I’ve been following the Rancher HA setup guide, which suggests having a public-facing nginx load balancer. That is OK, but the problem is that their sample nginx config redirects all HTTP traffic to HTTPS. I had HTTPS enabled with their default self-signed certificate, which was obviously stopping Let’s Encrypt from reaching the challenge URL.

So, if you bump into this, make sure your traffic either allows HTTP or has HTTPS with a trusted cert.

Caveat Emptor: The Google Cloud Load Balancer Ingress is difficult to use with cert-manager http01

I suggest you use the dns01 challenge instead.

I don’t think you can use ingress-shim.

When using GCLB, you MUST specify a preexisting ingress; otherwise, GCLB will create another load balancer on a different IP. The self checks will fail because the load balancer your DNS points to will not have the necessary rule.

apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
#...
spec:
  acme:
    config:
    - domains:
      - my.cool.domain.com
      http01:
        ingress: already-existing-ingress-resource-name

The ingress will not update unless everything is perfect

Namely, if the secret does not exist already, the ingress will not update the load balancer rules. When working with GCLB, always describe the ingress first when troubleshooting, and make sure the events look happy.

The process I followed, starting from a preexisting certificate:

  1. Create a Certificate CRD as per cert-manager documentation
    • make sure you specify the preexisting ingress as above!
  2. Do not change anything about the ingress.
  3. Watch the ingress definition change! cert-manager adds a path to a service it creates in the cluster that hosts the file to serve. Try curling it.
  4. Find the load balancer in GCLB. Does it match expectations? If not, describe the ingress and remove errors until it lists the challenge path as a backend!
  5. After the certificate is issued, add the TLS secret to the ingress manually. Verify that GCLB has updated properly with a crazy random certificate name.

Because GCLB doesn’t change the real configuration unless everything is OK, I believe you can migrate to cert-manager from a pre-shared cert with no impact, especially if you follow these simplified steps:

  1. Create a Certificate CRD as per cert-manager documentation
    • make sure you specify the preexisting ingress as above!
  2. Add TLS secret manually to ingress

I think @kiuka’s point is interesting to consider as well. The GCLB ingress asserts that /* points to a default backend. I manually deleted it from the LB several times, but it comes back almost instantly. I’m hoping this works despite /*.

Version Matters?

I believe I read in other issues that there are some problems with different versions of GCLB, but I don’t remember where.

Hi all, I ran into the same issue. I’ve just published hairpin-proxy which works around the issue, specifically for cert-manager self-checks. https://github.com/compumike/hairpin-proxy

It uses CoreDNS rewriting to intercept traffic that would be heading toward the external load balancer. It then adds a PROXY line to requests originating from within the cluster. This allows cert-manager’s self-check to pass.
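For context on why PROXY protocol breaks the self check: with PROXY protocol enabled, ingress-nginx expects every connection to begin with a one-line header that the load balancer normally prepends. My understanding, which is an assumption and not confirmed in this thread, is that in-cluster traffic addressed to the load balancer's IP is routed directly by kube-proxy (the "iptables issue" mentioned earlier) and never passes through the load balancer, so no header is added and nginx rejects the request. The sketch below builds a version 1 (text) header as described in HAProxy's PROXY protocol specification. The helper proxy_v1_header is hypothetical and is not taken from hairpin-proxy.

```python
import ipaddress


def proxy_v1_header(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Build a PROXY protocol v1 header line (hypothetical helper)."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    if src.version != dst.version:
        raise ValueError("source and destination must use the same IP version")
    for port in (src_port, dst_port):
        if not 0 <= port <= 65535:
            raise ValueError(f"invalid port: {port}")
    family = "TCP4" if src.version == 4 else "TCP6"
    # Format: "PROXY <family> <src addr> <dst addr> <src port> <dst port>\r\n".
    # The spec caps v1 headers at 107 bytes; even two full IPv6 addresses
    # with 5-digit ports stay below that.
    line = f"PROXY {family} {src} {dst} {src_port} {dst_port}\r\n"
    return line.encode("ascii")
```

A component sitting in front of nginx would have to write this line before forwarding an in-cluster request, which is, per its description above, the gap hairpin-proxy fills.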

The problem solved by itself today. I don’t know how.

Thanks for cert-manager. It’s really a great tool!

I’d try curl-ing the challenge endpoint from within your cluster. I had a similar problem, and in my case it was missing NAT reflection (or split DNS) that prevented cert-manager inside my cluster from verifying that the challenge was available.
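Besides curling the challenge URL from a pod, it can help to check what the domain resolves to from inside the cluster, since missing NAT reflection or split DNS usually shows up as a different (or unreachable) address. A minimal standard-library sketch; the function names and the idea of comparing against your load balancer IP are assumptions for illustration:

```python
import socket


def resolve_ipv4(hostname: str) -> set[str]:
    """Return the IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def resolves_to(hostname: str, expected_ip: str) -> bool:
    """True if hostname resolves, from wherever this runs, to expected_ip."""
    try:
        return expected_ip in resolve_ipv4(hostname)
    except socket.gaierror:
        # The name does not resolve at all from this vantage point.
        return False
```

Running resolves_to("<redacted>.com", "<load balancer IP>") both inside a pod and on your own machine and getting different answers would point at split DNS.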

It turned out that internal services in the cluster were not able to reach things within the cluster through the external IP, because I had enabled PROXY protocol on my load balancer.

When I disabled PROXY protocol, the certificates were issued almost immediately.

kubectl patch -ngitlab-managed-apps service/ingress-nginx-ingress-controller -p '{"metadata":{"annotations":{"service.beta.kubernetes.io/do-loadbalancer-enable-proxy-protocol":"false"}}}'
kubectl patch -ngitlab-managed-apps configmap/ingress-nginx-ingress-controller -p '{"data":{"use-proxy-protocol":"false"}}'

After the certificates were issued, I could turn PROXY protocol back on:

kubectl patch -ngitlab-managed-apps service/ingress-nginx-ingress-controller -p '{"metadata":{"annotations":{"service.beta.kubernetes.io/do-loadbalancer-enable-proxy-protocol":"true"}}}'
kubectl patch -ngitlab-managed-apps configmap/ingress-nginx-ingress-controller -p '{"data":{"use-proxy-protocol":"true"}}'

My application requires PROXY protocol in order to check users’ IP addresses. Is there a way to fix this without having to switch PROXY protocol off and on every 90 days to renew my certs?

I had this problem. I was following a tutorial that suggested installing nginx-ingress as well as cert-manager using kubectl apply -f .

I installed everything using helm and things worked like a charm:

helm install my-nginx-ingress stable/nginx-ingress
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager --namespace cert-manager --version v0.15.0 --set installCRDs=true

What a coincidence. Just today I published https://github.com/nabsul/k8s-letsencrypt with instructions on how to manually issue certificates in your Kubernetes cluster. The hope being that I’ll only need to manually issue certs a few times until this issue is fixed.

I wish I’d seen @compumike 's solution sooner!!

Hello, in fact the problem is:

The ingress rule generated for the ACME challenge ends up being served over HTTPS, but with a bad certificate (because it has not been generated yet), which causes the challenge to fail.

The solution is to add the nginx.ingress.kubernetes.io/ssl-redirect: "false" annotation to the generated cm-acme-http-solver ingress rule.

Wait a minute and the certificate will be created.

To the developers: the solution should be to add this parameter to the ingress rule by default.

JEff

I encountered the same issue. Any address I choose for my app works, except a single one whose validation is blocked by the “http-01 self check failed for domain” error. In particular, http://foo.mydomain.com doesn’t work, but http://foo-app.mydomain.com, for example, works like a charm and can be validated in less than a minute.

I’m trying to figure out from the logs what could cause this single subdomain to fail the self check validation.

In my case, the error message was: cert manager challenge remote error: tls: unrecognized name

I added these annotations to my ingress:

cert-manager.io/issue-temporary-certificate: "true"
acme.cert-manager.io/http01-edit-in-place: "true"

It worked.

I was able to fix this. The chain of issues started as follows:

I had the following annotations in my ingress controller:

nginx.ingress.kubernetes.io/use-regex: "true"

nginx.ingress.kubernetes.io/rewrite-target: /

This caused all URLs to be rewritten to /. That caused cert-manager to fail the self-check before communicating with Let’s Encrypt, so certificate generation did not start at all. It also caused DNS resolution from inside the cluster to fail.

Commenting out these two lines made things work.

@compumike Thanks so much!! 🥇

I encountered this problem, and the issue ended up being that I was setting loadBalancerSourceRanges on my ingress controller.

This caused the self check GET request to return a “connection timed out” error.

Removing the IP restrictions allowed the certificate to be successfully granted.
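A quick way to reason about this failure mode is to check whether the address the self-check connection comes from falls inside any allowed range. Which source address that is depends on your network setup (node IPs, a NAT gateway, and so on), so treat the input as an assumption you must determine yourself. The helper below is hypothetical and only mimics an allow-list check; it does not call any Kubernetes API.

```python
import ipaddress


def allowed_by_source_ranges(client_ip: str, source_ranges: list[str]) -> bool:
    """Mimic a loadBalancerSourceRanges-style allow list (sketch).

    An empty list is treated as "no restriction" in this sketch.
    """
    if not source_ranges:
        return True
    ip = ipaddress.ip_address(client_ip)
    # strict=False tolerates CIDRs written with host bits set, e.g. 10.0.0.1/8.
    # An IPv4 address is never "in" an IPv6 network (and vice versa).
    return any(ip in ipaddress.ip_network(cidr, strict=False)
               for cidr in source_ranges)
```

If the cluster's egress address is not covered, the connection is dropped rather than refused, which matches the “connection timed out” symptom.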

I also encountered could not reach 'http://HOST.domain.NET/.well-known/acme-challenge/NldjKBM648vvka9A7VCSIKqqFwBCxM2DP5rIBgNr80s': wrong status code '404', expected '200' in the output of kubectl -n istio-system logs -f certmanager-1c1c1c1c1c1-xnxxnnxnx

After looking at all ingresses with kubectl get ingress --all-namespaces, I realized that istio had created its own ingress to intercept the .well-known/acme-challenge/ call from Let’s Encrypt.

This “letsencrypt cm-acme-http-solver” ingress is temporary and is apparently there to intercept and answer the call to .well-known/acme-challenge/. Its rules for matching a backend are identical to those of the original ingress for my service, except that its paths: section contains a very specific path-matching rule. My service initially had no path match and was probably chosen as the catch-all, which prevented the ACME challenge from resolving.

Not working:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: istio
  name: my-dashboard-ingress
  namespace: frontend
spec:
  rules:
    - host: "host.domain.com"
      http:
        paths:
          - backend:
              serviceName: dashboard
              servicePort: 80

Working:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: istio
  name: my-dashboard-ingress
  namespace: frontend
spec:
  rules:
    - host: "host.domain.com"
      http:
        paths:
          - backend:
              serviceName: dashboard
              servicePort: 80
            path: /

(Note the very last line: path: /)

I’m not sure whether this is just a lucky coincidence or whether it is really needed; YMMV.
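The working/non-working difference is consistent with longest-prefix routing: once both ingresses declare explicit paths, the more specific challenge path wins over /. Below is a sketch of that rule, modelled on the Prefix path type semantics in the Kubernetes Ingress documentation (segment-wise matching, longest match wins). How a particular controller, Istio included, treats a rule with no path at all is controller-specific and is not modelled here; pick_backend is a hypothetical helper.

```python
from typing import Optional


def _segments(path: str) -> list[str]:
    # "/a/b/" -> ["a", "b"]; "/" -> [] (so "/" matches every request)
    return [part for part in path.split("/") if part]


def pick_backend(request_path: str, rules: list[tuple[str, str]]) -> Optional[str]:
    """Return the backend whose prefix matches the most path segments."""
    req = _segments(request_path)
    best_backend, best_len = None, -1
    for prefix, backend in rules:
        pre = _segments(prefix)
        # Segment-wise prefix: "/foo/bar" matches "/foo/bar/baz" but not "/foo/barbaz".
        if req[:len(pre)] == pre and len(pre) > best_len:
            best_backend, best_len = backend, len(pre)
    return best_backend
```

With rules [("/", "dashboard"), ("/.well-known/acme-challenge/TOKEN", "cm-acme-http-solver")], a request for the challenge path goes to the solver while every other path still reaches the dashboard, regardless of rule order.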

I just ran into this error:

wrong status code '404', expected '200'

This is my config:

---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: api
  namespace: production
  annotations:
    kubernetes.io/ingress.class: "nginx"
    certmanager.k8s.io/issuer: "letsencrypt-prod"
    certmanager.k8s.io/acme-challenge-type: http01
spec:
  tls:
  - hosts:
    - my.domain
    secretName: api-tls
  rules:
  - host: my.domain
    http:
      paths:
      - path: /
        backend:
          serviceName: api
          servicePort: 3000

I found this in my nginx ingress logs:

conflicting server name "my.domain" on 0.0.0.0:80, ignored

"GET /.well-known/acme-challenge/AK94LF_RCdMq_yriPKU7IlAdxPclVzNmIAxpIfEkX-c HTTP/1.1" 404 209 "http://my.domain/.well-known/acme-challenge/AK94LF_RCdMq_yriPKU7IlAdxPclVzNmIAxpIfEkX-c" "Go-http-client/1.1" "-"

Changing the host on my ingress rule fixed the issue:

---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: api
  namespace: production
  annotations:
    kubernetes.io/ingress.class: "nginx"
    certmanager.k8s.io/issuer: "letsencrypt-prod"
    certmanager.k8s.io/acme-challenge-type: http01
spec:
  tls:
  - hosts:
    - my.domain
    secretName: api-tls
  rules:
  - host: my2.domain
    http:
      paths:
      - path: /
        backend:
          serviceName: api
          servicePort: 3000

After that, I had to put the original host back in place.
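One way to look for where a “conflicting server name” comes from is to list every Ingress that declares the same host. The sketch below takes the JSON printed by kubectl get ingress --all-namespaces -o json and reports hosts claimed by more than one Ingress. The cm-acme-http-solver ingress is expected to share your host, so it is skipped by name prefix; that prefix and the function name duplicate_hosts are assumptions. A duplicate is a lead to investigate, not proof of the cause.

```python
import json
from collections import defaultdict


def duplicate_hosts(ingress_list_json: str,
                    ignore_name_prefix: str = "cm-acme-http-solver-") -> dict[str, list[str]]:
    """Map each host declared by more than one Ingress to 'namespace/name' refs."""
    data = json.loads(ingress_list_json)
    owners: dict[str, set[str]] = defaultdict(set)
    for item in data.get("items", []):
        meta = item.get("metadata", {})
        name = meta.get("name", "")
        # Skip the temporary solver ingresses cert-manager creates.
        if ignore_name_prefix and name.startswith(ignore_name_prefix):
            continue
        ref = f"{meta.get('namespace', 'default')}/{name}"
        for rule in item.get("spec", {}).get("rules") or []:
            host = rule.get("host")
            if host:
                owners[host].add(ref)
    return {host: sorted(refs) for host, refs in owners.items() if len(refs) > 1}
```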

@bertoost I think it would make sense to open a separate issue and post some configuration details.

I know there is a lot of chatter on this topic and wanted to give what I was seeing as well as what fixed it.

In my case, I have had an ingress successfully set up with cert-manager for two domains, mydomain.io and www.mydomain.io, running for a while without an issue.

I recently added another host/rule/backend, api.mydomain.io, so that my ingress.yaml looks like the following:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: web
  annotations:
    kubernetes.io/ingress.class: nginx
    certmanager.k8s.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
    - hosts:
        - mydomain.io
        - www.mydomain.io
        - api.mydomain.io # <-- THIS IS WHAT WAS ADDED
      secretName: letsencrypt-prod
  rules:
    - host: mydomain.io
      http:
        paths:
          - backend:
              serviceName: web
              servicePort: 80
    - host: www.mydomain.io
      http:
        paths:
          - backend:
              serviceName: web
              servicePort: 80
    - host: api.mydomain.io # <-- THIS IS WHAT WAS ADDED
      http:
        paths:
          - backend:
              serviceName: api
              servicePort: 80

I also saw the following in the ingress logs:

W0225 03:46:38.166926       7 controller.go:1080] Validating certificate against DNS names. This will be deprecated in a future version.
W0225 03:46:38.166932       7 controller.go:1085] SSL certificate "default/letsencrypt-prod" does not contain a Common Name or Subject Alternative Name for server "api.mydomain.io": x509: certificate is valid for mydomain.io, www.mydomain.io, not api.mydomain.io
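This warning says the certificate stored in the secret was issued for fewer names than the ingress TLS section now lists. A small sketch that compares the two lists; you would need to obtain the certificate's names yourself (for example by decoding tls.crt from the secret and inspecting it with openssl x509 -noout -text). The helper uncovered_hosts is hypothetical, and its wildcard handling is a simplified single-label match.

```python
def uncovered_hosts(tls_hosts: list[str], cert_names: list[str]) -> list[str]:
    """Return the TLS hosts that none of the certificate's names cover."""
    names = [n.lower().rstrip(".") for n in cert_names]

    def covered(host: str) -> bool:
        host = host.lower().rstrip(".")
        for name in names:
            if name == host:
                return True
            # "*.example.com" covers "api.example.com" but not "example.com"
            # or "a.b.example.com" (a wildcard spans exactly one label).
            if name.startswith("*.") and "." in host and host.split(".", 1)[1] == name[2:]:
                return True
        return False

    return [h for h in tls_hosts if not covered(h)]
```

For the ingress above, uncovered_hosts(["mydomain.io", "www.mydomain.io", "api.mydomain.io"], ["mydomain.io", "www.mydomain.io"]) returns ["api.mydomain.io"], which is the name the warning complains about.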

Additionally (and this is what led me to this thread), the output of kubectl describe certificate showed an issue with the self check:

http-01 self check failed for domain "www.mydomain.io"

After trying different things, I ran a command to delete the letsencrypt-prod secret. Within seconds it was regenerated, and now everything works.

kubectl delete secret letsencrypt-prod

I had a similar problem. For some reason, auto-regeneration stopped working, and I had the self-check problem, etc. What helped me was deleting the old certificate secret (the whole secret, with its files) and the Certificate resource. After that, cert-manager managed 😉 So a stale certificate was the problem, for an unknown reason.

kubectl delete secret myapp-tls (where the .pem resides)
kubectl delete certificate myapp-tls

Before this, I upgraded from 0.4.X to 0.5.0, but the version change did not fix the problem.

Just in case it’s helpful, I had a situation where the well-known path was set for both my main ingress and the one created by cert-manager. I think what happened is that the path set for my main ingress was the chosen one, and was automatically redirecting to SSL and failing because the certificate wasn’t found.

Removing the main ingress completely and recreating seemed to resolve the issue for me.

Same issue here: the challenge pods are up and running with no logs, and cert-manager is failing the self check.

I manually deleted the secret for the TLS and it successfully generated the cert.

I have tested a pod with the same service account name to create and update a secret, and it succeeded, so it’s not an RBAC problem.

Here’s my log:

sync.go:127] Certificate "web-backend-prod-tls" for ingress "backend-web-gunicorn-nginx-ingress-config" is up to date
controller.go:152] ingress-shim controller: syncing item 'backend-prod/cm-acme-http-solver-gp4h8'

logger.go:52] Calling GetChallenge

sync.go:49] Not syncing ingress backend-prod/cm-acme-http-solver-cr47k as it does not contain necessary annotations

controller.go:166] ingress-shim controller: Finished processing work item "backend-prod/cm-acme-http-solver-cr47k"

logger.go:52] Calling GetChallenge

controller.go:152] ingress-shim controller: syncing item 'backend-prod/cm-acme-http-solver-tmz4r'

logger.go:52] Calling GetChallenge

controller.go:152] ingress-shim controller: syncing item 'backend-prod/cm-acme-http-solver-srrjm'

controller.go:166] ingress-shim controller: Finished processing work item "backend-prod/cm-acme-http-solver-srrjm"

controller.go:195] certificates controller: Finished processing work item "backend-prod/web-backend-prod-tls"

controller.go:152] ingress-shim controller: syncing item 'backend-prod/backend-web-gunicorn-nginx-ingress-config'

service.go:35] No existing HTTP01 challenge solver service found for Certificate "backend-prod/web-backend-prod-tls". One will be created.

sync.go:124] Certificate "web-backend-prod-tls" for ingress "backend-web-gunicorn-nginx-ingress-config" already exists

helpers.go:188] Found status change for Certificate "web-backend-prod-tls" condition "Ready": "False" -> "False"; setting lastTransitionTime to 2018-08-25 19:55:25.649155183 +0000 UTC m=+23.060496450

sync.go:174] Certificate backend-prod/web-backend-prod-tls scheduled for renewal in -728 hours

sync.go:49] Not syncing ingress backend-prod/cm-acme-http-solver-gp4h8 as it does not contain necessary annotations

ingress.go:33] Looking up Ingresses for selector certmanager.k8s.io/acme-http-domain=3600485562,certmanager.k8s.io/acme-http-token=1325141813

ingress.go:86] No existing HTTP01 challenge solver ingress found for Certificate "backend-prod/x-server-backend-prod-tls". One will be created.

controller.go:166] ingress-shim controller: Finished processing work item "backend-prod/cm-acme-http-solver-tmz4r"

ingress.go:86] No existing HTTP01 challenge solver ingress found for Certificate "backend-prod/web-backend-prod-tls". One will be created.

sync.go:49] Not syncing ingress backend-prod/cm-acme-http-solver-srrjm as it does not contain necessary annotations

controller.go:166] ingress-shim controller: Finished processing work item "backend-prod/backend-web-gunicorn-nginx-ingress-config"

controller.go:166] ingress-shim controller: Finished processing work item "backend-prod/cm-acme-http-solver-gp4h8"

pod.go:49] No existing HTTP01 challenge solver pod found for Certificate "backend-prod/web-backend-prod-tls". One will be created.

sync.go:282] Error preparing issuer for certificate backend-prod/web-backend-prod-tls: [http-01 self check failed for domain "web.backend.server.com", http-01 self check failed for domain "web.server.com"]

Sorry for the delay. Glad this resolved itself.

FWIW, this is actually expected. Currently, if the self check fails, we update the status information with the reason (i.e. self check failed) and try again later (to allow for propagation).
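The “try again later” behaviour is, in spirit, a retry loop. Below is a generic exponential-backoff sketch. cert-manager's actual retry intervals are not described in this thread, so the attempt count and delays are hypothetical, and wait_for_self_check is not a cert-manager function. The sleep function is injectable so the loop can be exercised without actually waiting.

```python
import time
from typing import Callable


def wait_for_self_check(check: Callable[[], bool], attempts: int = 5,
                        initial_delay: float = 1.0, max_delay: float = 30.0,
                        sleep: Callable[[float], None] = time.sleep) -> int:
    """Retry check() with exponential backoff; return the 1-based attempt that passed."""
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            sleep(delay)  # give DNS / ingress changes time to propagate
            delay = min(delay * 2, max_delay)
    raise TimeoutError(f"self check still failing after {attempts} attempts")
```

A check that fails twice and then passes returns 3 after sleeping 1.0 and then 2.0 seconds; a check that never passes raises TimeoutError once the attempts run out.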

On Tue, 19 Jun 2018 at 01:02, Arianit Uka notifications@github.com wrote:

I’m running into the same thing. I see the logs say writing key…, but if I look at the certificate, it says it’s still validating it.

Super buggy
