cert-manager: Issue certificate using dns01 via route53 stuck on SelfCheck status

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug

/kind feature

What happened: Trying to configure dns01 route53 provider and it works using staging letsencrypt ClusterIssuer. When changing ClusterIssuer to live, it stuck on status

  Normal  PrepareCertificate  15m   cert-manager-controller  Preparing certificate with issuer
  Normal  PresentChallenge    15m   cert-manager-controller  Presenting dns-01 challenge for domain auth-service-trunk.gel.net
  Normal  PresentChallenge    15m   cert-manager-controller  Presenting dns-01 challenge for domain auth.test.gel.tech
  Normal  SelfCheck           14m   cert-manager-controller  Performing self-check for domain auth-service-trunk.gel.net
  Normal  SelfCheck           14m   cert-manager-controller  Performing self-check for domain auth.test.geo.tech

I’ve checked route 53 and I see there _acme-challenge. TXT records for both domains. Same live ClusterIssuer works as expected with http01 provider. In log no errors. How to understand what wrong? Can I enable debug mode in some way?

What you expected to happen: Successfully issued certificate

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): Server Version: version.Info{Major:“1”, Minor:“8”, GitVersion:“v1.8.8”, GitCommit:“2f73858c9e6ede659d6828fe5a1862a48034a0fd”, GitTreeState:“clean”, BuildDate:“2018-02-09T21:23:25Z”, GoVersion:“go1.8.3”, Compiler:“gc”, Platform:“linux/amd64”}

  • Cloud provider or hardware configuration**: AWS

  • Install tools: Installed via helm from https://github.com/kubernetes/charts/tree/master/stable/cert-manager

  • Others:

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Reactions: 4
  • Comments: 21 (7 by maintainers)

Most upvoted comments

I had/have the same issue with http01 it seams, where the cert-manager cant curl the well-known url from inside the cluster, but externaly it is possible to access. Just the edge router missing some loopback … spend 15h for this knowledge.

The error description “error waiting for key to be available for domain” is very missleading. Should be “error waiting for collange-key resource to be available for domain …” or http01: “error in resolve of collaenge resource “http://domain.tt/.well-known/…” to be available for domain …” dns01: “error in resolve of collaenge resource “_acme-challenge.domain.tt” to be available for domain …”

‘context deadline exceeded’ means the HTTP request timed out. With the range of stuff you have reported my guess would be you’ve having some connectivity issues with Internet requests from your cluster. It looks like your cert-manager is fine, but it has, perhaps intermittent, trouble making DNS requests and making HTTP requests to the Internet to check the challenges are in place.