cert-manager: HTTP01 challenge fails, solver pod short-lived and all its traces wiped out
Describe the bug: The certificate is not issued. The order ends up in the “invalid” state and the challenge in the “expired” state:
➜ accounting-service git:(xxxxxxxxx) ✗ kubectl get challenges aws-platform-staging-eu-west-1
NAME                                                 STATE     DOMAIN                                            AGE
accounting-service-tls-lccsr-2824239067-3765429214   expired   accounting-service-public.stag.aws.worksome.net   52m
➜ accounting-service git:(xxxxxxxx) ✗ kubectl get orders aws-platform-staging-eu-west-1
NAME                                      STATE     AGE
accounting-service-tls-lccsr-2824239067   invalid   52m
➜ accounting-service git:(xxxxxxxxxx) ✗ kubectl describe order accounting-service-tls-lccsr-2824239067 aws-platform-staging-eu-west-1
Name: accounting-service-tls-lccsr-2824239067
Namespace: default
Labels: <none>
Annotations: cert-manager.io/certificate-name: accounting-service-tls
cert-manager.io/certificate-revision: 1
cert-manager.io/private-key-secret-name: accounting-service-tls-ff7pk
API Version: acme.cert-manager.io/v1
Kind: Order
Metadata:
Creation Timestamp: 2022-01-11T13:32:39Z
Generation: 1
Managed Fields:
API Version: acme.cert-manager.io/v1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.:
f:cert-manager.io/certificate-name:
f:cert-manager.io/certificate-revision:
f:cert-manager.io/private-key-secret-name:
f:ownerReferences:
.:
k:{"uid":"c6e54947-8ce7-48c5-92c8-8ec7331d1273"}:
.:
f:apiVersion:
f:blockOwnerDeletion:
f:controller:
f:kind:
f:name:
f:uid:
f:spec:
.:
f:dnsNames:
f:issuerRef:
.:
f:group:
f:kind:
f:name:
f:request:
f:status:
.:
f:authorizations:
f:failureTime:
f:finalizeURL:
f:state:
f:url:
Manager: controller
Operation: Update
Time: 2022-01-11T13:34:25Z
Owner References:
API Version: cert-manager.io/v1
Block Owner Deletion: true
Controller: true
Kind: CertificateRequest
Name: accounting-service-tls-lccsr
UID: c6e54947-8ce7-48c5-92c8-8ec7331d1273
Resource Version: 17805529
UID: 71323017-b594-4020-ae74-de30a4f607d4
Spec:
Dns Names:
accounting-service-public.stag.aws.worksome.net
Issuer Ref:
Group: cert-manager.io
Kind: Issuer
Name: letsencrypt-staging
Request: <THE CERTIFICATE REQUEST BASE64-ENCODED GOES HERE>
Status:
Authorizations:
Challenges:
Token: HDU7Jy0sqG4bgp1ADI6nACKYYVs0g_5cVfdUsXcVOgg
Type: http-01
URL: https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/1401516058/Oc1iHA
Token: HDU7Jy0sqG4bgp1ADI6nACKYYVs0g_5cVfdUsXcVOgg
Type: dns-01
URL: https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/1401516058/Ax0QhQ
Token: HDU7Jy0sqG4bgp1ADI6nACKYYVs0g_5cVfdUsXcVOgg
Type: tls-alpn-01
URL: https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/1401516058/qtKlag
Identifier: accounting-service-public.stag.aws.worksome.net
Initial State: pending
URL: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/1401516058
Wildcard: false
Failure Time: 2022-01-11T13:34:25Z
Finalize URL: https://acme-staging-v02.api.letsencrypt.org/acme/finalize/39574498/1503494108
State: invalid
URL: https://acme-staging-v02.api.letsencrypt.org/acme/order/39574498/1503494108
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Created 53m cert-manager Created Challenge resource "accounting-service-tls-lccsr-2824239067-3765429214" for domain "accounting-service-public.stag.aws.worksome.net"
➜ accounting-service git:(xxxxxxxxxxxxxx) ✗ kubectl describe challenge accounting-service-tls-lccsr-2824239067-3765429214 aws-platform-staging-eu-west-1
Name: accounting-service-tls-lccsr-2824239067-3765429214
Namespace: default
Labels: <none>
Annotations: <none>
API Version: acme.cert-manager.io/v1
Kind: Challenge
Metadata:
Creation Timestamp: 2022-01-11T13:32:41Z
Finalizers:
finalizer.acme.cert-manager.io
Generation: 1
Managed Fields:
API Version: acme.cert-manager.io/v1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:finalizers:
.:
v:"finalizer.acme.cert-manager.io":
f:ownerReferences:
.:
k:{"uid":"71323017-b594-4020-ae74-de30a4f607d4"}:
.:
f:apiVersion:
f:blockOwnerDeletion:
f:controller:
f:kind:
f:name:
f:uid:
f:spec:
.:
f:authorizationURL:
f:dnsName:
f:issuerRef:
.:
f:group:
f:kind:
f:name:
f:key:
f:solver:
.:
f:http01:
.:
f:ingress:
.:
f:class:
f:token:
f:type:
f:url:
f:wildcard:
f:status:
.:
f:presented:
f:processing:
f:reason:
f:state:
Manager: controller
Operation: Update
Time: 2022-01-11T13:32:44Z
Owner References:
API Version: acme.cert-manager.io/v1
Block Owner Deletion: true
Controller: true
Kind: Order
Name: accounting-service-tls-lccsr-2824239067
UID: 71323017-b594-4020-ae74-de30a4f607d4
Resource Version: 17805528
UID: 4b30bb88-72b4-4049-8ac4-a2ebc7d2d9fa
Spec:
Authorization URL: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/1401516058
Dns Name: accounting-service-public.stag.aws.worksome.net
Issuer Ref:
Group: cert-manager.io
Kind: Issuer
Name: letsencrypt-staging
Key: HDU7Jy0sqG4bgp1ADI6nACKYYVs0g_5cVfdUsXcVOgg.-yuHWS8lDxJT_DIqqUoRVEU3PIZV3RT-ln1_oYBMf0A
Solver:
http01:
Ingress:
Class: nginx
Token: HDU7Jy0sqG4bgp1ADI6nACKYYVs0g_5cVfdUsXcVOgg
Type: HTTP-01
URL: https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/1401516058/Oc1iHA
Wildcard: false
Status:
Presented: false
Processing: false
Reason: Error accepting challenge: 400 urn:ietf:params:acme:error:malformed: Unable to update challenge :: authorization must be pending
State: expired
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Started 55m cert-manager Challenge scheduled for processing
Normal Presented 55m cert-manager Presented challenge using HTTP-01 challenge mechanism
The cert-manager controller created the “cm-acme-http-solver” pod, service and ingress, but they do not seem to work as expected: the self-check gets a 503 for about a minute, then the controller runs into the “Error accepting challenge: 400” error shown above and removes the solver resources. Logs from one such cycle of the cert-manager controller:
I0111 11:32:43.082104 1 pod.go:71] cert-manager/controller/challenges/http01/ensurePod "msg"="creating HTTP01 challenge solver pod" "dnsName"="accounting-service-public.stag.aws.worksome.net" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0111 11:32:43.344491 1 pod.go:59] cert-manager/controller/challenges/http01/selfCheck/http01/ensurePod "msg"="found one existing HTTP01 solver pod" "dnsName"="accounting-service-public.stag.aws.worksome.net" "related_resource_kind"="Pod" "related_resource_name"="cm-acme-http-solver-ng88s" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0111 11:32:43.344598 1 service.go:43] cert-manager/controller/challenges/http01/selfCheck/http01/ensureService "msg"="found one existing HTTP01 solver Service for challenge resource" "dnsName"="accounting-service-public.stag.aws.worksome.net" "related_resource_kind"="Service" "related_resource_name"="cm-acme-http-solver-shxmm" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0111 11:32:43.344653 1 ingress.go:90] cert-manager/controller/challenges/http01/selfCheck/http01/ensureIngress "msg"="found one existing HTTP01 solver ingress" "dnsName"="accounting-service-public.stag.aws.worksome.net" "related_resource_kind"="Ingress" "related_resource_name"="cm-acme-http-solver-4wrgx" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
E0111 11:32:48.404069 1 sync.go:186] cert-manager/controller/challenges "msg"="propagation check failed" "error"="wrong status code '503', expected '200'" "dnsName"="accounting-service-public.stag.aws.worksome.net" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0111 11:32:48.427112 1 pod.go:59] cert-manager/controller/challenges/http01/selfCheck/http01/ensurePod "msg"="found one existing HTTP01 solver pod" "dnsName"="accounting-service-public.stag.aws.worksome.net" "related_resource_kind"="Pod" "related_resource_name"="cm-acme-http-solver-ng88s" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0111 11:32:48.427186 1 service.go:43] cert-manager/controller/challenges/http01/selfCheck/http01/ensureService "msg"="found one existing HTTP01 solver Service for challenge resource" "dnsName"="accounting-service-public.stag.aws.worksome.net" "related_resource_kind"="Service" "related_resource_name"="cm-acme-http-solver-shxmm" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0111 11:32:48.427279 1 ingress.go:90] cert-manager/controller/challenges/http01/selfCheck/http01/ensureIngress "msg"="found one existing HTTP01 solver ingress" "dnsName"="accounting-service-public.stag.aws.worksome.net" "related_resource_kind"="Ingress" "related_resource_name"="cm-acme-http-solver-4wrgx" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
E0111 11:32:48.440863 1 sync.go:186] cert-manager/controller/challenges "msg"="propagation check failed" "error"="wrong status code '503', expected '200'" "dnsName"="accounting-service-public.stag.aws.worksome.net" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0111 11:32:58.404531 1 pod.go:59] cert-manager/controller/challenges/http01/selfCheck/http01/ensurePod "msg"="found one existing HTTP01 solver pod" "dnsName"="accounting-service-public.stag.aws.worksome.net" "related_resource_kind"="Pod" "related_resource_name"="cm-acme-http-solver-ng88s" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0111 11:32:58.404635 1 service.go:43] cert-manager/controller/challenges/http01/selfCheck/http01/ensureService "msg"="found one existing HTTP01 solver Service for challenge resource" "dnsName"="accounting-service-public.stag.aws.worksome.net" "related_resource_kind"="Service" "related_resource_name"="cm-acme-http-solver-shxmm" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0111 11:32:58.404689 1 ingress.go:90] cert-manager/controller/challenges/http01/selfCheck/http01/ensureIngress "msg"="found one existing HTTP01 solver ingress" "dnsName"="accounting-service-public.stag.aws.worksome.net" "related_resource_kind"="Ingress" "related_resource_name"="cm-acme-http-solver-4wrgx" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
E0111 11:32:58.423602 1 sync.go:186] cert-manager/controller/challenges "msg"="propagation check failed" "error"="wrong status code '503', expected '200'" "dnsName"="accounting-service-public.stag.aws.worksome.net" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0111 11:33:08.423918 1 pod.go:59] cert-manager/controller/challenges/http01/selfCheck/http01/ensurePod "msg"="found one existing HTTP01 solver pod" "dnsName"="accounting-service-public.stag.aws.worksome.net" "related_resource_kind"="Pod" "related_resource_name"="cm-acme-http-solver-ng88s" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0111 11:33:08.424030 1 service.go:43] cert-manager/controller/challenges/http01/selfCheck/http01/ensureService "msg"="found one existing HTTP01 solver Service for challenge resource" "dnsName"="accounting-service-public.stag.aws.worksome.net" "related_resource_kind"="Service" "related_resource_name"="cm-acme-http-solver-shxmm" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0111 11:33:08.424086 1 ingress.go:90] cert-manager/controller/challenges/http01/selfCheck/http01/ensureIngress "msg"="found one existing HTTP01 solver ingress" "dnsName"="accounting-service-public.stag.aws.worksome.net" "related_resource_kind"="Ingress" "related_resource_name"="cm-acme-http-solver-4wrgx" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
E0111 11:33:08.433494 1 sync.go:186] cert-manager/controller/challenges "msg"="propagation check failed" "error"="wrong status code '503', expected '200'" "dnsName"="accounting-service-public.stag.aws.worksome.net" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0111 11:33:18.434851 1 pod.go:59] cert-manager/controller/challenges/http01/selfCheck/http01/ensurePod "msg"="found one existing HTTP01 solver pod" "dnsName"="accounting-service-public.stag.aws.worksome.net" "related_resource_kind"="Pod" "related_resource_name"="cm-acme-http-solver-ng88s" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0111 11:33:18.434940 1 service.go:43] cert-manager/controller/challenges/http01/selfCheck/http01/ensureService "msg"="found one existing HTTP01 solver Service for challenge resource" "dnsName"="accounting-service-public.stag.aws.worksome.net" "related_resource_kind"="Service" "related_resource_name"="cm-acme-http-solver-shxmm" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0111 11:33:18.435001 1 ingress.go:90] cert-manager/controller/challenges/http01/selfCheck/http01/ensureIngress "msg"="found one existing HTTP01 solver ingress" "dnsName"="accounting-service-public.stag.aws.worksome.net" "related_resource_kind"="Ingress" "related_resource_name"="cm-acme-http-solver-4wrgx" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
E0111 11:33:23.444981 1 sync.go:186] cert-manager/controller/challenges "msg"="propagation check failed" "error"="wrong status code '503', expected '200'" "dnsName"="accounting-service-public.stag.aws.worksome.net" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0111 11:33:33.445392 1 pod.go:59] cert-manager/controller/challenges/http01/selfCheck/http01/ensurePod "msg"="found one existing HTTP01 solver pod" "dnsName"="accounting-service-public.stag.aws.worksome.net" "related_resource_kind"="Pod" "related_resource_name"="cm-acme-http-solver-ng88s" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0111 11:33:33.445478 1 service.go:43] cert-manager/controller/challenges/http01/selfCheck/http01/ensureService "msg"="found one existing HTTP01 solver Service for challenge resource" "dnsName"="accounting-service-public.stag.aws.worksome.net" "related_resource_kind"="Service" "related_resource_name"="cm-acme-http-solver-shxmm" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0111 11:33:33.445536 1 ingress.go:90] cert-manager/controller/challenges/http01/selfCheck/http01/ensureIngress "msg"="found one existing HTTP01 solver ingress" "dnsName"="accounting-service-public.stag.aws.worksome.net" "related_resource_kind"="Ingress" "related_resource_name"="cm-acme-http-solver-4wrgx" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
E0111 11:33:38.452673 1 sync.go:186] cert-manager/controller/challenges "msg"="propagation check failed" "error"="wrong status code '503', expected '200'" "dnsName"="accounting-service-public.stag.aws.worksome.net" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0111 11:33:48.453060 1 pod.go:59] cert-manager/controller/challenges/http01/selfCheck/http01/ensurePod "msg"="found one existing HTTP01 solver pod" "dnsName"="accounting-service-public.stag.aws.worksome.net" "related_resource_kind"="Pod" "related_resource_name"="cm-acme-http-solver-ng88s" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0111 11:33:48.453144 1 service.go:43] cert-manager/controller/challenges/http01/selfCheck/http01/ensureService "msg"="found one existing HTTP01 solver Service for challenge resource" "dnsName"="accounting-service-public.stag.aws.worksome.net" "related_resource_kind"="Service" "related_resource_name"="cm-acme-http-solver-shxmm" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0111 11:33:48.453209 1 ingress.go:90] cert-manager/controller/challenges/http01/selfCheck/http01/ensureIngress "msg"="found one existing HTTP01 solver ingress" "dnsName"="accounting-service-public.stag.aws.worksome.net" "related_resource_kind"="Ingress" "related_resource_name"="cm-acme-http-solver-4wrgx" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
E0111 11:33:53.462773 1 sync.go:186] cert-manager/controller/challenges "msg"="propagation check failed" "error"="wrong status code '503', expected '200'" "dnsName"="accounting-service-public.stag.aws.worksome.net" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0111 11:34:03.463584 1 pod.go:59] cert-manager/controller/challenges/http01/selfCheck/http01/ensurePod "msg"="found one existing HTTP01 solver pod" "dnsName"="accounting-service-public.stag.aws.worksome.net" "related_resource_kind"="Pod" "related_resource_name"="cm-acme-http-solver-ng88s" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0111 11:34:03.463669 1 service.go:43] cert-manager/controller/challenges/http01/selfCheck/http01/ensureService "msg"="found one existing HTTP01 solver Service for challenge resource" "dnsName"="accounting-service-public.stag.aws.worksome.net" "related_resource_kind"="Service" "related_resource_name"="cm-acme-http-solver-shxmm" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0111 11:34:03.463734 1 ingress.go:90] cert-manager/controller/challenges/http01/selfCheck/http01/ensureIngress "msg"="found one existing HTTP01 solver ingress" "dnsName"="accounting-service-public.stag.aws.worksome.net" "related_resource_kind"="Ingress" "related_resource_name"="cm-acme-http-solver-4wrgx" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
E0111 11:34:39.004532 1 sync.go:386] cert-manager/controller/challenges/acceptChallenge "msg"="error waiting for authorization" "error"="context deadline exceeded" "dnsName"="accounting-service-public.stag.aws.worksome.net" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
E0111 11:34:39.004807 1 controller.go:163] cert-manager/controller/challenges "msg"="re-queuing item due to error processing" "error"="context deadline exceeded" "key"="default/accounting-service-tls-2bnht-2824239067-1373156968"
I0111 11:34:44.006058 1 pod.go:59] cert-manager/controller/challenges/http01/selfCheck/http01/ensurePod "msg"="found one existing HTTP01 solver pod" "dnsName"="accounting-service-public.stag.aws.worksome.net" "related_resource_kind"="Pod" "related_resource_name"="cm-acme-http-solver-ng88s" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0111 11:34:44.006177 1 service.go:43] cert-manager/controller/challenges/http01/selfCheck/http01/ensureService "msg"="found one existing HTTP01 solver Service for challenge resource" "dnsName"="accounting-service-public.stag.aws.worksome.net" "related_resource_kind"="Service" "related_resource_name"="cm-acme-http-solver-shxmm" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0111 11:34:44.006238 1 ingress.go:90] cert-manager/controller/challenges/http01/selfCheck/http01/ensureIngress "msg"="found one existing HTTP01 solver ingress" "dnsName"="accounting-service-public.stag.aws.worksome.net" "related_resource_kind"="Ingress" "related_resource_name"="cm-acme-http-solver-4wrgx" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
E0111 11:34:59.233881 1 sync.go:378] cert-manager/controller/challenges/acceptChallenge "msg"="error accepting challenge" "error"="400 urn:ietf:params:acme:error:malformed: Unable to update challenge :: authorization must be pending" "dnsName"="accounting-service-public.stag.aws.worksome.net" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0111 11:34:59.321416 1 pod.go:119] cert-manager/controller/challenges/cleanupPods "msg"="deleting pod resource" "dnsName"="accounting-service-public.stag.aws.worksome.net" "related_resource_kind"="Pod" "related_resource_name"="cm-acme-http-solver-ng88s" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0111 11:34:59.335495 1 pod.go:127] cert-manager/controller/challenges/cleanupPods "msg"="successfully deleted pod resource" "dnsName"="accounting-service-public.stag.aws.worksome.net" "related_resource_kind"="Pod" "related_resource_name"="cm-acme-http-solver-ng88s" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="accounting-service-tls-2bnht-2824239067-1373156968" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
E0111 11:34:59.440884 1 controller.go:102] ingress 'default/cm-acme-http-solver-4wrgx' in work queue no longer exists
I0111 11:34:59.727133 1 trigger_controller.go:160] cert-manager/controller/certificates-trigger "msg"="Not re-issuing certificate as an attempt has been made in the last hour" "key"="default/accounting-service-tls" "retry_delay"=3459272891786
E0111 11:35:00.030063 1 sync.go:70] cert-manager/controller/orders "msg"="failed to update status" "error"=null "resource_kind"="Order" "resource_name"="accounting-service-tls-2bnht-2824239067" "resource_namespace"="default" "resource_version"="v1"
I0111 11:35:00.030144 1 controller.go:161] cert-manager/controller/orders "msg"="re-queuing item due to optimistic locking on resource" "key"="default/accounting-service-tls-2bnht-2824239067" "error"="Operation cannot be fulfilled on orders.acme.cert-manager.io \"accounting-service-tls-2bnht-2824239067\": the object has been modified; please apply your changes to the latest version and try again"
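The “propagation check failed” lines above report a request to the challenge URL that came back with 503 instead of 200. For anyone trying to reproduce that check by hand, here is a minimal Python sketch. The function names are hypothetical and this is not cert-manager's code. The body comparison is an assumption based on the solver's --key argument, i.e. that a passing response is a 200 whose body is the key.

```python
import urllib.error
import urllib.request


def challenge_url(domain: str, token: str) -> str:
    """Build the HTTP-01 URL for a domain and challenge token."""
    return f"http://{domain}/.well-known/acme-challenge/{token}"


def classify(status: int, body: str, expected_key: str) -> str:
    """Turn one HTTP response into a readable verdict."""
    if status != 200:
        # Same wording as the controller log, to make comparison easy.
        return f"wrong status code '{status}', expected '200'"
    if body.strip() != expected_key:
        # Assumption: a passing response carries exactly the key.
        return "response body does not match the expected key"
    return "ok"


def self_check(domain: str, token: str, expected_key: str,
               timeout: float = 5.0) -> str:
    """Request the challenge URL once and classify the result."""
    url = challenge_url(domain, token)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(resp.status, resp.read().decode(), expected_key)
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx, but the status code is still available.
        return classify(err.code, "", expected_key)
    except OSError as err:
        # Covers DNS failures, refused connections and timeouts.
        return f"request failed: {err}"
```

Calling self_check with the Dns Name, Token and Key values from the describe challenge output, in a loop while the solver resources exist, should show whether the 503 ever turns into a 200 before the controller cleans up.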
I managed to describe the pod, ingress and service that the cert-manager controller creates at some point during their short lifespan:
➜ ~ kubectl describe pod cm-acme-http-solver-jnm6f; kubectl describe ingress cm-acme-http-solver-2j9fm; kubectl describe service cm-acme-http-solver-8vzb7
Name: cm-acme-http-solver-jnm6f
Namespace: default
Priority: 2000001000
Priority Class Name: system-node-critical
Node: <none>
Labels: acme.cert-manager.io/http-domain=3159757414
acme.cert-manager.io/http-token=607653218
acme.cert-manager.io/http01-solver=true
eks.amazonaws.com/fargate-profile=platform-staging-fargate-pod-profile
Annotations: CapacityProvisioned: 0.25vCPU 0.5GB
Logging: LoggingDisabled: LOGGING_CONFIGMAP_NOT_FOUND
kubernetes.io/psp: eks.privileged
sidecar.istio.io/inject: false
Status: Pending
IP:
IPs: <none>
Controlled By: Challenge/accounting-service-tls-hprbm-2824239067-3707414063
NominatedNodeName: 1c983fda60-0ef78df90139477ab992d594d1188b1b
Containers:
acmesolver:
Image: quay.io/jetstack/cert-manager-acmesolver:v1.6.1
Port: 8089/TCP
Host Port: 0/TCP
Args:
--listen-port=8089
--domain=accounting-service-public.stag.aws.worksome.net
--token=K6SJKwJBSPDYjW5MI4ICRuHMpBN8lDwU1NLhOX2chN4
--key=K6SJKwJBSPDYjW5MI4ICRuHMpBN8lDwU1NLhOX2chN4.-yuHWS8lDxJT_DIqqUoRVEU3PIZV3RT-ln1_oYBMf0A
Limits:
cpu: 100m
memory: 64Mi
Requests:
cpu: 10m
memory: 64Mi
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-rbjd6 (ro)
Volumes:
kube-api-access-rbjd6:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning LoggingDisabled 63s fargate-scheduler Disabled logging because aws-logging configmap was not found. configmap "aws-logging" not found
Name: cm-acme-http-solver-2j9fm
Labels: acme.cert-manager.io/http-domain=3159757414
acme.cert-manager.io/http-token=607653218
acme.cert-manager.io/http01-solver=true
Namespace: default
Address: k8s-default-ingressn-5f8fde044d-bcdcaea9f98c3614.elb.eu-west-1.amazonaws.com
Default backend: default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
Rules:
Host Path Backends
---- ---- --------
accounting-service-public.stag.aws.worksome.net
/.well-known/acme-challenge/K6SJKwJBSPDYjW5MI4ICRuHMpBN8lDwU1NLhOX2chN4 cm-acme-http-solver-8vzb7:8089 (<none>)
Annotations: nginx.ingress.kubernetes.io/whitelist-source-range: 0.0.0.0/0,::/0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Sync 50s (x2 over 64s) nginx-ingress-controller Scheduled for sync
Name: cm-acme-http-solver-8vzb7
Namespace: default
Labels: acme.cert-manager.io/http-domain=3159757414
acme.cert-manager.io/http-token=607653218
acme.cert-manager.io/http01-solver=true
Annotations: auth.istio.io/8089: NONE
Selector: acme.cert-manager.io/http-domain=3159757414,acme.cert-manager.io/http-token=607653218,acme.cert-manager.io/http01-solver=true
Type: NodePort
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.200.245.42
IPs: 10.200.245.42
Port: http 8089/TCP
TargetPort: 8089/TCP
NodePort: http 30593/TCP
Endpoints: <none>
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
I also managed to capture some of the logs of this solver pod during its short lifespan:
Logs from acmesolver in cm-acme-http-solver-jnm6f
I0111 12:33:58.754880 1 solver.go:39] cert-manager/acmesolver "msg"="starting listener" "expected_domain"="accounting-service-public.stag.aws.worksome.net" "expected_key"="K6SJKwJBSPDYjW5MI4ICRuHMpBN8lDwU1NLhOX2chN4.-yuHWS8lDxJT_DIqqUoRVEU3PIZV3RT-ln1_oYBMf0A" "expected_token"="K6SJKwJBSPDYjW5MI4ICRuHMpBN8lDwU1NLhOX2chN4" "listen_port"=8089
I0111 12:34:09.444659 1 solver.go:64] cert-manager/acmesolver "msg"="validating request" "base_path"="/.well-known/acme-challenge" "host"="accounting-service-public.stag.aws.worksome.net" "path"="/.well-known/acme-challenge/K6SJKwJBSPDYjW5MI4ICRuHMpBN8lDwU1NLhOX2chN4" "token"="K6SJKwJBSPDYjW5MI4ICRuHMpBN8lDwU1NLhOX2chN4"
I0111 12:34:09.444701 1 solver.go:72] cert-manager/acmesolver "msg"="comparing host" "base_path"="/.well-known/acme-challenge" "host"="accounting-service-public.stag.aws.worksome.net" "path"="/.well-known/acme-challenge/K6SJKwJBSPDYjW5MI4ICRuHMpBN8lDwU1NLhOX2chN4" "token"="K6SJKwJBSPDYjW5MI4ICRuHMpBN8lDwU1NLhOX2chN4" "expected_host"="accounting-service-public.stag.aws.worksome.net"
I0111 12:34:09.444734 1 solver.go:79] cert-manager/acmesolver "msg"="comparing token" "base_path"="/.well-known/acme-challenge" "host"="accounting-service-public.stag.aws.worksome.net" "path"="/.well-known/acme-challenge/K6SJKwJBSPDYjW5MI4ICRuHMpBN8lDwU1NLhOX2chN4" "token"="K6SJKwJBSPDYjW5MI4ICRuHMpBN8lDwU1NLhOX2chN4" "expected_token"="K6SJKwJBSPDYjW5MI4ICRuHMpBN8lDwU1NLhOX2chN4"
I0111 12:34:09.444753 1 solver.go:87] cert-manager/acmesolver "msg"="got successful challenge request, writing key" "base_path"="/.well-known/acme-challenge" "host"="accounting-service-public.stag.aws.worksome.net" "path"="/.well-known/acme-challenge/K6SJKwJBSPDYjW5MI4ICRuHMpBN8lDwU1NLhOX2chN4" "token"="K6SJKwJBSPDYjW5MI4ICRuHMpBN8lDwU1NLhOX2chN4"
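The solver log above shows four steps for a single request: validating the request, comparing the host, comparing the token, and writing the key. The Python sketch below mirrors those steps as a pure function so they are easier to reason about. It is an illustration only, not the acmesolver source: the function name is hypothetical, and the 404 returned on a mismatch is an assumption, since the log only shows the successful path.

```python
from typing import Tuple

BASE_PATH = "/.well-known/acme-challenge"


def handle(host: str, path: str, expected_host: str,
           expected_token: str, key: str) -> Tuple[int, str]:
    """Return (status, body) for one request, following the logged steps."""
    # "validating request": the path must sit under the ACME base path.
    if not path.startswith(BASE_PATH + "/"):
        return 404, ""  # assumed status for a rejected request
    token = path[len(BASE_PATH) + 1:]
    # "comparing host"
    if host != expected_host:
        return 404, ""
    # "comparing token"
    if token != expected_token:
        return 404, ""
    # "got successful challenge request, writing key"
    return 200, key
```

In the log above, all four steps pass for the request received at 12:34:09, so at least one request did reach the solver pod and was answered with the key.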
Issuer used:
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    # The ACME server URL
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: myemail@mycompany.com
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-staging.issuer.private-key
    # Enable the HTTP-01 challenge provider
    solvers:
    - http01:
        ingress:
          class: nginx
Expected behaviour: challenge is successful, resulting in “valid” order state and certificate being issued
Steps to reproduce the bug:
- create an EKS cluster and set it up to work with Fargate
- set-up external-dns, ingress-nginx, aws-load-balancer-controller
- deploy a sample application served behind the ingress-nginx, with a HOST record configured in e.g. Route53 via external-dns
- check that it can be reached via http
curl http://$HOST
- deploy Issuers: letsencrypt HTTP01 issuers for both staging and production
- amend the Ingress of the corresponding service by adding the annotation cert-manager.io/issuer: "letsencrypt-staging"
- expect the certificate to be issued and the endpoint to respond successfully over https
Anything else we need to know?:
- I also tried setting the acme.cert-manager.io/http01-ingress-class: nginx and acme.cert-manager.io/http01-edit-in-place: "false" annotations on the service Ingress, but to no avail.
- I also went through the troubleshooting info at https://cert-manager.io/docs/faq/acme/, but no luck.
- curl http://$HOST feels slow at times. I wonder if that is because of DNS, and whether it somehow affects the cert-manager solver or the overall process here.
- This is a detailed bug report of my input in the comments of https://github.com/jetstack/cert-manager/issues/4709
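One way to test the DNS suspicion from a workstation is to time repeated lookups of the host. This is a minimal Python sketch using only the standard library, and time_dns_lookups is a hypothetical helper name. Local resolver behaviour, including caching, can differ from what the ACME server sees, so treat the numbers as a hint only.

```python
import socket
import time
from typing import Callable, List, Tuple


def time_dns_lookups(
    host: str,
    attempts: int = 5,
    resolver: Callable = socket.getaddrinfo,
) -> List[Tuple[bool, float]]:
    """Resolve `host` several times; return (succeeded, seconds) per attempt.

    `resolver` is injectable so the function can be exercised without
    network access; by default it is socket.getaddrinfo.
    """
    results: List[Tuple[bool, float]] = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            resolver(host, 80)
            succeeded = True
        except socket.gaierror:
            # Name resolution failed (NXDOMAIN, resolver unreachable, ...).
            succeeded = False
        results.append((succeeded, time.monotonic() - start))
    return results
```

Calling time_dns_lookups with the value of $HOST and printing each tuple shows whether lookups fail outright or only occasionally take long.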
Environment details:
- Kubernetes version: v1.21.2-eks-06eac09, or more specifically
➜ cert-manager git:(xxxxxxxxxxxxx) ✗ kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.1", GitCommit:"86ec240af8cbd1b60bcc4c03c20da9b98005b92e", GitTreeState:"clean", BuildDate:"2021-12-16T11:33:37Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"darwin/arm64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.2-eks-06eac09", GitCommit:"5f6d83fe4cb7febb5f4f4e39b3b2b64ebbbe3e97", GitTreeState:"clean", BuildDate:"2021-09-13T14:20:15Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.23) and server (1.21) exceeds the supported minor version skew of +/-1
- Cloud-provider/provisioner: AWS / eks with fargate
- cert-manager version: 1.6.1
- Install method: installed via helm template cert-manager jetstack/cert-manager --version 1.6.1 -f values.yaml + kubectl apply of the resulting yaml and of the corresponding CRDs (values.yaml is obtained by helm show values jetstack/cert-manager --version 1.6.1, and the only value changed in it is webhook.securePort: 10260)
I can’t seem to debug further, as the solver pod+service+ingress live for just a minute or so.
/kind bug
About this issue
- State: closed
- Created 2 years ago
- Comments: 17
Not directly logs as such. You can get some information about the state of authorizations, orders, etc. by looking at the URLs on the cert-manager resources (i.e. the ACME authorization URL that gets put on the status of the Order). You can also increase the log level on the cert-manager controller by passing the --v=5 flag to the controller, which will, among other things, make it log the calls it makes to ACME.
Thank you for the great issue description, I have yet to read through the logs you posted.
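As an illustration of the suggestion to look at the ACME authorization URL from the Order status, here is a minimal Python sketch that summarizes an authorization object. It assumes the JSON follows the authorization layout from RFC 8555 (identifier, status, challenges with an optional error), and that the ACME server answers a plain GET on that URL, which not every server does. fetch_json and summarize_authorization are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Dict, List


def fetch_json(url: str, timeout: float = 10.0) -> Dict[str, Any]:
    """GET `url` and decode the JSON body (assumes unauthenticated GET works)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_authorization(authz: Dict[str, Any]) -> List[str]:
    """Turn an ACME authorization object into a few readable lines."""
    identifier = authz.get("identifier", {}).get("value", "?")
    lines = [f"{identifier}: authorization is {authz.get('status', '?')}"]
    for challenge in authz.get("challenges", []):
        line = f"  {challenge.get('type', '?')}: {challenge.get('status', '?')}"
        # Failed challenges usually carry an error object with a detail message.
        detail = challenge.get("error", {}).get("detail")
        if detail:
            line += f" ({detail})"
        lines.append(line)
    return lines
```

The URL itself can be copied from the Order, for example from the authorizations entries shown by kubectl describe order. Printing summarize_authorization(fetch_json(url)) then shows what the ACME server recorded for the http-01 challenge, including the reason validation failed if it reported one.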
I have previously done this when debugging by modifying RBAC so that cert-manager doesn’t have permission to delete pods, services, ingresses and challenges; see https://github.com/jetstack/cert-manager/issues/4676#issuecomment-1003355941. The user there said it didn’t work for them, but it should be achievable by modifying RBAC.
The solver pod appears to be functioning as expected; the error at the end of its log output is it getting killed after cert-manager deleted the invalid Challenge.
As you say, there will be no certs on crt.sh, as it does not appear that ACME was able to successfully validate the challenge.
Looking at the last part of logs from controller:
This is cert-manager waiting for ACME to accept the authorization, so at this point the self check must have succeeded, the challenge has been accepted with ACME and cert-manager waits for ACME to validate the challenge, but that times out presumably because the ACME request for the token that the solver pod serves fails.
This is cert-manager again attempting to accept the challenge and wait for ACME to validate it, but it gets back "authorization must be pending". This could actually mean that the authorization was set to some state other than ‘pending’ (i.e. ‘invalid’) in ACME as a result of the previous attempt to validate the challenge failing (see the conversation on https://github.com/jetstack/cert-manager/issues/4676 for context), so I think the actual issue is the timeout from ACME that happened before.
As you suggest on the other issue, I think it is likely that the error is DNS/networking/ingress setup related, so that the ACME server query for the challenge URL is failing.
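To see whether the challenge URL is reachable from outside the cluster while the solver pod is still alive, the request the ACME server makes can be approximated from any external machine. This is a minimal Python sketch using the base path /.well-known/acme-challenge seen in the solver logs. The helper names challenge_url and probe_challenge are hypothetical, and a successful probe from one network does not prove the ACME server can reach the host too.

```python
import time
import urllib.request
from typing import Any, Callable, Dict


def challenge_url(host: str, token: str) -> str:
    """Build the plain-HTTP URL an HTTP-01 validation request targets."""
    return f"http://{host}/.well-known/acme-challenge/{token}"


def probe_challenge(
    host: str,
    token: str,
    expected_key: str,
    timeout: float = 10.0,
    opener: Callable = urllib.request.urlopen,
) -> Dict[str, Any]:
    """Fetch the challenge URL once and compare the body to expected_key.

    `opener` is injectable for testing; by default it is urlopen.
    """
    url = challenge_url(host, token)
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", "replace").strip()
    except OSError as exc:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return {"ok": False, "url": url, "error": str(exc),
                "seconds": time.monotonic() - start}
    return {"ok": status == 200 and body == expected_key, "url": url,
            "status": status, "seconds": time.monotonic() - start}
```

Running probe_challenge in a loop, with the token and key taken from the solver's "starting listener" log line, gives a rough picture of how often the request fails or how long it takes before the Challenge is cleaned up.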