cert-manager: proxy_protocol mode breaks HTTP01 challenge Check stage

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

When running ingress-nginx with use-proxy-protocol: true, the self-check stage of cert-manager fails because it appears to communicate with the ingress controller using plain HTTP requests, without a PROXY protocol header.

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

  • Deploy ingress-nginx
  • Configure an upstream load balancer that supports proxy protocol and enable it.
  • Set the ConfigMap options use-proxy-protocol: true and proxy-real-ip-cidr: x.x.x.x (use the real load balancer IP) for the nginx controller
  • Deploy cert-manager
  • Request a certificate using HTTP01 validation.
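For context, the reproduction above corresponds to an ingress-nginx ConfigMap roughly like the following (a sketch only; the ConfigMap name, namespace, and CIDR value depend on your installation):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration   # name/namespace vary by install method
  namespace: ingress-nginx
data:
  # Tell nginx to expect a PROXY protocol line on incoming connections
  use-proxy-protocol: "true"
  # CIDR of the load balancer that adds the PROXY header (placeholder value)
  proxy-real-ip-cidr: "203.0.113.10/32"
```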

Anything else we need to know?:

Environment:

  • Kubernetes version:
    Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.0", GitCommit:"fc32d2f3698e36b93322a3465f63a14e9f0eaead", GitTreeState:"clean", BuildDate:"2018-03-26T16:55:54Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.0", GitCommit:"fc32d2f3698e36b93322a3465f63a14e9f0eaead", GitTreeState:"clean", BuildDate:"2018-03-26T16:44:10Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: 1 master, 3 nodes, vSphere
  • Install tools:
  • Log files:

nginx-ingress-controller:

2018/04/13 12:27:55 [error] 1837#1837: *10321 broken header: "GET /.well-known/acme-challenge/9oQ5DbRUHNpnIsqvlvFUcb-km2OgpckyaXXEQh9cQQk HTTP/1.1
Host: therealhost.example.com
User-Agent: Go-http-client/1.1
Accept-Encoding: gzip

" while reading PROXY protocol, client: 10.129.2.0, server: 0.0.0.0:80

cert-manager: E0413 12:25:57.580259 1 controller.go:196] certificates controller: Re-queuing item "kube-system/therealhost.example.com" due to error processing: error waiting for key to be available for domain "therealhost.example.com": context deadline exceeded

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 1
  • Comments: 77 (9 by maintainers)

Most upvoted comments

Hi all, I ran into the same issue. I’ve just published hairpin-proxy which works around the issue, specifically for cert-manager self-checks. https://github.com/compumike/hairpin-proxy

It uses CoreDNS rewriting to intercept traffic that would be heading toward the external load balancer. It then adds a PROXY line to requests originating from within the cluster. This allows cert-manager’s self-check to pass.

It’s able to do this all through DNS rewriting and spinning up a tiny HAProxy, so there’s no need to wait for either kubernetes or cert-manager to fix this issue in their packages.
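The CoreDNS rewrite described above looks roughly like the following Corefile fragment (illustrative only; the actual rule and the in-cluster service name are managed by hairpin-proxy itself):

```
# Corefile fragment (sketch): rewrite the public hostname so in-cluster
# traffic goes to a small HAProxy that prepends the PROXY line before
# forwarding to the ingress controller.
rewrite name therealhost.example.com hairpin-proxy-haproxy.hairpin-proxy.svc.cluster.local
```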

But the code to achieve proxy protocol is super tiny. Can’t cert-manager just accept a configuration option so this would be opt-in?

If the traffic from the CM or CM challenge Pods are not going external, probably you have some special cluster DNS, network DNS, or perhaps hairpin routing getting in the way.

you are correct, i was able to confirm this. the public loadbalancer ip is resolved correctly for the http request.

but traffic is routed internally directly from the cert-manager pod to nginx-ingress-controller (does not go through the loadbalancer).

cluster is hosted on digitalocean, will edit this post if i find a solution.

Hi all. Same issue, in DigitalOcean k8s with proxy-protocol. Resolved when I set the annotation service.beta.kubernetes.io/do-loadbalancer-hostname on the ingress controller. https://github.com/digitalocean/digitalocean-cloud-controller-manager/blob/master/docs/controllers/services/annotations.md#servicebetakubernetesiodo-loadbalancer-hostname After this, certificate issuing works perfectly with the HTTP01 challenge!
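The annotation mentioned above goes on the ingress controller's Service, roughly like this (a sketch; the Service name and hostname are placeholders, and the proxy-protocol annotation is shown only for completeness):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller   # placeholder name
  annotations:
    # Makes the DO cloud controller report a hostname (not an IP) in the
    # Service status, so kube-proxy does not short-circuit the load balancer
    service.beta.kubernetes.io/do-loadbalancer-hostname: "lb.example.com"
    service.beta.kubernetes.io/do-loadbalancer-enable-proxy-protocol: "true"
spec:
  type: LoadBalancer
```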

Just hit this with a DO managed Kube cluster and a DO LB in proxy-protocol mode. Seems like the nginx-ingress LB is broken with DO’s proxy protocol:

https://github.com/kubernetes/ingress-nginx/issues/3996

From the Ingress

 " while reading PROXY protocol, client: 10.244.0.1, server: 0.0.0.0:80
2019/04/13 15:28:02 [error] 1384#1384: *248667 broken header: "GET /.well-known/acme-challenge/oF_X6SITHseBK1hdpEiZKbVCkmjvIiTHlPO46XXsSJM HTTP/1.1
Host: example.com
User-Agent: Go-http-client/1.1
Accept-Encoding: gzip
Connection: close

Unfortunately the DNS01 challenge is broken for DigitalOcean in 0.7.0 (and based on my testing in 0.6.0 as well) so HTTP01 is a must for DO.

Bumping this again as I have some time over the next few weeks to create a PR for this. Would like a confirmation that it would not be useless work before starting though.

@munnerz Would cert-manager accept the changes adding proxy-protocol mode for self-check behind a configuration parameter so the behaviour would be opt in?

I get the same on a cluster at Brightbox, which also uses hairpinning by default and an external load balancer that sends PROXY protocol. The hairpinning causes other issues too admittedly, but it’s obviously a common setting for Kubernetes clusters.

If the traffic from the CM or CM challenge Pods are not going external, probably you have some special cluster DNS, network DNS, or perhaps hairpin routing getting in the way.

Since it’s fairly common (or at least, so I believe), perhaps an option just to disable testing the challenge? The challenge will work externally, from Let’s Encrypt, just not internally.

what if i have multiple domains pointing to the same LB address?

I faced this one as well. Why is this closed? The cert-manager HTTP01 challenge is not working with a DO load balancer with proxy protocol at the moment, which is the only load balancer mode that makes sense (as it forwards the request IP). I guess I will try DNS01 until this is resolved, but HTTP01 should work too. Can we reopen this one?

got the same issue with proxy protocol at digitalocean. my only workaround is to temporarily disable proxy protocol on the load balancer (and nginx ingress config map) allowing the certificate to be issued.

i’m hoping for a better solution to avoid interruption in service during certificate renewal.

More troubleshooting indicates there is some strange behaviour in my cluster, using a similar setup in minikube works fine.

The error message I’m seeing when I set logging to verbose is: I0414 05:41:23.649175 1 http.go:410] ACME HTTP01 self check failed for domain "therealhost.example.com", waiting 5s: Get http://therealhost.example.com/.well-known/acme-challenge/9oQ5DbRUHNpnIsqvlvFUcb-km2OgpckyaXXEQh9cQQk: EOF

From within the cert-manager instance, the DNS host resolves correctly:

kubectl exec -it --namespace=shared-services cert-manager-cert-manager-5c7bfd7dc4-kpffn sh
/ # nslookup therealhost.example.com
nslookup: can't resolve '(null)': Name does not resolve

Name:      therealhost.example.com
Address 1: 123.123.123.123

But if I install curl within the cert-manager instance I get odd behaviour:

/ # curl 'http://therealhost.example.com/.well-known/acme-challenge/9oQ5DbRUHNpnIsqvlvFUcb-km2OgpckyaXXEQh9cQQk'  -v
*   Trying 103.75.202.143...
* TCP_NODELAY set
* Connected to therealhost.example.com (123.123.123.123) port 80 (#0)
> GET /.well-known/acme-challenge/9oQ5DbRUHNpnIsqvlvFUcb-km2OgpckyaXXEQh9cQQk HTTP/1.1
> Host: therealhost.example.com
> User-Agent: curl/7.59.0
> Accept: */*
>
* Empty reply from server
* Connection #0 to host therealhost.example.com left intact
curl: (52) Empty reply from server

Whereas if I run the same command from any other host I get this:

curl 'http://therealhost.example.com/.well-known/acme-challenge/9oQ5DbRUHNpnIsqvlvFUcb-km2OgpckyaXXEQh9cQQk'  -v
* About to connect() to therealhost.example.com port 80 (#0)
*   Trying 103.75.202.143...
* Connected to therealhost.example.com (123.123.123.123) port 80 (#0)
> GET /.well-known/acme-challenge/9oQ5DbRUHNpnIsqvlvFUcb-km2OgpckyaXXEQh9cQQk HTTP/1.1
> User-Agent: curl/7.29.0
> Host: therealhost.example.com
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx/1.13.9
< Date: Sat, 14 Apr 2018 05:49:08 GMT
< Content-Type: text/plain; charset=utf-8
< Content-Length: 87
< Connection: keep-alive
<
* Connection #0 to host therealhost.example.com left intact
9oQ5DbRUHNpnIsqvlvFUcb-km2OgpckyaXXEQh9cQQk.zrZMCbhC-Lh8qDFGdhEMA4BvJkeBPAzkThQKl4U_sOE

Hi all, I ran into the same issue. I’ve just published hairpin-proxy which works around the issue, specifically for cert-manager self-checks. https://github.com/compumike/hairpin-proxy

It uses CoreDNS rewriting to intercept traffic that would be heading toward the external load balancer. It then adds a PROXY line to requests originating from within the cluster. This allows cert-manager’s self-check to pass.

It’s able to do this all through DNS rewriting and spinning up a tiny HAProxy, so there’s no need to wait for either kubernetes or cert-manager to fix this issue in their packages.

This works and fixes the issue on Scaleway too

Thank you very much @compumike! It works great 😃

Good job @Jyrno42 😃 But I would really like to have this fix/option in the original cert-manager release.

Like many others here, we have to temporarily deactivate proxy-protocol while we wait for the certificates to renew. Doing this in a production environment is not ideal.

We can't use DNS01 since we do not own many of the domains used.

Sorry about that, it should be fixed now! @Isakgicu @tareksamni

I was transferring it to my personal GitLab account instead of the company one and forgot to make it public.

The purpose of the self check is to test an external request, to ensure when the ACME server makes its external request it will work. If your self-check requests are not going external, then I humbly submit that that is the problem that needs fixing.

@Jyrno42 Just tested this with a site behind nginx-ingress in proxy mode. Used images: jyrno42/cert-manager-controller-amd64:canary jyrno42/cert-manager-webhook-amd64:canary jyrno42/cert-manager-cainjector-amd64:canary

Self-check succeeded and the cert was assigned 🚀 Definitely not useless as the only way I can interact with an OpenStack Octavia load balancer properly is via proxy mode.

Hi there - we’re going through the issue backlog and this has been around for a while. Based on my understanding of this issue, there’s not any action in cert-manager to be taken.

It seems that the loadbalancer in front of your Ingress controller should be responsible for adding the PROXY header.

When cert-manager performs the ‘self check’, it does so by attempting to access your domain name just like any other user - i.e. it does not route traffic directly to your ingress controller and bypass the load balancer.

Unless there’s some funky network stuff going on here, or a DNS server rewriting responses, this should just work as your load balancer will add the appropriate PROXY headers when talking to your ingress controller.

I’m going to close this now as I don’t think there’s anything actionable. If my understanding above is incorrect, please let me know 😄 but I don’t see any reason why cert-manager should have to add these headers, as Let’s Encrypt themselves will never in any case add these headers.

@serafim no you cannot, but you can create extra DNS records on your own all pointing at the same load balancer IP address.

I recently updated the corresponding CCM documentation section. Please have a look and let me know if that clarifies things.

Hi, Timo here from DigitalOcean. Unfortunately, kubernetes/kubernetes#77523 does not seem to fix kubernetes/kubernetes#66607. (See also my coworker’s comment at https://github.com/kubernetes/kubernetes/pull/77523#issuecomment-490329492 and Andrew Sy Kim’s follow up at https://github.com/kubernetes/kubernetes/pull/77523#issuecomment-490592181 on that PR.) AFAICT, another upstream change would be required to fix the issue.

We do offer a workaround in the latest releases of DOKS clusters. See my comment here for further context and documentation.

@HeWhoWas Please can you reopen this issue as it hasn’t been resolved.

The challenge is being requested over HTTP, but the nginx-ingress controller is expecting requests to be made using the proxy protocol - which my load balancer is configured to do.

When the go-http client makes a request directly to the nginx-ingress controller (i.e. not using the load balancer's external IP), the proxy_protocol isn't used, causing it to fail.

If I manually run curl -v 'http://therealhost.example.com/.well-known/acme-challenge/9oQ5DbRUHNpnIsqvlvFUcb-km2OgpckyaXXEQh9cQQk' the response works fine, as the initial HTTP request is terminated by the load balancer, and the backend request to nginx-ingress is made using the proxy_protocol.

I am currently using the quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.12.0 controller, but this is not related to HTTPS, or redirection in any manner.

Just for people coming here from Google (like me): @KeksBeskvitovich’s solution also works for Hetzner Cloud by setting these annotations :

load-balancer.hetzner.cloud/uses-proxyprotocol: "true"
load-balancer.hetzner.cloud/hostname: lb-subdomain.example.com

Thanks, also facing this issue on Hetzner. The fix seems to work, but I have multiple domains pointing to that LB. It is also not a scalable solution, since the ingress needs to know the domain (all domains) before routes are applied, and it is reasonable to assume that these resources are not handled by the same person. Ingress routes may even be applied by CI.

Is there any other way to stop kube-proxy from altering the DNS resolution, so the external IP is used?

https://github.com/compumike/hairpin-proxy this worked for me.

Hi all, I ran into the same issue. I’ve just published hairpin-proxy which works around the issue, specifically for cert-manager self-checks. https://github.com/compumike/hairpin-proxy It uses CoreDNS rewriting to intercept traffic that would be heading toward the external load balancer. It then adds a PROXY line to requests originating from within the cluster. This allows cert-manager’s self-check to pass. It’s able to do this all through DNS rewriting and spinning up a tiny HAProxy, so there’s no need to wait for either kubernetes or cert-manager to fix this issue in their packages.

This works and fixes the issue on Scaleway too

it works for me also, thanks

@munnerz: I’m going to close this now as I don’t think there’s anything actionable. If my understanding above is incorrect, please let me know 😄 but I don’t see any reason why cert-manager should have to add these headers, as Let’s Encrypt themselves will never in any case add these headers.

Apologies all if I have missed something, it’s quite a long thread.

I have a setup where I want certbot standalone to receive a request via a DigitalOcean Loadbalancer that has PROXY PROTOCOL enabled. This means that the HTTP request received by certbot is prefixed by a line like:

PROXY TCP4 192.168.0.1 192.168.0.2 42300 443\r\n

This has nothing to do with HTTPS or Kubernetes. It is simply that certbot, I assume, fails to understand the request because of the extra PROXY PROTOCOL data at the start of the TCP stream.

My question is: can certbot standalone be told to expect the connection to come with a PROXY PROTOCOL “header” and to parse/handle/skip/ignore it so that the HTTP-01 challenge will work as it does without PROXY PROTOCOL?

Based on the title of this issue this would seem to be the right place to ask, but the issue seems to have been mostly about the Kubernetes ingress controller even though there's nothing in the issue title about Kubernetes.

Reference: https://blog.digitalocean.com/load-balancers-now-support-proxy-protocol/
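For readers unfamiliar with it, the PROXY protocol v1 "header" mentioned above is a single ASCII line sent before the application data. A minimal sketch of parsing and stripping it (illustrative Python only, not certbot or cert-manager code; it handles just the human-readable TCP4/TCP6 form):

```python
def strip_proxy_v1(data: bytes):
    """Parse a PROXY protocol v1 line off the front of a TCP payload.

    Returns (info, rest): info is a dict of the header fields (or None if
    no header is present), rest is the remaining plain HTTP request.
    Only the human-readable TCP4/TCP6 form is handled in this sketch.
    """
    if not data.startswith(b"PROXY "):
        return None, data
    line, sep, rest = data.partition(b"\r\n")
    if not sep:
        raise ValueError("incomplete PROXY protocol header")
    parts = line.decode("ascii").split(" ")
    # e.g. "PROXY TCP4 192.168.0.1 192.168.0.2 42300 443"
    return {
        "proto": parts[1],
        "src_ip": parts[2],
        "dst_ip": parts[3],
        "src_port": int(parts[4]),
        "dst_port": int(parts[5]),
    }, rest

payload = b"PROXY TCP4 192.168.0.1 192.168.0.2 42300 443\r\nGET / HTTP/1.1\r\n"
info, rest = strip_proxy_v1(payload)
print(info["src_ip"], info["dst_port"])  # 192.168.0.1 443
```

A backend that consumes the header this way sees the original client address in info while handling the HTTP request in rest as usual.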

I contacted DigitalOcean support and they've told me that Kubernetes v1.15.1 is expected to be available in DO by early August. I've referenced this issue and also the issue in Kubernetes so they are aware of the importance of this release.

Hi there!

Thank you for contacting DigitalOcean and providing such a detailed ticket! 

Our plans are to have 1.15.1 available by early August.

Let me know if you have any additional questions.

Regards,

John K. (redacted for security)
Senior Developer Support Engineer

I believe this is due to a design flaw in Kubernetes. When using LoadBalancer services that use an external service (like Brightbox or DO mentioned here), kube-proxy intercepts the outgoing requests to the load balancer external IP at the network level to keep them within the Kubernetes cluster but doesn’t understand that some LoadBalancers can do more than just standard TCP balancing. So this will break internal connections to external load balancers that do more, such as proxy-support or even SSL offloading.

We’ve now fixed this at Brightbox by not telling kube-proxy about the external IP addresses of the LoadBalancers, so it doesn’t intercept them. I think DO are going to fix it the same way, and AWS have done this all along. See https://github.com/kubernetes/kubernetes/issues/66607 for more details.

So this isn’t a cert-manager problem.
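The Brightbox/DO fix described above works because kube-proxy only short-circuits load balancer IPs it knows about; if the cloud controller reports a hostname instead of an IP in the Service status, in-cluster clients resolve the name normally and traffic goes out through the real load balancer. A sketch of what that status difference looks like (placeholder values):

```yaml
# Sketch: Service status with the hostname-based fix applied.
status:
  loadBalancer:
    ingress:
    - hostname: lb.example.com   # instead of "ip: 203.0.113.10",
                                 # which kube-proxy would intercept
```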

Just for people coming here from Google (like me):

@KeksBeskvitovich’s solution also works for Hetzner Cloud by setting these annotations :

load-balancer.hetzner.cloud/uses-proxyprotocol: "true"
load-balancer.hetzner.cloud/hostname: lb-subdomain.example.com

and this value:

controller:
  config:
    use-proxy-protocol: true

for ingress-nginx. Afterwards, you will have both the source IP and a working cert-manager!

I am yet another “me too” for this issue on DigitalOcean… even with the latest 1.22 release:

but maybe this will be solved in 1.25

❯ kubectl version
<...>
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.8", GitCommit:"7061dbbf75f9f82e8ab21f9be7e8ffcaae8e0d44", GitTreeState:"clean", BuildDate:"2022-03-16T14:04:34Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}

My symptoms are the same as many of you:

I0427 15:56:48.000286 1 ingress.go:99] cert-manager/challenges/http01/selfCheck/http01/ensureIngress "msg"="found one existing HTTP01 solver ingress" "dnsName"="my-domain.tld" "related_resource_kind"="Ingress" "related_resource_name"="cm-acme-http-solver-sz4bv" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="serviceNameHere-jxk97-648906699-2238484802" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
E0427 15:56:48.011217 1 sync.go:186] cert-manager/challenges "msg"="propagation check failed" "error"="failed to perform self check GET request 'http://my-domain.tld/.well-known/acme-challenge/BxoPaSKOqSedr3HvG47jQGKNOxFJBTNDTUfoeNnzjEg': Get \"http://my-domain.tld/.well-known/acme-challenge/BxoPaSKOqSedr3HvG47jQGKNOxFJBTNDTUfoeNnzjEg\": EOF" "dnsName"="my-domain.tld" "resource_kind"="Challenge" "resource_name"="serviceNameHere-jxk97-648906699-2238484802" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"

But when fetching the URL externally, things work:

❯ curl -vvv http://my-domain.tld/.well-known/acme-challenge/BxoPaSKOqSedr3HvG47jQGKNOxFJBTNDTUfoeNnzjEg
<...>
> GET /.well-known/acme-challenge/BxoPaSKOqSedr3HvG47jQGKNOxFJBTNDTUfoeNnzjEg HTTP/1.1
> Host: my-domain.tld
> User-Agent: curl/7.82.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Date: Wed, 27 Apr 2022 15:45:27 GMT
< Content-Type: text/plain; charset=utf-8
< Content-Length: 87
< Connection: keep-alive
< Cache-Control: no-cache, no-store, must-revalidate
< 
* Connection #0 to host my-domain.tld left intact
BxoPaSKOqSedr3HvG47jQGKNOxFJBTNDTUfoeNnzjEg.sw-Xjc-9d_IujG1ZEJT7Ol--ZOFyfKlHemo2LYZhDn0% 

I was able to move past this by:

  • Adding the service.beta.kubernetes.io/do-loadbalancer-hostname annotation to the Service created by the nginx-ingress manifest
  • Re-applying the manifest
  • Deleting the certificate: kubectl delete certificates ServiceNameHere-cert
  • A few seconds later, cmctl status certificate guestbook-cert showed the cert as issued.

Note: Questions were resolved, please see edits below, thank you!

@Jyrno42 Thanks for providing the patched cert-manager images! The fix for the proxy protocol works well with my ingress-nginx in proxy protocol mode 👍

However, I found an issue you (or someone else 😄) maybe can help me with. The pod hosting the HTTP01 challenge does not start since the docker image tag is not properly populated:

$ kubectl describe pod cm-acme-http-solver-hls7k
Name:         cm-acme-http-solver-hls7k
Namespace:    staging
Priority:     0
Node:         xxxxx
Start Time:   Tue, 11 May 2021 11:20:53 +0200
Labels:       acme.cert-manager.io/http-domain=xxxx
              acme.cert-manager.io/http-token=xxxx
              acme.cert-manager.io/http01-solver=true
Annotations:  cni.projectcalico.org/podIP: xxxx
              cni.projectcalico.org/podIPs: xxxx
              sidecar.istio.io/inject: false
Status:       Pending
IP:           xxxx
IPs:
  IP:           xxxx
Controlled By:  Challenge/backend-dashboard-tls-dvkd4-3440324594-2368510026
Containers:
  acmesolver:
    Container ID:
    Image:         quay.io/jetstack/cert-manager-acmesolver:{STABLE_DOCKER_TAG}
    Image ID:
...

Manually editing the tag to v1.3.1 fixes the issue but is a very inconvenient resolution 😃

Do you know how to fix this? Maybe the auto-build needs some updates?

I’m using v1.3.0 of your patched images (e.g. jyrno42/cert-manager-controller:v1.3.0). Another question: Is there also a way to provide a patched version of cert-manager 1.3.1?

Thank you so much for your help and the effort you already put into fixing the problem at hand! 🎉

EDIT: The problem with the {STABLE_DOCKER_TAG} somehow resolved itself after re-deploying cert-manager 🤷 EDIT 2: Jyrno42 fixed the build, so 1.3.1 is now also available as a patched version. Thanks again!

Just wanted to let everyone know that this is still something that does pop up with DigtalOcean loadbalancers with the proxy protocol. It looks like there is some upstream activity to hopefully fix the root cause for this: https://github.com/kubernetes/kubernetes/issues/66607

It looks like there is a KEP and corresponding PR to fix the underlying issue causing this in Kubernetes: https://github.com/kubernetes/enhancements/pull/1392

I’m still not convinced we should send PROXY headers during the self check or look to support this natively in cert-manager. Ultimately it is caused by traffic being routed incorrectly within your network/cluster, and once this issue is resolved there will be no good reason for something like this. If we were to add this, it’d need to be behind a feature gate, marked ‘alpha’, clearly link to the issues describing the problem and additionally noting that the feature will be removed in a future release 🙂 (and a note/FAQ in the docs about it and when you’d want to use it).

On Thu, 20 Feb 2020 at 12:50, Daniel Bjørnådal notifications@github.com wrote:

Good job @Jyrno42 https://github.com/Jyrno42 😃 But I would really like to have this fix/option in the original cert-manager release.

Like many others here, we have to temporary deactivate proxy-protocol while we wait for the certificates to renew. Doing this in a production environment is not ideal.

We cant use dns01 since we do not own many of the domains used.


@Jyrno42 The posted gitlab link is a 404! Could you please check if the project is not private?

I configured Gitlab to make automated releases with my patch every time a new version of cert-manager gets released.

Repository itself is here: https://gitlab.com/jyrno42/cert-manager-patcher

This means you can update your deployments to use the v0.12.0 tag instead of the canary tag you might have used previously.

Not my images. Just fished them out from @Jyrno42 's repo. But hey yw for the convenience haha

@timoreimann thank you! This solved my issue with cert-manager when multiple domains pointed to one DNS record (load balancer’s IP).

I created a fresh cluster on digitalocean using the latest kubernetes 1.15.2 and I am still encountering this issue. Can anyone confirm that it is working for them on DO managed kubernetes v1.15.2?

Same issue here. I need proxy-protocol because of client IPs so it is not a solution to disable it. If it is on, cert-manager is not working because of that pre-check.

There are two solutions:

  1. Updating the cert-manager code to retry the test with a proxy-protocol header if the plain GET failed, or
  2. Allowing a cert-manager config option to disable the pre-check.

I would prefer solution No.1 because there is a reason we are checking that endpoint before asking LetsEncrypt to do the same. This check is valuable as it prevents quota issues.

This is important to resolve, as people will need proxy protocol and it is bad if cert-manager is unable to work in that case. I think either of the two solutions would let us get around this problem.

I see that pre-check code is at: https://github.com/jetstack/cert-manager/blob/70bc3e845bffac5acc10934911648a42a3a05ed1/pkg/issuer/acme/http/http.go#L184
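Solution 1 above would amount to prepending a PROXY protocol v1 line before the self-check HTTP request on a retry. On the wire that looks roughly like the following (an illustrative Python sketch; cert-manager itself is written in Go, and the addresses and ports here are placeholders):

```python
def build_self_check_request(host: str, path: str, with_proxy_header: bool,
                             src="10.0.0.1", dst="10.0.0.2",
                             src_port=42300, dst_port=80) -> bytes:
    """Build the raw bytes a self check would send over the TCP connection."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Connection: close\r\n\r\n"
    ).encode("ascii")
    if with_proxy_header:
        # PROXY protocol v1: one text line sent before the HTTP request,
        # which is what a proxy-protocol-enabled nginx expects to read first.
        header = f"PROXY TCP4 {src} {dst} {src_port} {dst_port}\r\n".encode("ascii")
        return header + request
    return request

print(build_self_check_request("example.com", "/.well-known/acme-challenge/tok",
                               True).split(b"\r\n")[0])
```

Without the header (today's behaviour), nginx in proxy-protocol mode reads the GET line where it expects the PROXY line, producing exactly the "broken header" errors quoted earlier in this thread.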

With curl, I was able to connect with the following line. It returned the body.

http_proxy=http://my.website.com:80 curl -v http://my.website.com/.well-known/acme-challenge/mwbIQcwaB9LL6wwIGRjQuRfL8cl5lFfGXocuQ3Y_fqs --haproxy-protocol

I see a recent commit https://github.com/jetstack/cert-manager/commit/099abed3fc6717010e2ed5bac795905c0bc0ebd0 and PR https://github.com/jetstack/cert-manager/pull/1850 by @kinolaev that can have connection to what we are talking about.

Is this change adding a PROXY header to the checks, or just using a proxy?

I think @johnl’s reference to https://github.com/kubernetes/kubernetes/issues/66607 is the cause. By luck or design, it works on AWS because the AWS k8s cloud provider code only adds the external host name not the external IP address.

The good news is a patch to Kubernetes in https://github.com/kubernetes/kubernetes/pull/77523 should eventually fix this for everyone.

I don’t think it is a cert-manager problem, but it is worth keeping this open to track the fix and warn other cert-manager users of this issue.

@HeWhoWas i ran into the same issue today, did you find a solution?

from within cert-manager pod, DNS lookups resolve the public loadbalancer IP.

but it still seems that cert-manager will contact the node-ip directly to solve the acme challenge, hence does not go through the public loadbalancer

@munnerz - There is no split horizon DNS, the only issue is that the nginx-ingress controller is expecting requests using the proxy_protocol method, and the cert-manager controller is making requests to it using plain HTTP.

I would suggest adding a flag that makes cert-manager send all requests to the nginx-ingress controller via the load balancer instead of hitting it directly.