cert-manager: Self check always fails

Describe the bug: Unable to pass the "self check" when the Ingress Service uses NodePort and the public IP sits on an HAProxy (TCP mode) outside the Kubernetes cluster. We can simulate the check from the cert-manager container (via kubectl exec) using curl to fetch /.well-known/…, and it succeeds. The same request also succeeds from outside the cluster.
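For reference, this is roughly how the check can be reproduced by hand; the namespace, pod name and challenge token below are placeholders, not values from this report:

# Manual reproduction of the HTTP-01 self check from inside the cluster
kubectl -n cert-manager exec -it <cert-manager-pod> -- \
  curl -v http://www.example.com/.well-known/acme-challenge/<token>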

Logs:

helpers.go:188 Found status change for Certificate "myip-secret" condition "Ready": "False" -> "False"; setting lastTransitionTime to 2018-08-29 14:36:25.387757463 +0000 UTC m=+2049.620517469
sync.go:244 Error preparing issuer for certificate pwe/pwe-secret: http-01 self check failed for domain "www.example.com"
controller.go:190 certificates controller: Re-queuing item "default/myip-secret" due to error processing: http-01 self check failed for domain "www.example.com"

We replaced the real domain name with www.example.com in this bug report.

cert-manager works only when the public IP is on the Kubernetes cluster and the Ingress Service uses the LoadBalancer type.

Expected behaviour: the self check should pass when the Ingress Service uses NodePort.

Steps to reproduce the bug:

cat <<EOF > /root/nginx-ingress.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress
  namespace: nginx-ingress
spec:
  externalTrafficPolicy: Local
  type: NodePort
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
    name: http
    nodePort: 31080
  - port: 443
    targetPort: 443
    protocol: TCP
    name: https
    nodePort: 31443
  selector:
    app: nginx-ingress
EOF
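The HAProxy in front of the cluster is not part of these manifests; it forwards TCP to the NodePorts roughly as in the sketch below (illustrative only — the public IP and node IPs are placeholders, and the 443/31443 pair is configured the same way):

# /etc/haproxy/haproxy.cfg fragment (outside the cluster), plain TCP passthrough
frontend http-in
    bind <public-ip>:80
    mode tcp
    default_backend k8s-http

backend k8s-http
    mode tcp
    server node1 <node1-ip>:31080 check
    server node2 <node2-ip>:31080 check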


cat <<EOF > /root/letsencrypt-staging.yml
---
apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  # Adjust the name here accordingly
  name: letsencrypt-staging
spec:
  acme:
    # The ACME server URL
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: name@example.com
    # Name of a secret used to store the ACME account private key from step 3
    privateKeySecretRef:
      name: letsencrypt-staging-private-key
    # Enable the HTTP-01 challenge provider
    http01: {}
EOF

cat <<EOF > /root/myip-ingress.yml
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: myip-ingress
  annotations:
    kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.class: "nginx"
    certmanager.k8s.io/cluster-issuer: letsencrypt-staging
spec:
  tls:
  - hosts:
    - www.example.com
    secretName: myip-secret
  rules:
  - host: www.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: myip-svc
          servicePort: 80
EOF

# Nginx ingress
kubectl apply -f https://raw.githubusercontent.com/nginxinc/kubernetes-ingress/master/install/common/ns-and-sa.yaml
kubectl apply -f https://raw.githubusercontent.com/nginxinc/kubernetes-ingress/master/install/common/default-server-secret.yaml
kubectl apply -f https://raw.githubusercontent.com/nginxinc/kubernetes-ingress/master/install/common/nginx-config.yaml
kubectl apply -f https://raw.githubusercontent.com/nginxinc/kubernetes-ingress/master/install/rbac/rbac.yaml
kubectl apply -f https://raw.githubusercontent.com/nginxinc/kubernetes-ingress/master/install/daemon-set/nginx-ingress.yaml
kubectl create -f /root/nginx-ingress.yaml

# CertManager
kubectl create -f https://raw.githubusercontent.com/jetstack/cert-manager/master/contrib/manifests/cert-manager/with-rbac.yaml
kubectl create -f /root/letsencrypt-staging.yml

# MyApp
kubectl run myip --image=cloudnativelabs/whats-my-ip --replicas=1 --port=8080
kubectl expose deployment myip --name=myip-svc --port=80 --target-port=8080
kubectl create -f /root/myip-ingress.yml
openssl req -x509 -nodes -days 3650 -newkey rsa:2048 -keyout /root/tls.key -out /root/tls.crt -subj "/CN=www.example.com"
kubectl create secret tls myip-secret --key /root/tls.key --cert /root/tls.crt
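To watch the failure, something like the following can be used (a sketch; the Certificate name comes from the Ingress above, while the cert-manager namespace and deployment name are assumed from the with-rbac.yaml manifest):

# Inspect the Certificate status and the controller logs
kubectl describe certificate myip-secret
kubectl -n cert-manager logs deploy/cert-manager | grep "self check"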

Anything else we need to know?: It is not clear to us what exactly the self check expects to find, because the fetch of the /.well-known key is successful (confirmed via Wireshark), yet the self check keeps running and failing. Some more detail about the reason for the failure would be great.

Wireshark capture - request from a cluster node to the HAProxy:

GET /.well-known/acme-challenge/B2tNUfzfPgK_VOF7AAQEktKaikWxwBQlD0uL77d0N8k HTTP/1.1
Host: pwe.kube.freebox.cz
User-Agent: Go-http-client/1.1
Accept-Encoding: gzip

HTTP/1.1 200 OK
Server: nginx/1.15.2
Date: Wed, 29 Aug 2018 14:42:26 GMT
Content-Type: text/plain; charset=utf-8
Content-Length: 87
Connection: keep-alive

B2tNUfzfPgK_VOF7AAQEktKaikWxwBQlD0uL77d0N8k.6RElade5K0jHqS1ysziuv2Gm3_LgD-D9APNRg5k8sak

Environment details:

/kind bug

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Reactions: 7
  • Comments: 24 (1 by maintainers)

Most upvoted comments

The problem is in Kubernetes networking when you use a LoadBalancer provided by the hosting provider. I use DigitalOcean. Kubernetes does not route in-cluster traffic out through the LB's public interface, so nothing adds the PROXY protocol header (or SSL) that you configured outside Kubernetes. I use the PROXY protocol, and the moment I enable it and update nginx to handle it, everything works, but cert-manager fails because it tries to connect to the public domain name and that request fails. It works from my computer, because I am outside the cluster and the LB adds the needed headers, but not from within the cluster.

cert-manager is not to blame for this, but it would help a lot if there were switches to instruct the validator to send the PROXY protocol header, or to disable the self check for that domain.

With curl, if I run this (from inside the cluster):

curl -I https://myhost.domain.com

it fails.

If I do (from inside the cluster):

curl -I https://myhost.domain.com --haproxy-protocol

it works.
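For reference, enabling the PROXY protocol end to end looks roughly like the sketch below. It assumes the community ingress-nginx controller (ConfigMap key use-proxy-protocol) and DigitalOcean's load-balancer annotation; the names, namespace and selector are illustrative, not taken from this thread:

# ingress-nginx controller ConfigMap: accept PROXY protocol from the LB
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  use-proxy-protocol: "true"
---
# LoadBalancer Service: ask the DigitalOcean LB to send PROXY protocol to the nodes
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
  annotations:
    service.beta.kubernetes.io/do-loadbalancer-enable-proxy-protocol: "true"
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
  - name: http
    port: 80
    targetPort: 80
  - name: https
    port: 443
    targetPort: 443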

I was informed by the DigitalOcean team that there is a fix for this behavior. They added an additional annotation for the nginx-ingress controller Service that forces Kubernetes to use the domain name of the public IP instead of the IP itself, which tricks Kubernetes into thinking the address is not "ours" and routes traffic out and back in through the LB.

This is it (I just added this one annotation): https://github.com/digitalocean/digitalocean-cloud-controller-manager/blob/master/docs/controllers/services/examples/README.md#accessing-pods-over-a-managed-load-balancer-from-inside-the-cluster

kind: Service
apiVersion: v1
metadata: 
  name: nginx-ingress-controller
  annotations: 
    service.beta.kubernetes.io/do-loadbalancer-hostname: "hello.example.com"

I’m going to close this issue out as it seems to be more related to network configuration than anything else. Let’s Encrypt needs to be able to access your Ingress controller on port 80 in order to validate challenges, and exposing your ingress controller to the public internet (either via a LoadBalancer service or a NodePort) is outside the scope of cert-manager itself. We just need port 80 to work 😄

Hi all, I ran into the same issue. I’ve recently published hairpin-proxy which works around the issue, specifically for cert-manager self-checks. https://github.com/compumike/hairpin-proxy

It uses CoreDNS rewriting to intercept traffic that would be heading toward the external load balancer. It then adds a PROXY line to requests originating from within the cluster. This allows cert-manager’s self-check to pass.
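The same idea can also be applied by hand with a CoreDNS rewrite, minus the PROXY protocol line that hairpin-proxy adds (a sketch; the domain is the placeholder from this issue and the Service name comes from the repro above, adjust both to your setup):

# CoreDNS Corefile fragment (kubectl -n kube-system edit configmap coredns)
.:53 {
    errors
    health
    # Resolve the public domain to the in-cluster ingress Service so that
    # the self-check traffic never leaves the cluster
    rewrite name www.example.com nginx-ingress.nginx-ingress.svc.cluster.local
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    forward . /etc/resolv.conf
    cache 30
}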

@AlexsJones Not for me. I had to add the annotation below

"service.beta.kubernetes.io/scw-loadbalancer-use-hostname": "true"
...
apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress
  namespace: nginx-ingress
spec:
  externalTrafficPolicy: Local
  type: NodePort
...

After changing externalTrafficPolicy: Local to externalTrafficPolicy: Cluster, I was able to pass the self check.

The reason: the pod with the certificate issuer ended up on a different node than the load balancer, so it couldn't talk to itself through the ingress.
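For the Service from the repro steps above, that change can be applied with something like:

# Switch externalTrafficPolicy from Local to Cluster
kubectl -n nginx-ingress patch service nginx-ingress \
  --type merge -p '{"spec":{"externalTrafficPolicy":"Cluster"}}'

Keep in mind that externalTrafficPolicy: Cluster loses the original client source IP, which is usually why Local was chosen in the first place.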

Port 80 isn't the issue; that's a given. The IP address is, though. Any installation behind NAT is likely to fail without a hairpin NAT configuration. If disabling the self check isn't going to be allowed, maybe mention this in the docs?

As @vitobotta points out (though with little context), for cert-manager running in a Scaleway Kubernetes cluster:

"service.beta.kubernetes.io/scw-loadbalancer-use-hostname": "true"

This annotation should be applied to the LoadBalancer Service created by ingress-nginx.

service.beta.kubernetes.io/scw-loadbalancer-use-hostname is the annotation that forces the use of the LB hostname instead of the public IP. This is useful when traffic coming from inside the cluster must not bypass the LoadBalancer.

If you’re configuring ingress-nginx with Helm, you can set the value controller.service.annotations.\"service\\.beta\\.kubernetes\\.io/scw-loadbalancer-use-hostname\" to "true"
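For example, a sketch assuming the ingress-nginx Helm chart with a release named ingress-nginx (adjust the release, repo and namespace to your installation):

helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --set-string controller.service.annotations."service\.beta\.kubernetes\.io/scw-loadbalancer-use-hostname"=true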

@MichaelOrtho Hi, do you know if a similar workaround exists for Scaleway? I am testing their managed Kubernetes and am having the same problem. Thanks

Let’s Encrypt needs to be able to access your Ingress controller on port 80 in order to validate challenges

I guess this means "Cloudflare Always Use HTTPS" was causing this for me. Perhaps a note here about requiring port 80 and plain-HTTP access to the domain would be good: https://docs.cert-manager.io/en/latest/getting-started/troubleshooting.html