cert-manager: Self check always fails
Describe the bug: Unable to pass the “self check” when the Ingress Service uses NodePort and the public IP sits on an HAProxy (TCP mode) outside the Kubernetes cluster. We can simulate the test from the cert-manager container (kubectl exec) using curl (fetching /.well-known/…), and it succeeds. The same request also succeeds from outside the cluster.
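Roughly what that manual simulation looks like (the cert-manager namespace, the app=cert-manager label, and the presence of curl in the image are assumptions; <token> stands in for the actual challenge token):
# Pick the cert-manager pod (label/namespace assumed from the default manifest)
CM_POD=$(kubectl get pods -n cert-manager -l app=cert-manager -o jsonpath='{.items[0].metadata.name}')
# Fetch the challenge the same way the self check does
kubectl exec -n cert-manager "$CM_POD" -- curl -v http://www.example.com/.well-known/acme-challenge/<token>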
Logs:
helpers.go:188 Found status change for Certificate "myip-secret" condition "Ready": "False" -> "False"; setting lastTransitionTime to 2018-08-29 14:36:25.387757463 +0000 UTC m=+2049.620517469
sync.go:244 Error preparing issuer for certificate pwe/pwe-secret: http-01 self check failed for domain "www.example.com"
controller.go:190 certificates controller: Re-queuing item "default/myip-secret" due to error processing: http-01 self check failed for domain "www.example.com"
We replaced the real domain name with www.example.com in this bug report.
cert-manager works only when the public IP is on the Kubernetes cluster and the Ingress Service uses the LoadBalancer method.
Expected behaviour: the self check should pass with NodePort on the Ingress Service.
Steps to reproduce the bug:
cat <<EOF > /root/nginx-ingress.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress
  namespace: nginx-ingress
spec:
  externalTrafficPolicy: Local
  type: NodePort
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
    name: http
    nodePort: 31080
  - port: 443
    targetPort: 443
    protocol: TCP
    name: https
    nodePort: 31443
  selector:
    app: nginx-ingress
EOF
cat <<EOF > /root/letsencrypt-staging.yml
---
apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  # Adjust the name here accordingly
  name: letsencrypt-staging
spec:
  acme:
    # The ACME server URL
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: name@example.com
    # Name of a secret used to store the ACME account private key from step 3
    privateKeySecretRef:
      name: letsencrypt-staging-private-key
    # Enable the HTTP-01 challenge provider
    http01: {}
EOF
cat <<EOF > /root/myip-ingress.yml
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: myip-ingress
  annotations:
    kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.class: "nginx"
    certmanager.k8s.io/cluster-issuer: letsencrypt-staging
spec:
  tls:
  - hosts:
    - www.example.com
    secretName: myip-secret
  rules:
  - host: www.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: myip-svc
          servicePort: 80
EOF
# Nginx ingress
kubectl apply -f https://raw.githubusercontent.com/nginxinc/kubernetes-ingress/master/install/common/ns-and-sa.yaml
kubectl apply -f https://raw.githubusercontent.com/nginxinc/kubernetes-ingress/master/install/common/default-server-secret.yaml
kubectl apply -f https://raw.githubusercontent.com/nginxinc/kubernetes-ingress/master/install/common/nginx-config.yaml
kubectl apply -f https://raw.githubusercontent.com/nginxinc/kubernetes-ingress/master/install/rbac/rbac.yaml
kubectl apply -f https://raw.githubusercontent.com/nginxinc/kubernetes-ingress/master/install/daemon-set/nginx-ingress.yaml
kubectl create -f /root/nginx-ingress.yaml
# CertManager
kubectl create -f https://raw.githubusercontent.com/jetstack/cert-manager/master/contrib/manifests/cert-manager/with-rbac.yaml
kubectl create -f /root/letsencrypt-staging.yml
# MyApp
kubectl run myip --image=cloudnativelabs/whats-my-ip --replicas=1 --port=8080
kubectl expose deployment myip --name=myip-svc --port=80 --target-port=8080
kubectl create -f /root/myip-ingress.yml
openssl req -x509 -nodes -days 3650 -newkey rsa:2048 -keyout /root/tls.key -out /root/tls.crt -subj "/CN=www.example.com"
kubectl create secret tls myip-secret --key /root/tls.key --cert /root/tls.crt
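The failing check can then be observed on the generated Certificate and in the cert-manager logs; the resource name matches the manifests above, while the cert-manager namespace and label are assumptions based on the default with-rbac.yaml manifest:
# Inspect the Certificate created by ingress-shim and tail the controller logs
kubectl describe certificate myip-secret
kubectl logs -n cert-manager -l app=cert-manager --tail=100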
Anything else we need to know?: It is not clear to us what exactly the self check expects to find, because the fetch of the /.well-known key succeeds (confirmed via Wireshark), yet the self check keeps running again and again and keeps failing. More detail about the reason for the failure would be great.
Wireshark captured data - request from a cluster node to the HAProxy:
GET /.well-known/acme-challenge/B2tNUfzfPgK_VOF7AAQEktKaikWxwBQlD0uL77d0N8k HTTP/1.1
Host: pwe.kube.freebox.cz
User-Agent: Go-http-client/1.1
Accept-Encoding: gzip
HTTP/1.1 200 OK
Server: nginx/1.15.2
Date: Wed, 29 Aug 2018 14:42:26 GMT
Content-Type: text/plain; charset=utf-8
Content-Length: 87
Connection: keep-alive
B2tNUfzfPgK_VOF7AAQEktKaikWxwBQlD0uL77d0N8k.6RElade5K0jHqS1ysziuv2Gm3_LgD-D9APNRg5k8sak
Environment details:
- Kubernetes version v1.11.2
- cert-manager version (v0.4.1)
- nginx-ingress (v1.15.2)
- Install method (primarily via kubectl, but we also tried helm [following this guide - https://dzone.com/articles/secure-your-kubernetes-services-using-cert-manager] with the same result):
/kind bug
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Reactions: 7
- Comments: 24 (1 by maintainers)
I was informed by the DigitalOcean team that there is a fix for this behavior. They added an annotation for the nginx-ingress controller Service that forces Kubernetes to use the domain name of the public IP instead of the IP itself; that tricks Kubernetes into thinking the IP is not “ours” and routes traffic out through the LB.
https://github.com/digitalocean/digitalocean-cloud-controller-manager/blob/master/docs/controllers/services/examples/README.md#accessing-pods-over-a-managed-load-balancer-from-inside-the-cluster This is it (I just added this one):
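Based on that linked README, the setting in question should be the hostname annotation on the ingress controller’s Service, roughly like this (the hostname value here is illustrative):
# Tell the DO cloud controller to report a hostname instead of the LB IP (hostname is a placeholder)
kubectl annotate service nginx-ingress -n nginx-ingress \
  service.beta.kubernetes.io/do-loadbalancer-hostname="www.example.com"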
I’m going to close this issue out as it seems to be more related to network configuration than anything else. Let’s Encrypt needs to be able to access your Ingress controller on port 80 in order to validate challenges, and exposing your ingress controller to the public internet (either via a LoadBalancer service or a NodePort) is outside the scope of cert-manager itself. We just need port 80 to work 😄
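A quick way to sanity-check that from outside the cluster (the token path is a placeholder):
curl -v http://www.example.com/.well-known/acme-challenge/test-token
Even a 404 from the ingress controller on that path proves port 80 is reachable; a timeout or connection refused means the NodePort/LoadBalancer exposure needs fixing first.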
Hi all, I ran into the same issue. I’ve recently published hairpin-proxy, which works around the issue specifically for cert-manager self-checks: https://github.com/compumike/hairpin-proxy
It uses CoreDNS rewriting to intercept traffic that would otherwise head toward the external load balancer, then adds a PROXY line to requests originating from within the cluster. This allows cert-manager’s self-check to pass.
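For context, the CoreDNS rewrite involved is conceptually along these lines, added to the Corefile in the coredns ConfigMap (the exact rule hairpin-proxy generates may differ; the target service name below is illustrative):
# Resolve the public hostname to an in-cluster proxy instead of the external LB
rewrite name www.example.com hairpin-proxy.hairpin-proxy.svc.cluster.local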
@AlexsJones Not for me. I had to add the annotation below
After changing externalTrafficPolicy: Local to externalTrafficPolicy: Cluster, I was able to perform the self check. The reason: the pod with the certificate issuer wound up on a different node than the load balancer did, so it couldn’t talk to itself through the ingress.
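For reference, against the Service from the original report (names match the manifests above), that change can be applied with something like:
kubectl patch service nginx-ingress -n nginx-ingress \
  -p '{"spec":{"externalTrafficPolicy":"Cluster"}}'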
Port 80 isn’t the issue, that’s a given. The IP address is, though. All installations behind NAT are likely going to fail without a hairpin config. If the self check can’t be disabled, maybe mention this in the docs?
The problem is in Kubernetes networking if you use a LoadBalancer provided by the hosting. I use DigitalOcean. Kubernetes does not route traffic through the LB’s public interface, so the PROXY protocol header or SSL you set up outside Kubernetes never gets added. I use the PROXY protocol, and the moment I enable it and update Nginx to handle it, everything works except cert-manager, which fails because it tries to connect to the public domain name. It works from my computer, since I am outside and the LB adds the needed headers, but not from within the cluster.
cert-manager is not to blame for this, but a switch to instruct the validator to add the PROXY protocol, or to disable validation for that domain, would help a lot.
For curl if I do (from inside the cluster):
it fails.
If I do (from inside the cluster):
it works.
As @vitobotta points out, though without much context: for cert-manager running in a Scaleway Kubernetes cluster, this annotation should be applied to the LoadBalancer Service created by ingress-nginx. If you’re configuring ingress-nginx with Helm, you can set the value controller.service.annotations."service\.beta\.kubernetes\.io/scw-loadbalancer-use-hostname" to "true".
@MichaelOrtho Hi, do you know if a similar workaround exists for Scaleway? I am testing their managed Kubernetes and am having the same problem. Thanks
I guess this means “Cloudflare Always Use HTTPS” was causing this for me. Perhaps a note about requiring port 80 and plain-HTTP access to the domain would be good here: https://docs.cert-manager.io/en/latest/getting-started/troubleshooting.html