cert-manager: Failure to deploy to AKS with restricted egress/ingress

Hello folks!

I followed the deployment instructions using Helm 3 (installing the CRDs manually before helm install) on an AKS cluster where both ingress and egress are restricted and routed through Azure Firewall. After the helm install, the pods in the cert-manager namespace stay in this state:

NAME                                       READY   STATUS             RESTARTS   AGE
cert-manager-69779b98cd-4ppzj              1/1     Running            0          62m
cert-manager-cainjector-7c4c4bbbb9-nb67l   0/1     CrashLoopBackOff   16         62m
cert-manager-webhook-6496b996cb-lhhlk      0/1     Running            0          62m

Logs from cert-manager:

E0803 02:12:13.566122 1 leaderelection.go:320] error retrieving resource lock kube-system/cert-manager-controller: an error on the server ("") has prevented the request from succeeding (get configmaps cert-manager-controller)
E0803 02:12:45.214256 1 leaderelection.go:320] error retrieving resource lock kube-system/cert-manager-controller: an error on the server ("") has prevented the request from succeeding (get configmaps cert-manager-controller)
E0803 02:13:14.627751 1 leaderelection.go:320] error retrieving resource lock kube-system/cert-manager-controller: an error on the server ("") has prevented the request from succeeding (get configmaps cert-manager-controller)
E0803 02:13:52.359303 1 leaderelection.go:320] error retrieving resource lock kube-system/cert-manager-controller: an error on the server ("") has prevented the request from succeeding (get configmaps cert-manager-controller)

Logs from cainjector:

I0803 01:54:40.445401 1 start.go:82] starting ca-injector v0.16.0 (revision b14a7f2dfbc16b5a1228173f1a4de8985e66dbf3)
E0803 01:54:49.479170 1 manager.go:241] cert-manager/controller-runtime/manager "msg"="Failed to get API Group-Resources" "error"="an error on the server (\"\") has prevented the request from succeeding"
F0803 01:54:49.479192 1 start.go:118] error creating manager: an error on the server ("") has prevented the request from succeeding

Logs from webhook (it keeps looping over these 3 blocks):

I0803 01:32:49.157492 1 server.go:342] cert-manager/webhook "msg"="Health check failed as CertificateSource is unhealthy"
I0803 01:32:49.485345 1 dynamic_source.go:170] cert-manager/webhook "msg"="Generating new ECDSA private key"
I0803 01:32:49.490215 1 dynamic_source.go:185] cert-manager/webhook "msg"="Signing new serving certificate"
E0803 01:32:49.490261 1 dynamic_source.go:86] cert-manager/webhook "msg"="Failed to generate initial serving certificate, retrying..." "error"="failed verifying CA keypair: tls: failed to find any PEM data in certificate input" "interval"=1000000000
I0803 01:32:50.485453 1 dynamic_source.go:170] cert-manager/webhook "msg"="Generating new ECDSA private key"
I0803 01:32:50.490286 1 dynamic_source.go:185] cert-manager/webhook "msg"="Signing new serving certificate"
E0803 01:32:50.490325 1 dynamic_source.go:86] cert-manager/webhook "msg"="Failed to generate initial serving certificate, retrying..." "error"="failed verifying CA keypair: tls: failed to find any PEM data in certificate input" "interval"=1000000000
I0803 01:32:51.485453 1 dynamic_source.go:170] cert-manager/webhook "msg"="Generating new ECDSA private key"
I0803 01:32:51.490133 1 dynamic_source.go:185] cert-manager/webhook "msg"="Signing new serving certificate"
E0803 01:32:51.490169 1 dynamic_source.go:86] cert-manager/webhook "msg"="Failed to generate initial serving certificate, retrying..." "error"="failed verifying CA keypair: tls: failed to find any PEM data in certificate input" "interval"=1000000000

I found some mentions of GKE firewall configuration in the docs, but I can’t find exactly which rules the firewall needs, so that I could add them to Azure Firewall as well.

I’ve run Sonobuoy and it reports that our cluster is perfectly fine deployment-wise, as expected, since it is deployed and managed by Microsoft in Azure…

So, can you guys point out what is going on with my deployment?

About this issue

State: closed
Created 4 years ago
Comments: 15 (2 by maintainers)

Most upvoted comments

Just as a heads up: this problem happens with other services too, not just cert-manager, when you are deploying in a locked-down cluster. If your application for whatever reason requires access to the API server on AKS in those deployments, make sure the following environment variable is set on the container:

- name: KUBERNETES_SERVICE_HOST
  value: <your-cluster-address>

For example, <your-cluster-name>.hcp.brazilsouth.azmk8s.io.
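
As a rough illustration of why this override matters: in-cluster Kubernetes clients (client-go, for instance) typically build the API server address from the KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT environment variables, so changing the host changes where the pod connects. The sketch below only mimics that lookup; the helper name in_cluster_api_url is hypothetical and is not part of cert-manager or any Kubernetes library.

```python
import os


def in_cluster_api_url(env=None):
    """Build the API server URL from the usual in-cluster environment variables.

    Hypothetical helper for illustration; it mirrors the lookup that in-cluster
    clients commonly perform, not any specific library's implementation.
    """
    env = os.environ if env is None else env
    host = env.get("KUBERNETES_SERVICE_HOST", "")
    port = env.get("KUBERNETES_SERVICE_PORT", "")
    if not host or not port:
        raise RuntimeError("KUBERNETES_SERVICE_HOST/PORT not set (not in a pod?)")
    # Bracket IPv6 literals so the resulting URL stays valid.
    if ":" in host and not host.startswith("["):
        host = f"[{host}]"
    return f"https://{host}:{port}"


# With the override from the snippet above, the client would target the
# public cluster address instead of the in-cluster service IP.
print(in_cluster_api_url({
    "KUBERNETES_SERVICE_HOST": "10.0.0.1",
    "KUBERNETES_SERVICE_PORT": "443",
}))
# → https://10.0.0.1:443
```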

galvesribeiro on Jul 26, 2021

For those running into the same problem: the underlying cause seems to be a lack of connectivity from the pods to the API server. When running AKS with restricted egress, you have to permit traffic from the pods to port 443 on the API server. (See https://docs.microsoft.com/en-us/azure/aks/limit-egress-traffic#azure-global-required-network-rules)
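
One way to check for this from inside a pod is a plain TCP probe against the API server address the pod was given. This is only a minimal sketch: the function names can_reach and check_api_server are made up for illustration, and the fallback to port 443 when KUBERNETES_SERVICE_PORT is unset is an assumption. A successful TCP connect also does not prove that TLS traffic passes the firewall, only that the port is reachable.

```python
import os
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_api_server(env=None):
    """Probe the API server named by the in-cluster environment variables.

    Returns None when KUBERNETES_SERVICE_HOST is not set (not running in a pod).
    """
    env = os.environ if env is None else env
    host = env.get("KUBERNETES_SERVICE_HOST")
    port = env.get("KUBERNETES_SERVICE_PORT", "443")  # assumed default
    if not host:
        return None
    return can_reach(host, port)


if __name__ == "__main__":
    print("API server reachable:", check_api_server())
```

If this prints False from a pod in the cert-manager namespace, the firewall rules are the first thing to look at.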

We initially ran into the same issue and verified that deploying into kube-system, surprisingly, works. After adding the appropriate firewall rules, we were able to deploy into the cert-manager namespace with the default Helm chart and CRDs. It is not clear, though, why being in the kube-system namespace changes things.

illinar on Apr 2, 2021