cert-manager: FailedDiscoveryCheck (403) with cert-manager Webhook
Describe the bug: I’m trying to deploy an on-prem k8s cluster and I want to use cert-manager for the certificates. When I try to create a ClusterIssuer, it fails with:

```
Internal error occurred: failed calling webhook "webhook.certmanager.k8s.io": the server is currently unable to handle the request
```

When I run `kubectl get apiservice`, it returns the following error:

```
failing or missing response from https://<internal-svc-ip>:443/apis/webhook.certmanager.k8s.io/v1beta1: bad status from https://<internal-svc-ip>:443/apis/webhook.certmanager.k8s.io/v1beta1: 403
```
Expected behaviour:
The Issuer is created when I run `kubectl apply`.
Steps to reproduce the bug:
- Create the `cert-manager` namespace
- Deploy using the manifest YAML
- Try to create an Issuer following the example in the documentation (I also tried the tests)
Anything else we need to know?:
Environment details:
- Kubernetes version (e.g. v1.10.2): 1.15.3
- Cloud-provider/provisioner (e.g. GKE, kops AWS, etc): on-prem
- cert-manager version (e.g. v0.4.0): 0.10
- Install method (e.g. helm or static manifests): static manifest at https://github.com/jetstack/cert-manager/releases/download/v0.10.0/cert-manager.yaml
- YAML file:
```yaml
apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    email: <my-mail>
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      # Secret resource used to store the account's private key.
      name: example-clusterissuer-key
    # Add a single challenge solver, HTTP01 using nginx
    solvers:
    - http01:
        ingress:
          class: nginx
```
I have also installed nginxinc/kubernetes-ingress.
/kind bug
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 27 (5 by maintainers)
@otakumike sure thing, here it is. Given the logs and error messages I knew the port had to be 6443 and the source addresses had to be those of the k8s master, hence:

I’m seeing this as well in EKS when trying to use a custom CNI. For metrics-server, I put the API Service on the host network and that resolved the issue:
https://github.com/helm/charts/blob/c4d3dde988271fddf80c00bd9281453202234b9d/stable/metrics-server/templates/metrics-server-deployment.yaml#L38-L40
Can we get something like this for the cert-manager chart? Manually adding this to the deployment after the install makes the API Service report Available:
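The exact snippet isn’t preserved above; judging from the linked metrics-server template, it is presumably `hostNetwork: true` on the webhook Deployment’s pod spec. A hedged sketch (the patch location is an assumption, not the commenter’s verbatim change):

```yaml
# Hypothetical addition to the cert-manager-webhook Deployment's pod template,
# mirroring the linked metrics-server chart: run the webhook on the node's
# host network so the apiserver can reach it without pod-network routing.
spec:
  template:
    spec:
      hostNetwork: true
```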
Just stumbled upon this. It seems to be related to #2340. I also have a private cluster on GKE, and adding an ingress firewall rule granting access from the master API CIDR range to port 6443 resolved the issue for me. This is also documented here.
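For reference, such a GKE firewall rule can be sketched with gcloud as below. All values here are placeholders (rule name, network, master CIDR, node tag), not the commenter’s actual settings:

```shell
# Allow the GKE control plane (master CIDR) to reach port 6443 on the worker nodes.
# Replace the network, source range, and target tag with your cluster's values.
gcloud compute firewall-rules create allow-master-to-webhook \
  --network my-vpc \
  --direction INGRESS \
  --allow tcp:6443 \
  --source-ranges 172.16.0.0/28 \
  --target-tags my-gke-node-tag
```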
Same problem with kubeadm on AWS. Kubernetes: 1.16.0
Just to clarify, you should not need to create any additional RBAC resources in order to make the webhook work.
Issues like this stem from communication problems between the Kubernetes apiserver and the webhook component, and you can follow the ‘chain’ of communication like so:
If any part of that communication flow doesn’t work, you’ll see errors as you’ve described.
Typically, and as some people have noted above, this sometimes falls down at the "A Kubernetes APIService resource exposes the webhook as a part of the Kubernetes API" level: the Kubernetes apiserver is unable to communicate with the webhook. This can be caused by many things, but for example, on GKE this is caused by firewall rules blocking communication to the Kubernetes ‘worker’ nodes from the control plane. This is remediated by adding additional firewall rules to grant this permission.
On AWS, it really depends on how you’ve configured your VPCs/security groups and how you’ve configured networking. Notably though, you must configure your control plane so that it can communicate with pod/service IPs from the ‘apiserver’ container/network namespace.
You’ll also run into this issue if you try and deploy metrics-server too, as this is deployed in a similar fashion.
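One way to follow that chain by hand, sketched as kubectl checks (the resource names assume the v0.10 static manifest’s defaults and may differ in your install):

```shell
# 1. Is the APIService registered, and does the apiserver report it Available?
kubectl get apiservice v1beta1.webhook.certmanager.k8s.io
# 2. Does the webhook Service have ready endpoints behind it?
kubectl get endpoints cert-manager-webhook -n cert-manager
# 3. Are the webhook pods themselves running and ready?
kubectl get pods -n cert-manager -l app=webhook
```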
@skuro are you using “private GKE nodes” by any chance?
In my case on a fresh GKE cluster (v1.13.7-gke.24) with kubectl (v1.11.1 or v1.14.3) it seems to just be a matter of waiting.
After I first apply the static manifest:
If I try to create any ClusterIssuer right away, I get:
This seems to correspond with:
But if I wait a few seconds, that eventually changes to:
And at that point, if I try again to apply my ClusterIssuer manifest, it works. This stops me from being able to `kubectl apply -Rf` my whole cert-manager + issuers manifests in one go.

Isn’t there some way to let me declare everything at once and have the issuers work when they’re ready? Isn’t that the k8s way?
Update: Workaround
This workaround gets it done for me for now:
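The commenter’s snippet isn’t preserved above. One common pattern (an assumption on my part, not necessarily the commenter’s exact workaround) is to wait until the webhook’s APIService reports Available before applying the issuer manifests:

```shell
# Apply cert-manager, block until the apiserver can actually reach the webhook,
# then apply the Issuer/ClusterIssuer manifests. The APIService name assumes
# the v0.10 static manifest; adjust for your version.
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v0.10.0/cert-manager.yaml
kubectl wait --for=condition=Available apiservice/v1beta1.webhook.certmanager.k8s.io --timeout=120s
kubectl apply -Rf issuers/
```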
Looks like it is the same as https://github.com/istio/istio/issues/10637. I build my clusters with Terraform and I was able to solve the linked issue by adding the following security group rule:
I will test later whether this solves this issue here, too.
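The rule itself isn’t preserved above; a hedged Terraform sketch of what such a security group rule typically looks like (all resource references are placeholders, not the commenter’s actual code):

```hcl
# Allow the control-plane security group to reach port 6443 on the worker nodes,
# where the cert-manager webhook listens. Replace the group references with yours.
resource "aws_security_group_rule" "apiserver_to_webhook" {
  type                     = "ingress"
  protocol                 = "tcp"
  from_port                = 6443
  to_port                  = 6443
  security_group_id        = aws_security_group.workers.id
  source_security_group_id = aws_security_group.control_plane.id
  description              = "Kubernetes apiserver to cert-manager webhook"
}
```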