opentelemetry-operator: Cannot create Collector: Webhook deadline exceeded

Hi,

I cannot create the simplest OpenTelemetryCollector. I get the following error when I try to create it from STDIN:

Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "mopentelemetrycollector.kb.io": Post https://opentelemetry-operator-webhook-service.opentelemetry-operator-system.svc:443/mutate-opentelemetry-io-v1alpha1-opentelemetrycollector?timeout=30s: context deadline exceeded

The OpenTelemetry Operator controller manager is up and running in the namespace opentelemetry-operator-system:

$ kubectl get po -n opentelemetry-operator-system                 
NAME                                                         READY   STATUS    RESTARTS   AGE
opentelemetry-operator-controller-manager-56f75fbb5d-qrdst   2/2     Running   0          17m

And the logs of both containers (manager and kube-rbac-proxy) do not show any error.

I installed the required resources using:

$ kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml

Is there something I am missing?

Thank you for your help!

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 41 (15 by maintainers)

Most upvoted comments

Managed to get some time debugging this today. I set myself up a private GKE cluster and was able to reproduce the context deadline exceeded issue. I can confirm that this is due to port 9443 not being open to the master.

I created these firewall rules for the master and everything works as it should:

gcloud compute firewall-rules create cert-manager-9443 \
  --source-ranges ${GKE_MASTER_CIDR} \
  --target-tags ${GKE_MASTER_TAG}  \
  --allow TCP:9443

Maybe some documentation should be added regarding private clusters to ensure these ports are open.

Edit: 8443 not needed, just 9443.
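
For anyone applying the rule above, the two variables can be looked up rather than guessed. The following is a minimal sketch, not an official procedure. The function names are hypothetical helpers, the rule name is an arbitrary placeholder, and it assumes the tag passed to --target-tags is the network tag on the cluster's nodes (the webhook pod runs on the nodes, and the master is the source of the traffic). firewall_cmd only prints the command so it can be reviewed before it is run.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of gcloud or the operator.

# Print the control-plane (master) CIDR of a private GKE cluster.
# Usage: master_cidr CLUSTER_NAME --zone ZONE   (or --region REGION)
master_cidr() {
  gcloud container clusters describe "$@" \
    --format='value(privateClusterConfig.masterIpv4CidrBlock)'
}

# Print the target port behind the webhook Service. It is expected to match
# the port opened above (9443), though it may also be shown as a named port.
webhook_target_port() {
  kubectl get service opentelemetry-operator-webhook-service \
    -n opentelemetry-operator-system \
    -o jsonpath='{.spec.ports[0].targetPort}'
}

# Print (do not run) the firewall command so it can be checked first.
# Args: RULE_NAME MASTER_CIDR NODE_TAG PORT
firewall_cmd() {
  printf 'gcloud compute firewall-rules create %s --source-ranges %s --target-tags %s --allow tcp:%s\n' \
    "$1" "$2" "$3" "$4"
}
```

For example, firewall_cmd allow-otel-webhook "$(master_cidr my-cluster --zone europe-west1-b)" gke-my-cluster-node 9443 prints a command equivalent to the one above. The cluster name, zone, and tag here are placeholders.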

Thank you!

This also resolves the issue in GKE Autopilot!

I’m closing this, as the original report has been fixed.

@jpkrohling it’s a work cluster, though only staging, so that might be tricky. I’ll run it past the team at stand-up tomorrow. I was going to try setting up my own GKE cluster to see if I can replicate it there rather than in minikube.

I’ll do another fresh install on the cluster tomorrow morning and give you all the logs / events I can get my hands on.

Yeah I am at a loss as well 🤷

The cert seems fine based on the state of the resources.

➜ kubectl get Issuer
NAME                                       READY   AGE
opentelemetry-operator-selfsigned-issuer   True    19h
➜ kubectl get Certificate
NAME                                  READY   SECRET                                                   AGE
opentelemetry-operator-serving-cert   True    opentelemetry-operator-controller-manager-service-cert   19h
➜ kubectl get CertificateRequest
NAME                                        APPROVED   DENIED   READY   ISSUER                                     REQUESTOR                                         AGE
opentelemetry-operator-serving-cert-nddc5   True                True    opentelemetry-operator-selfsigned-issuer   system:serviceaccount:cert-manager:cert-manager   19h

And if I do the exact same setup in minikube it’s all fine. I’ll keep digging.