dapr: Make sidecar injection resilient to Kubernetes API interruptions, admission webhook failures, etc.

In what area(s)?

/area injector /area sidecar-injector

Describe the feature

The Dapr Sidecar Injector creates Dapr sidecars and injects the Dapr trust bundle secret from the dapr-system namespace. Internally, it relies on the Kubernetes MutatingAdmissionWebhook feature.

On occasion, the mutating admission request to the webhook registered for the Dapr Sidecar Injector (see kubectl get MutatingWebhookConfiguration/dapr-sidecar-injector -oyaml) may not occur or may fail. Currently, the Dapr Helm charts are configured to proceed with the deployment in that case. As a result, new application deployments whose Dapr annotations call for sidecar injection end up without sidecars.
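To make the failure mode concrete, here is a toy model of an admission step, not the Kubernetes API server's actual logic. It assumes that "proceed with the deployment" corresponds to the webhook's failurePolicy being Ignore rather than Fail; the function names and the pod representation are hypothetical.

```python
from typing import Any, Callable, Dict

Pod = Dict[str, Any]


def admit(pod: Pod, webhook: Callable[[Pod], Pod], failure_policy: str = "Ignore") -> Pod:
    """Toy admission step: call the mutating webhook and apply the failure policy.

    With "Ignore", a failing webhook is skipped and the pod is admitted unmodified.
    With "Fail", the webhook error is propagated and the pod is rejected.
    """
    try:
        return webhook(pod)
    except Exception:
        if failure_policy == "Fail":
            raise
        # Policy "Ignore": admit the pod exactly as submitted (no sidecar).
        return pod


def inject_sidecar(pod: Pod) -> Pod:
    # Hypothetical stand-in for a healthy injector: appends a sidecar container.
    return {**pod, "containers": pod["containers"] + ["daprd"]}


def unavailable_webhook(pod: Pod) -> Pod:
    # Hypothetical stand-in for an injector that cannot be reached.
    raise ConnectionError("webhook unreachable")
```

Under this model, admit(pod, unavailable_webhook) returns the pod without a sidecar and without any error, which is the silent outcome described above.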

The Dapr control plane should periodically (or after a Kubernetes API failure) check whether any existing deployments still need sidecars injected and whether new components must be loaded. It also needs to gracefully resume watching resource deployments once the Kubernetes API is available again.
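A minimal sketch of the detection half of such a periodic check, not how the Dapr control plane implements it. It works on plain dictionaries shaped like pod manifests and assumes that the opt-in annotation is dapr.io/enabled and that the injected container is named daprd; both names are assumptions, not taken from this issue, and the function names are hypothetical.

```python
from typing import Any, Dict, List

# Assumed names, not stated in this issue.
DAPR_ENABLED_ANNOTATION = "dapr.io/enabled"
SIDECAR_CONTAINER_NAME = "daprd"


def needs_sidecar(pod: Dict[str, Any]) -> bool:
    """Return True if the pod opts in to Dapr but has no sidecar container."""
    annotations = pod.get("metadata", {}).get("annotations") or {}
    if str(annotations.get(DAPR_ENABLED_ANNOTATION, "")).lower() != "true":
        return False
    containers = pod.get("spec", {}).get("containers") or []
    return all(c.get("name") != SIDECAR_CONTAINER_NAME for c in containers)


def find_pods_missing_sidecar(pods: List[Dict[str, Any]]) -> List[str]:
    """List "namespace/name" for every pod that should have a sidecar but lacks one."""
    missing = []
    for pod in pods:
        if needs_sidecar(pod):
            meta = pod.get("metadata", {})
            missing.append(f'{meta.get("namespace", "default")}/{meta.get("name", "")}')
    return missing
```

A reconciler could run find_pods_missing_sidecar on a timer, or when the API connection is re-established, and then decide how to remediate the pods it returns (for example by restarting them so the webhook gets another chance). The remediation step is deliberately left out here.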

Release Note

RELEASE NOTE:

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 19 (16 by maintainers)

Most upvoted comments

@gunniwho that makes sense. Looks like there’s no way to restrict to pods with a certain annotation, so what you’re saying is a real risk.

@ItalyPaleAle we learned it the hard way the other day (albeit with another sidecar injector) 😅

I think the underlying issue here (which surely isn’t specific to Dapr at all) is the design of mutating admission webhooks in Kubernetes. The idea of calling an API at the time a pod is to be scheduled doesn’t really rhyme well with the desired-state reconciliation philosophy of Kubernetes if you think about it. There should be a reconciliation loop calling the registered webhooks periodically to check if the state of the pod is the desired state. This could probably just be the replica set controller.

Anyway, I digress and I might be wrong. It’s just a thought that occurred to me 🙂

I’m going to look into this issue and we’ll try to get a fix into 1.8