kubernetes: Admission controller fails on timeout when failurePolicy set to Ignore

Trying to set up a validating admission webhook on my GKE cluster using the following yaml:

apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingWebhookConfiguration
metadata:
  name: test-admission-webhook
webhooks:
  - name: my-admission-webhook.io
    rules:
      - apiGroups:
          - ""
        apiVersions:
          - "v1"
        operations:
          - "CREATE"
        resources:
          - "pods"
    failurePolicy: Ignore
    clientConfig:
      url: "https://192.168.99.1:8080"
      caBundle: %%TEST_BUNDLE%%

While my webhook server is down, I can’t run any pod on my cluster and get the following error:

Error creating: Timeout: request did not complete within allowed duration

Even after I removed the failurePolicy field from my yaml file (which is supposed to default to Ignore, as mentioned in the official doc), I’m still getting the same error.

/sig api-machinery /kind bug

About this issue

State: closed
Created 6 years ago
Reactions: 1
Comments: 19 (14 by maintainers)

Commits related to this issue

fix: Decrease default admission controller timeout If the timeout is >= the default client timeout of 30s, this can lead to the client believing the request failed. failurePolicy: Ignore still needs ... — committed to mikebryant/datadog-agent by mikebryant 3 years ago

Most upvoted comments

I think @yue9944882 and @liggitt were right. It’s your create request timing out, not your admission request.

tl;dr: it’s behaving by design. But we shouldn’t use the same timeout for the client request and the admission request. You could do one of the following to fix it:

configure a timeout (> 30s) in your client request by setting Timeout in your restclient.Config. This changes the timeout for all of your client requests (per-request timeout configuration is WIP)
configure a timeout (< 30s) in your webhook server, as @yue9944882 suggested
configure a timeout (< 30s) for the admission request using https://github.com/kubernetes/kubernetes/pull/74562 (it’s in 1.14; probably the least desirable option)
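As a rough sketch of the third option, the manifest from the issue could carry a per-webhook timeout. This assumes a 1.14+ cluster and that the field introduced by the linked PR is timeoutSeconds on each webhook entry; the value 5 is purely illustrative and not taken from the thread. Everything else is copied from the original yaml.

```yaml
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingWebhookConfiguration
metadata:
  name: test-admission-webhook
webhooks:
  - name: my-admission-webhook.io
    rules:
      - apiGroups:
          - ""
        apiVersions:
          - "v1"
        operations:
          - "CREATE"
        resources:
          - "pods"
    failurePolicy: Ignore
    # Assumption: illustrative value, kept well below the 30s client timeout
    # so the apiserver gives up on the webhook before the client gives up on
    # the create request.
    timeoutSeconds: 5
    clientConfig:
      url: "https://192.168.99.1:8080"
      caBundle: %%TEST_BUNDLE%%
```

With a setting like this, the admission call should time out first, and failurePolicy: Ignore then has a chance to let the create request through while the client is still waiting.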

(longer version) I think what happened is:

there are two different requests, built on the same client package with the same timeout:

your client sends a create request to the apiserver using client-go (which builds on the rest client, and eventually on an HTTP client). The HTTP client has a timeout set for every request (I think the config is defaulted to 30s somewhere)
the apiserver receives the create request from the client and sends an admission request to the webhook server, also using the rest client with a 30s timeout

since the webhook server is unresponsive, both requests hang:

your client hits its timeout and returns an error first
the apiserver then hits its timeout talking to the webhook server. It could have ignored the error based on the policy and created the pod successfully, but the client has already dropped the request

(you can tell from the error message: it would contain the text “Internal error” if the apiserver had actually failed to honor the Ignore policy and returned an error)

roycaihw on Mar 6, 2019

the failurePolicy defines how we deal w/ responses returned from webhooks, while apiserver doesn’t receive anything if the webhook is not responding so the failurePolicy didn’t work

It should work. A failure policy of ignore should fail open on timeout or other call errors

liggitt on Nov 29, 2018