pkg: webhook EOF errors

/area API
/area test-and-release
/kind bug

Expected Behavior

When we run our e2e tests with chaos, there are no failures caused by the webhook shutting down.

Actual Behavior

Ever since we enabled webhook chaos, we intermittently see failures like this: Post https://eventing-webhook.knative-eventing-qh1fjbnng8.svc:443/resource-conversion?timeout=30s: EOF

About this issue

  • State: open
  • Created 4 years ago
  • Comments: 46 (46 by maintainers)

Most upvoted comments

/assign I’ll look next week to see what it is.

If dry-run is implicated but not well understood, we should move the feature to Disabled. With Beta features, the standard should be to roll back first and ask questions later.

As part of https://github.com/knative/serving/issues/11225 I encountered EOFs and context deadline exceeded errors. After adding some tracing, I’ve seen our webhooks respond in <10ms, but the API server still returns a timeout.

So I don’t think this is isolated to just our webhooks.

My next step is to start testing with a non-managed k8s service so I can get API server logs.

Interestingly, one way to reproduce this consistently is to panic in the webhook when handling a request (i.e. in the defaulting logic).

Go’s HTTP server recovers these panics and logs an error.

We should potentially recover ourselves so we can return an ‘internal server error’ type response.