pkg: webhook EOF errors
/area API /area test-and-release /kind bug
Expected Behavior
When we run our e2e tests with chaos there are no failures due to the webhook shutting down.
Actual Behavior
Ever since we enabled webhook chaos, we intermittently see failures like this: `Post https://eventing-webhook.knative-eventing-qh1fjbnng8.svc:443/resource-conversion?timeout=30s: EOF`
About this issue
- State: open
- Created 4 years ago
- Comments: 46 (46 by maintainers)
Commits related to this issue
- Disable keep-alives on shutdown. See also: https://github.com/knative/pkg/issues/1509#issuecomment-659737054 — committed to mattmoor/pkg by mattmoor 4 years ago
- Disable keep-alives on shutdown. (#1511) See also: https://github.com/knative/pkg/issues/1509#issuecomment-659737054 — committed to knative/pkg by mattmoor 4 years ago
- Implement a new shared "Drainer" handler. This implements a new `http.Handler` called `Drainer`, which is intended to wrap some inner `http.Handler` business logic with a new outer handler that can r... — committed to mattmoor/pkg by mattmoor 4 years ago
- Implement a new shared "Drainer" handler. (#1517) * Implement a new shared "Drainer" handler. This implements a new `http.Handler` called `Drainer`, which is intended to wrap some inner `http.Handle... — committed to knative/pkg by mattmoor 4 years ago
- Pull in the new fancier webhook drain from knative/pkg. This is attempting to try and combat the webhook Post EOF errors we have been seeing intermittently: https://github.com/knative/pkg/issues/1509 — committed to mattmoor/eventing by mattmoor 4 years ago
- Pull in the new fancier webhook drain from knative/pkg. (#3634) This is attempting to try and combat the webhook Post EOF errors we have been seeing intermittently: https://github.com/knative/pkg/iss... — committed to knative/eventing by mattmoor 4 years ago
- Disable chaos on webhook for now. Chaos on webhook currently causes a battery of failures tracked in https://github.com/knative/pkg/issues/1509. I think we should disable it to not hide issues and/or... — committed to markusthoemmes/knative-serving by markusthoemmes 4 years ago
- Disable chaos on webhook for now. (#8772) * Disable chaos on webhook for now. Chaos on webhook currently causes a battery of failures tracked in https://github.com/knative/pkg/issues/1509. I think w... — committed to knative/serving by markusthoemmes 4 years ago
- Disable chaosduck on the webhook (#5419) Serving has done this as well, the chaosduck killing the webhook causes errors being tracked in https://github.com/knative/pkg/issues/1509 — committed to knative/eventing by deleted user 3 years ago
- Retry on Webhook EOF Errors Mitigation for https://github.com/knative/pkg/issues/1509. Same fix was used in eventing core to mitigate webhook EOF errors. Signed-off-by: Pierangelo Di Pilato <pdipil... — committed to pierDipi/eventing-kafka by pierDipi 3 years ago
- Retry on Webhook EOF Errors (#978) Mitigation for https://github.com/knative/pkg/issues/1509. Same fix was used in eventing core to mitigate webhook EOF errors. Signed-off-by: Pierangelo Di Pilato ... — committed to knative-extensions/eventing-kafka by pierDipi 3 years ago
- Retry webhook EOF errors This test sometimes fails due to [1]. [1] https://github.com/knative/pkg/issues/1509 Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com> — committed to pierDipi/eventing-kafka-broker by pierDipi 2 years ago
- Retry webhook EOF errors (#2484) This test sometimes fails due to [1]. [1] https://github.com/knative/pkg/issues/1509 Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com> — committed to knative-extensions/eventing-kafka-broker by pierDipi 2 years ago
/assign I’ll look next week to see what it is.
If dry-run is implicated but not well understood, we should move the feature to Disabled. With Beta features the standard should be to roll back first and ask questions later.
As part of https://github.com/knative/serving/issues/11225 I encountered EOFs and `context deadline exceeded` errors. After adding some tracing I’ve seen our webhooks respond in under 10ms, but the API server still returns a timeout.
So I don’t think this is isolated to just our webhooks.
My next step is to start testing with a non-managed k8s service so that I can get API server logs.
Interestingly, one way to reproduce this consistently is to panic in the webhook when handling a request (i.e. in the defaulting logic).
Go’s HTTP server recovers these panics and logs an error.
We should potentially recover ourselves so we can return an ‘internal server’-type error.