rancher: Bug: coredns-autoscaler needs readiness and liveness check

What kind of request is this (question/bug/enhancement/feature request): Bug

Steps to reproduce (least amount of steps as possible): The coredns-autoscaler deployment does not have readiness and liveness checks.

Result: If the node where coredns-autoscaler is running goes offline, the pod is not rescheduled properly.
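For illustration, here is a minimal sketch of what readiness and liveness probes on the coredns-autoscaler deployment could look like. The image name, HTTP path, and port are assumptions made for this example; the actual probe configuration shipped in the KDM templates may differ, since the upstream cluster-proportional-autoscaler image does not necessarily expose an HTTP health endpoint by default.

```yaml
# Hypothetical excerpt of the coredns-autoscaler Deployment pod spec.
# The /healthz path and port 8080 are placeholder assumptions for this sketch,
# not the values used in the actual Rancher/KDM change.
containers:
  - name: autoscaler
    image: rancher/cluster-proportional-autoscaler   # image name/tag illustrative
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 10
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 5
```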

Other details that may be helpful:

Environment information

  • Rancher version (rancher/rancher or rancher/server image tag, or shown bottom left in the UI):
  • Installation option (single install/HA): HA

gz#7893 SURE-1649

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 18 (14 by maintainers)

Most upvoted comments

Rancher Server:

  • v2.6-head (39befda)
  • docker install

Validation steps followed as mentioned here:

  • Provisioned a single-node AWS node driver k8s v1.22.6-rancher1-2 cluster.
  • Checked the coredns-autoscaler health check fields; the Readiness and Liveness checks are configured correctly.

(screenshot: coredns-ss1)

Result:

  • The Health Check field values are configured as expected, hence closing the issue.

Re-opening this issue. Validated this on 2.6-head (6352861), but due to a UI issue the Health Check fields cannot be validated. Logged the ticket https://github.com/rancher/dashboard/issues/5215.

PR reopened for 2.6.4

I also added a backport issue and PR for 2.5

Both PRs will wait to merge until the upcoming KDM release is done.

Whoever grabs this: we don't expect the liveness check on its own to produce the expected behavior of the pod being rescheduled after the node goes down. In 2.5.4 we added the ability to configure tolerations on add-ons, which may solve this issue but needs to be investigated.
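For context, a rough sketch of what overriding the add-on tolerations via the RKE cluster.yml dns section might look like, assuming the standard Kubernetes toleration fields are accepted there; the exact field names and whether these tolerations reach the autoscaler deployment should be checked against the RKE/Rancher version in use, per the investigation noted above.

```yaml
# cluster.yml (RKE) -- illustrative sketch only.
# Field names follow the standard Kubernetes toleration spec; verify the exact
# spelling (e.g. tolerationSeconds) against the RKE docs for your version.
dns:
  provider: coredns
  tolerations:
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 30   # evict from an unreachable node after 30s so the pod can be rescheduled
```

Note that for eviction and rescheduling to actually happen, the pod must not also carry a blanket toleration that matches the not-ready/unreachable taints indefinitely.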