kubernetes: Node authorizer fails to authorize requests right after kube-apiserver restart
What happened:
In Kubernetes clusters with O(100k) pods, initialization of the watch cache and shared informers inside kube-apiserver takes 30-50 seconds. Until this completes, the node authorizer cannot authorize any request. Kubelets retry the rejected requests, creating a retry loop that generates significant load on the master, which in turn further slows down kube-apiserver initialization, ultimately leading to 429 errors and a kube-apiserver crashloop.
Whether the node authorizer is initialized is not exposed externally, so it cannot be used by e.g. a load balancer to direct traffic only to instances that are already initialized.
What you expected to happen:
We should have a way to determine whether kube-apiserver is ready to serve requests. Currently, the /healthz, /livez, and /readyz endpoints do not expose any information that can be used to determine the state of the node authorizer.
In a multi-master environment, when one of the kube-apiserver replicas is not yet initialized, we should redirect traffic to the other replicas, but currently there is no signal we can use to determine whether a replica is initialized.
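For context on what a load balancer could key off once such a signal exists, here is a hedged Go sketch of probing a single replica's /readyz endpoint. The endpoint and its verbose query parameter exist today; the checker itself, its TLS shortcuts, and the assumption that readiness will eventually reflect node authorizer state are illustrative only:

```go
// Sketch (not existing Kubernetes tooling) of an external health checker
// probing one kube-apiserver replica. Today none of the checks listed by
// /readyz reflect node authorizer initialization, which is the gap this
// issue describes.
package example

import (
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
)

// replicaReady returns true if the given replica reports 200 on /readyz.
func replicaReady(addr string) (bool, error) {
	client := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true}, // demo only
	}}
	resp, err := client.Get("https://" + addr + "/readyz?verbose")
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()

	// With ?verbose the body lists each individual check (e.g. the
	// poststarthook/... checks), useful when debugging why a replica
	// is not ready.
	body, _ := io.ReadAll(resp.Body)
	fmt.Printf("readyz from %s:\n%s\n", addr, body)
	return resp.StatusCode == http.StatusOK, nil
}
```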
How to reproduce it (as minimally and precisely as possible):
Run a scalability load test at 3000-node scale. Right before the "Scaling objects" step, restart all masters. They will fail to come back up within the ~1 minute window, so they will be killed by the livenessProbe.
Anything else we need to know?:
While our repro suffers from the node authorizer not being initialized, this likely also affects other controllers running inside kube-apiserver.
I suggest adding a post-start hook blocking /healthz that exposes the status of all informers in the system. A load balancer could then use that information to direct traffic only to already-initialized instances. I will send a PR for this.
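As a rough illustration of the suggestion (a sketch only, not the actual PR): the generic API server already lets components register post-start hooks, and every registered hook automatically shows up as a poststarthook/<name> check on /healthz and /readyz. A hypothetical hook that waits for the shared informers could look roughly like this, using the 1.17/1.18-era k8s.io/apiserver API; the helper and hook names are made up:

```go
// Sketch only: register a post-start hook that blocks until the shared
// informers have synced. Field names (e.g. PostStartHookContext.StopCh)
// match the 1.17/1.18-era k8s.io/apiserver API this issue is about.
package example

import (
	"fmt"

	"k8s.io/apiserver/pkg/server"
	"k8s.io/client-go/informers"
)

// registerInformerSyncHook and the hook name below are hypothetical,
// chosen just to illustrate the shape of the change.
func registerInformerSyncHook(s *server.GenericAPIServer, f informers.SharedInformerFactory) error {
	return s.AddPostStartHook("informers-synced",
		func(ctx server.PostStartHookContext) error {
			// WaitForCacheSync blocks until every informer started by the
			// factory has completed its initial LIST, or until StopCh closes.
			for typ, synced := range f.WaitForCacheSync(ctx.StopCh) {
				if !synced {
					return fmt.Errorf("informer for %v failed to sync", typ)
				}
			}
			return nil
		})
}
```

A load balancer probing /healthz or /readyz would then only see success from replicas whose caches have synced.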
Environment:
- Kubernetes version (use kubectl version): tested in 1.17 and 1.18
- Cloud provider or hardware configuration:
- OS (e.g.: cat /etc/os-release):
- Kernel (e.g. uname -a):
- Install tools:
- Network plugin and version (if this is a network-related bug):
- Others:
https://github.com/kubernetes/kubernetes/blob/master/plugin/pkg/auth/authorizer/node/node_authorizer.go is where the node authorizer lives
it makes decisions using a graph which is populated using informers
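For readers unfamiliar with that pattern, here is a rough sketch of the general shape it describes. This is not the real node_authorizer.go code; the graph type and its methods are invented for illustration:

```go
// Sketch only: an informer's event handlers populate an in-memory graph
// that authorization decisions later read from. Until the informer's
// initial LIST has been replayed through these handlers, the graph is
// empty, which is the startup gap described in this issue.
package example

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/tools/cache"
)

// graph is a stand-in for the node authorizer's internal graph type;
// AddPod/DeletePod are hypothetical method names.
type graph struct{}

func (g *graph) AddPod(pod *v1.Pod)    {}
func (g *graph) DeletePod(pod *v1.Pod) {}

func watchPods(f informers.SharedInformerFactory, g *graph) {
	f.Core().V1().Pods().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			if pod, ok := obj.(*v1.Pod); ok {
				g.AddPod(pod)
			}
		},
		DeleteFunc: func(obj interface{}) {
			// Real code must also handle cache.DeletedFinalStateUnknown
			// tombstones here.
			if pod, ok := obj.(*v1.Pod); ok {
				g.DeletePod(pod)
			}
		},
	})
}
```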
I have recently hit this as well 😃
While #92508 is universally needed imho, I think that eventually we will need a bit more than that.
In the ideal world, I would like to have a post-start hook that will not only wait for informers to be synced, but in addition wait for the initial state to actually be applied to the node authorizer's internal state. What I mean by that is ensuring that the handlers for all objects from the initial list are actually called. Currently we don't really have a way to answer this question, unfortunately. I had a WIP PR from some time ago that would allow doing that: https://github.com/kubernetes/kubernetes/pull/73203 This case would be a perfect use case for that PR. I would be willing to resurrect it and push it further (though probably not before the 1.19 code freeze), if we agree that it's what we want. I still believe it is, but @liggitt - you were very opposed to it in the past. I think we should get back to this discussion.
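A strawman of what "handlers for all objects from the initial list have actually been called" could mean mechanically, purely to illustrate the idea and not what https://github.com/kubernetes/kubernetes/pull/73203 implements: after the informer reports synced, compare the number of keys in its store with the number of Add notifications a handler has observed, and wait until the handler catches up.

```go
// Illustrative approximation only: waits until our Add handler has seen at
// least as many objects as the informer's store held when it first synced.
// This over-counts if new objects are created during startup, so a real
// implementation would need something more precise.
package example

import (
	"sync/atomic"
	"time"

	"k8s.io/client-go/tools/cache"
)

func waitForInitialHandlers(informer cache.SharedIndexInformer, stopCh <-chan struct{}) bool {
	var handled int64
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) { atomic.AddInt64(&handled, 1) },
	})

	// Wait for the initial LIST to be stored in the informer's cache.
	if !cache.WaitForCacheSync(stopCh, informer.HasSynced) {
		return false
	}
	initial := int64(len(informer.GetStore().ListKeys()))

	// HasSynced only says the store is populated, not that handlers have
	// observed every object, so poll until our counter catches up.
	for atomic.LoadInt64(&handled) < initial {
		select {
		case <-stopCh:
			return false
		case <-time.After(100 * time.Millisecond):
		}
	}
	return true
}
```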
FTR - this topic was also mentioned here: https://github.com/kubernetes/kubernetes/issues/90339#issuecomment-617965564 and it seems that @lavalamp agreed that it might be useful. Once we're past code freeze, I would like to get back to it and reach agreement on what we do with it (in addition to the short-term fix that Maciek sent out).