kubernetes: ResourceVersion="0" causes no chunking, makes starting Informers expensive

I’m seeing high load on API servers when Informers start up. I think I’ve traced it to https://github.com/kubernetes/client-go/issues/366, where it was intentionally decided that chunking would not be performed when ResourceVersion=“0”. That decision dates from 2018 and was made because it was not possible to provide the correct consistency guarantee (compatible with ResourceVersion=“0”) in a paged situation.

Is any better solution possible now?

For instance, recent versions document that “The metadata.resourceVersion of a resource collection (i.e. a list response) identifies the resource version at which the list response was constructed.” Would it be possible to leverage that to allow real paging? For example:

(a) The client requests ResourceVersion=“0”.

(b) The server computes the resourceVersion of the list it is about to serve (i.e. the “resource version at which the list was constructed”). This is the bit that I’m not sure about: can it compute that RV before it starts writing the response headers?

(c) The server does real chunking, and returns a continue token based on (b). (As noted in the 2018 GitHub issue, it’s not possible to base a continue token on (a).)
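To make steps (a) to (c) concrete, here is a toy in-memory model in Go. It is only a sketch of the idea and not kube-apiserver code: the names `snapshotStore`, `continueToken`, `List` and `Put` are hypothetical, and it assumes the server can keep every past snapshot around, which a real server cannot do for free.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"sort"
)

// continueToken is a hypothetical token layout: it pins the resource version
// chosen in step (b) and records where the next page starts.
type continueToken struct {
	RV    uint64 `json:"rv"`
	Start string `json:"start"`
}

// snapshotStore is a toy store that keeps the sorted key set at every RV.
type snapshotStore struct {
	currentRV uint64
	snapshots map[uint64][]string
}

type listPage struct {
	Items           []string
	ResourceVersion uint64
	Continue        string
}

func newSnapshotStore() *snapshotStore {
	return &snapshotStore{snapshots: map[uint64][]string{0: nil}}
}

// Put adds a key (assumed not to exist yet) and records a new snapshot.
func (s *snapshotStore) Put(key string) {
	next := append(append([]string{}, s.snapshots[s.currentRV]...), key)
	sort.Strings(next)
	s.currentRV++
	s.snapshots[s.currentRV] = next
}

// List models a request made with ResourceVersion="0" (step a). With no
// continue token it pins the current RV before producing any output (step b);
// with a token it keeps serving the snapshot the token names (step c).
func (s *snapshotStore) List(limit int, cont string) (listPage, error) {
	rv, start := s.currentRV, ""
	if cont != "" {
		raw, err := base64.RawURLEncoding.DecodeString(cont)
		if err != nil {
			return listPage{}, err
		}
		var tok continueToken
		if err := json.Unmarshal(raw, &tok); err != nil {
			return listPage{}, err
		}
		rv, start = tok.RV, tok.Start
	}
	keys, ok := s.snapshots[rv]
	if !ok {
		return listPage{}, errors.New("snapshot no longer available")
	}
	page := listPage{ResourceVersion: rv}
	i := sort.SearchStrings(keys, start) // index of the first key >= start
	end := len(keys)
	if limit > 0 && i+limit < len(keys) {
		end = i + limit
		raw, err := json.Marshal(continueToken{RV: rv, Start: keys[end]})
		if err != nil {
			return listPage{}, err
		}
		page.Continue = base64.RawURLEncoding.EncodeToString(raw)
	}
	page.Items = keys[i:end]
	return page, nil
}
```

In this model, a write that lands between two pages does not leak into the second page, because the token carries the RV from (b) rather than the “0” from (a). The open question from (b) remains whether the real server can pin that RV cheaply.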

How to reproduce it (as minimally and precisely as possible):

Use the Go SDK to create and Run an Informer for some kind of object that you have many thousands of.
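The same request shape can also be built by hand, without an Informer. The sketch below uses only the Go standard library to construct a list URL with the `limit`, `resourceVersion` and `continue` query parameters of the Kubernetes list API. The helper name `listURL`, the host and the page size of 500 are illustrative assumptions, and authentication is left out.

```go
package main

import (
	"net/url"
	"strconv"
)

// listURL is a hypothetical helper that builds a Kubernetes list request URL.
// base is the API server address, resourcePath is e.g. "/api/v1/pods".
func listURL(base, resourcePath string, limit int64, resourceVersion, cont string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = resourcePath

	q := url.Values{}
	if limit > 0 {
		q.Set("limit", strconv.FormatInt(limit, 10))
	}
	if resourceVersion != "" {
		q.Set("resourceVersion", resourceVersion)
	}
	if cont != "" {
		q.Set("continue", cont)
	}
	u.RawQuery = q.Encode() // Encode sorts parameters by key
	return u.String(), nil
}
```

Calling `listURL("https://apiserver.example:6443", "/api/v1/pods", 500, "0", "")` yields a URL that asks for pages of 500 at ResourceVersion=“0”. Per the behaviour described in this issue, the server would be expected to return the whole list in one response anyway.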

Anything else we need to know?:

Just a few quick notes on the impact of this:

Firstly, there can be real impact on server resource consumption - especially if many different clients start Informers on large lists at the same time.

Secondly, the current behaviour is confusing. It doesn’t seem to be documented anywhere outside a few GitHub comments.

Thirdly, it doesn’t seem to be consistent with the intention of the original KEP. For instance, the KEP says:

A snapshot of all resources at a particular moment in time that has a single resourceVersion that clients can begin watching from to receive updates.

And yet, in reality, when a client wants to “begin watching”, that’s currently the one time that chunking is NOT supported (due to this issue with ResourceVersion=“0”).

So it seems that a fundamental goal of the KEP is not being met by the current behaviour.

See also https://github.com/kubernetes/kubernetes/issues/96497 , which is about making the current beta behaviour GA.

Environment:

  • Kubernetes version (use kubectl version): v1.19.7
  • Cloud provider or hardware configuration: AKS

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 26 (10 by maintainers)

Most upvoted comments

Thanks for the detailed reply @wojtek-t . The issues you point out make sense.

I’m left with a couple of concerns about the current direction, which I’d like to share with you. I’m not saying they’re show-stoppers, but they do seem like they might be relevant.

My first concern is about the advice that would then be given to developers (e.g. of 3rd-party things that will run in K8s). If I understand correctly, the advice to devs would become:

  1. If you need a “list”, your first preference should be to use an Informer because it will do streaming for you.
  2. If you can’t or don’t use an informer:
     a. Lists that hit the API server cache may be strongly throttled.
     b. Paginated lists that force querying from etcd will be less throttled, but you shouldn’t use these because they put too much load on etcd.

I’m a little concerned about the conflict between 2a and 2b. Devs don’t want to be throttled, but we don’t want them to hit etcd. I feel devs may be disappointed and/or confused by that advice. Or may be “incentivised” to force the queries through to etcd, to avoid the throttling.

My second concern is that even a very small number of unpaginated list queries can cause a non-trivial RAM spike if the lists have tens of thousands of elements. How sure are we that we can get decent control of RAM spikes by throttling non-paginated requests with P&F?

Thank you Wojciech for all your comments and knowledge! /cc @yliaog /triage accepted

Am I right in assuming that it’s intended to work even for lists as large as tens of thousands of objects - e.g. 50,000 items?

yes

If so, you’re welcome to close this issue now. (Unless you need it for tracking documentation updates, as discussed above).

yes - we can leave it open for the documentation update

Behavior of what?

The behaviour that limit is ignored by the server if ResourceVersion=0. If that’s the long-term plan, I think it has to be documented in the documentation that describes limit / continue, otherwise folks will keep wasting time trying to figure out what’s happening (like I did, and like the author of the linked issue probably did).

Thanks for your detailed comments on all the other points. I’ll digest them and reply later.
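Until the limit / ResourceVersion=0 behaviour is documented, a client can at least detect it. The sketch below parses a list response body and reports whether more items came back than the requested limit. The function name `limitIgnored` is hypothetical, and the struct maps only the `metadata.resourceVersion`, `metadata.continue` and `items` fields of a list response.

```go
package main

import "encoding/json"

// listResponse maps just the parts of a Kubernetes list response that matter
// for this check.
type listResponse struct {
	Metadata struct {
		ResourceVersion string `json:"resourceVersion"`
		Continue        string `json:"continue"`
	} `json:"metadata"`
	Items []json.RawMessage `json:"items"`
}

// limitIgnored reports whether the response holds more items than the limit
// the client asked for, i.e. the server did not chunk the list.
func limitIgnored(body []byte, limit int) (bool, error) {
	var resp listResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return false, err
	}
	return limit > 0 && len(resp.Items) > limit, nil
}
```

A caller could log a warning when this returns true, so that an unexpectedly large single response is at least visible instead of silently surprising.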