operator: The `controlPlaneNodeSelector` installation field doesn't affect Typha

Expected Behavior

Based on the documentation for controlPlaneNodeSelector it applies to all components which aren’t DaemonSets. That means that it should apply to the Typha deployment.

Current Behavior

The controlPlaneNodeSelector doesn’t apply to the Typha deployment. I suspect this might be because there is the typhaAffinity field, but affinity and node selectors can be used in parallel.
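For illustration, here is a minimal sketch of the kind of Installation resource this report concerns. The label key and value are hypothetical placeholders, and the comments restate the behavior described above rather than anything verified independently.

```yaml
# Sketch only: example.com/node-pool is a made-up label, not taken from the issue.
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  # Expected (per the field's documentation): applied to every
  # component that is not a DaemonSet, including the Typha deployment.
  # Reported: the Typha deployment does not receive this selector.
  controlPlaneNodeSelector:
    example.com/node-pool: system
```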

Possible Solution

Use controlPlaneNodeSelector with the Typha deployment.

Steps to Reproduce (for bugs)

n/a

Context

n/a

Your Environment

n/a

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 18 (14 by maintainers)

Most upvoted comments

I think we should fill this out:

  • typhaAffinity
  • typhaNodeSelector
  • controlPlaneAffinity
  • controlPlaneSelector
  • daemonsetAffinity
  • daemonsetNodeSelector

Consistent and covers all of the bases 😅

@caseydavenport I agree that “Add component-specific fields for Typha and calico/node.” is the best approach from an end user’s perspective. Is it worth considering whether calico/node needs affinity or customisable selectors? Also, the calico/node configuration could use the daemonset prefix, e.g. daemonsetTolerations.

Slightly related, could you point me at the documentation that defines the data plane and control plane? I incorrectly assumed that Typha was part of the control plane; mainly because it’s usually only the daemonsets that are considered data plane, but also because my understanding of Typha is that it caches the K8s API and I’d consider the K8s API to be more control plane than data plane (possibly incorrectly).

Hey @stevehipwell @aquam8 @sarthakjain271095 and @aarondav,

We’ve put up an outline of proposed changes to operator component configuration. Among other things, this will allow overriding tolerations and node affinity/nodeSelectors. Please take a look if you can. We’d appreciate your input on the proposed changes: https://github.com/tigera/operator/issues/1990

That would work too! The one caveat we’ve heard of around affinity is that it is much more expensive for the scheduler to enforce compared to nodeSelectors, which could limit the size of the Kubernetes cluster in terms of number of pods (around 10k pods per cluster). But we haven’t hit that limit in our use-case yet.

Yeah that was a bad decision to have the two controlPlane* configs not apply to the same set of components

Agreed

should we create a typhaTolerations so the controlPlane ones are consistent in where they apply?

I think the options here are:

  • Consider Typha to be controlPlane, and thus have all the controlPlaneX fields apply to it.
  • Consider it to be dataPlane, add new dataPlaneX fields that apply to Typha and calico/node.
  • Add component-specific fields for Typha and calico/node.

I think the last option is probably the right path forward. controlPlane makes sense for controllers and such that are not on the critical path for applications to function (kube-controllers, apiserver, etc.). Those can be bunched up.

calico/node and calico/typha are, unfortunately but necessarily, special system components that require fine-tuning.

So, for typha I think we should have:

  • typhaNodeSelector
  • typhaAffinity
  • typhaTolerations

I’m not a huge fan of encoding component names into the API - I think it leaks implementation details, but in this case the implementation is part of the feature that is relevant to the end user, so there might be no way around that.

calico/node is even more awkward, because it is named pretty vaguely…

  • calicoNodeNodeSelector
  • calicoNodeAffinity
  • calicoNodeTolerations

^ These all seem non-obvious for the new user - e.g., is it “CalicoNode affinity or Calico NodeAffinity”? I think better names are needed for those.

Yeah that was a bad decision to have the two controlPlane* configs not apply to the same set of components. @caseydavenport WDYT should we create a typhaTolerations so the controlPlane ones are consistent in where they apply?

Also for a simple node selector typhaAffinity is overkill and adds significant cognitive load.

I expect the user to take on that cognitive load because using affinity for Typha should be a deliberate decision. If Typha cannot be deployed, pod networking will not function in the cluster, so if someone wants node-selector-style behavior for Typha, it should not be an easy or quick decision.

@tmjd I’ve configured typhaAffinity but as Typha uses the controlPlaneTolerations value it really doesn’t make sense that it doesn’t use the controlPlaneNodeSelector value. Also for a simple node selector typhaAffinity is overkill and adds significant cognitive load.

When building Kubernetes platforms it’s really important to have control over scheduling decisions for central components; this is where a lack of flexibility in operators can make them unusable. It’s a common pattern to run system node pools that host all the central components, leaving user-provisioned nodes to run only daemonsets and user workloads.
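As a rough sketch of that system node pool pattern with the fields named in this thread: the label and taint key `example.com/node-pool` and its value are hypothetical, and the `typhaAffinity` block assumes the field follows the usual Kubernetes node affinity shape. It also shows how much more verbose the affinity form is than a plain node selector for the same constraint.

```yaml
# Sketch only: the label/taint key and value below are made up for illustration.
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  # Pins the non-DaemonSet components to the system pool.
  controlPlaneNodeSelector:
    example.com/node-pool: system
  # Lets those components (and, per this thread, Typha) tolerate the pool's taint.
  controlPlaneTolerations:
    - key: example.com/node-pool
      operator: Equal
      value: system
      effect: NoSchedule
  # Typha needs the same constraint spelled out again as node affinity.
  typhaAffinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: example.com/node-pool
                operator: In
                values:
                  - system
```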