cockroach-operator: CrdbCluster tolerations and affinity not working

Description

Hi everyone! I’m trying to control where my cockroachdb pods are scheduled on my GKE cluster. I have two node pools and I want one of them to be used exclusively for cockroachdb. I’ve tainted the nodes in that pool and added a matching toleration so that only cockroachdb pods are allowed on them. I’ve also added a nodeAffinity to the CrdbCluster so that cockroachdb pods are scheduled only on those nodes. Neither the affinity nor the tolerations seem to be working. The only tolerations I see on my cockroachdb pods are:

Tolerations:             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                         node.kubernetes.io/unreachable:NoExecute op=Exists for 300s

And I see that cockroachdb pods are being scheduled on nodes from both of my node pools, so the affinity doesn’t seem to be working.

Steps to reproduce

  • Apply the CRD like so: kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/master/config/crd/bases/crdb.cockroachlabs.com_crdbclusters.yaml
  • Apply the operator (with the AffinityRules feature gate enabled).
  • Apply the CrdbCluster custom resource.

Configuration & Manifest dump

Nodes

The node pool is managed by Terraform; the relevant configuration looks like this:

Label

    labels = {
      nodepool_service = "cockroachdb"
    }

Taints

    taint {
      key    = "reservation"
      value  = "cockroachdb"
      effect = "NO_SCHEDULE"
    }

Both of the above have been tested and work well.
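
For comparison with the CrdbCluster manifest further down, this is roughly how the label and taint should appear on a Node object in that pool. It is only a sketch, assuming GKE translates the Terraform effect NO_SCHEDULE into the Kubernetes effect NoSchedule; the node name is a made-up placeholder. The toleration's key, value, and effect have to match the taint, and the affinity's key and value have to match the label.

```yaml
# Sketch of the relevant Node fields (not a manifest to apply).
# The node name is a hypothetical placeholder.
apiVersion: v1
kind: Node
metadata:
  name: example-cockroachdb-node
  labels:
    nodepool_service: cockroachdb
spec:
  taints:
    - key: reservation
      value: cockroachdb
      effect: NoSchedule
```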

Manifests

CrdbCluster

apiVersion: crdb.cockroachlabs.com/v1alpha1
kind: CrdbCluster
metadata:
  name: cockroachdb
  annotations:
    crdb.io/restarttype: rolling
spec:
  clientTLSSecret: cockroachdb.client.root
  nodeTLSSecret: cockroachdb.node
  dataStore:
    pvc:
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: "2Gi"
        volumeMode: Filesystem
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
    limits:
      cpu: 2
      memory: 8Gi
  tlsEnabled: true
  image:
    name: cockroachdb/cockroach:v21.1.7
  nodes: 4
  additionalLabels:
    crdb: is-cool
  tolerations:
    - key: "reservation"
      operator: "Equal"
      value: "cockroachdb"
      effect: "NoSchedule"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: nodepool_service
            operator: In
            values:
            - cockroachdb

Operator

# Copyright 2021 The Cockroach Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Generated, do not edit. Please edit this file instead: config/templates/operator.yaml.in
#
---
apiVersion: v1
kind: Namespace
metadata:
  name: default
  labels:
    cockroach-namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cockroach-database-role
rules:
  - verbs:
      - use
    apiGroups:
      - security.openshift.io
    resources:
      - securitycontextconstraints
    resourceNames:
      - anyuid
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cockroach-database-sa
  namespace: default
  annotations:
  labels:
    app: cockroach-operator
---
# RBAC Definition (ClusterRole, ServiceAccount, and ClusterRoleBinding):
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cockroach-database-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cockroach-database-role
subjects:
  - kind: ServiceAccount
    name: cockroach-database-sa
    namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cockroach-operator-role
rules:
  - apiGroups:
      - "*"
    resources:
      - "*"
    verbs:
      - "*"
---
# RBAC Definition (ClusterRole, ServiceAccount, and ClusterRoleBinding):
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cockroach-operator-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cockroach-operator-role
subjects:
  - kind: ServiceAccount
    name: cockroach-operator-sa
    namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cockroach-operator-role
rules:
  - apiGroups:
      - "*"
    resources:
      - "*"
    verbs:
      - "*"
  - apiGroups:
      - rbac.authorization.k8s.io
    resources:
      - clusterroles
    verbs:
      - get
      - list
      - delete
  - apiGroups:
      - apps
    resources:
      - statefulsets
    verbs:
      - "*"
  - apiGroups:
      - apps
    resources:
      - statefulsets/finalizers
    verbs:
      - "*"
  - apiGroups:
      - apps
    resources:
      - statefulsets/status
    verbs:
      - "*"
  - apiGroups:
      - certificates.k8s.io
    resources:
      - certificatesigningrequests
    verbs:
      - "*"
  - apiGroups:
      - certificates.k8s.io
    resources:
      - certificatesigningrequests/approval
    verbs:
      - "*"
  - apiGroups:
      - certificates.k8s.io
    resources:
      - certificatesigningrequests/status
    verbs:
      - "*"
  - apiGroups:
      - ""
    resources:
      - configmaps
    verbs:
      - "*"
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - "get"
  - apiGroups:
      - ""
    resources:
      - configmaps/status
    verbs:
      - "*"
  - apiGroups:
      - ""
    resources:
      - pods/exec
    verbs:
      - "*"
  - apiGroups:
      - ""
    resources:
      - secrets
    verbs:
      - "*"
  - apiGroups:
      - ""
    resources:
      - services
    verbs:
      - "*"
  - apiGroups:
      - ""
    resources:
      - services/finalizers
    verbs:
      - "*"
  - apiGroups:
      - ""
    resources:
      - services/status
    verbs:
      - "*"
  - apiGroups:
      - crdb.cockroachlabs.com
    resources:
      - crdbclusters
    verbs:
      - "*"
  - apiGroups:
      - crdb.cockroachlabs.com
    resources:
      - crdbclusters/status
    verbs:
      - "*"
  - apiGroups:
      - policy
    resources:
      - poddisruptionbudgets
    verbs:
      - "*"
  - apiGroups:
      - policy
    resources:
      - poddisruptionbudgets/finalizers
    verbs:
      - "*"
  - apiGroups:
      - policy
    resources:
      - poddisruptionbudgets/status
    verbs:
      - "*"
  - verbs:
      - use
    apiGroups:
      - security.openshift.io
    resources:
      - securitycontextconstraints
    resourceNames:
      - nonroot
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cockroach-operator-sa
  namespace: default
  annotations:
  labels:
    app: cockroach-operator
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cockroach-operator-default
  labels:
    app: cockroach-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: cockroach-operator-role
subjects:
  - name: cockroach-operator-sa
    namespace: default
    kind: ServiceAccount

# Operator Deployment Definition:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cockroach-operator
  namespace: default
  labels:
    app: cockroach-operator
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cockroach-operator
  template:
    metadata:
      labels:
        app: cockroach-operator
    spec:
      serviceAccountName: cockroach-operator-sa
      containers:
        - name: cockroach-operator
          image: cockroachdb/cockroach-operator:v2.1.0
          imagePullPolicy: IfNotPresent
          # new alpha features are disabled via feature gates
          # uncomment the feature-gates argument to enable the feature
          args:
            - feature-gates
            # - AutoPrunePVC=true
            - AffinityRules=true
            # the below log level accepts "info" "debug" "warn" or "error"
            - -zap-log-level
            - info
          # - debug
          env:
            - name: WATCH_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: OPERATOR_NAME
              value: cockroachdb
            - name: RELATED_IMAGE_COCKROACH_v20_1_4
              value: cockroachdb/cockroach:v20.1.4
            - name: RELATED_IMAGE_COCKROACH_v20_1_5
              value: cockroachdb/cockroach:v20.1.5
            - name: RELATED_IMAGE_COCKROACH_v20_1_8
              value: cockroachdb/cockroach:v20.1.8
            - name: RELATED_IMAGE_COCKROACH_v20_1_11
              value: cockroachdb/cockroach:v20.1.11
            - name: RELATED_IMAGE_COCKROACH_v20_1_12
              value: cockroachdb/cockroach:v20.1.12
            - name: RELATED_IMAGE_COCKROACH_v20_1_13
              value: cockroachdb/cockroach:v20.1.13
            - name: RELATED_IMAGE_COCKROACH_v20_1_15
              value: cockroachdb/cockroach:v20.1.15
            - name: RELATED_IMAGE_COCKROACH_v20_1_16
              value: cockroachdb/cockroach:v20.1.16
            - name: RELATED_IMAGE_COCKROACH_v20_1_17
              value: cockroachdb/cockroach:v20.1.17
            - name: RELATED_IMAGE_COCKROACH_v20_2_0
              value: cockroachdb/cockroach:v20.2.0
            - name: RELATED_IMAGE_COCKROACH_v20_2_1
              value: cockroachdb/cockroach:v20.2.1
            - name: RELATED_IMAGE_COCKROACH_v20_2_2
              value: cockroachdb/cockroach:v20.2.2
            - name: RELATED_IMAGE_COCKROACH_v20_2_3
              value: cockroachdb/cockroach:v20.2.3
            - name: RELATED_IMAGE_COCKROACH_v20_2_4
              value: cockroachdb/cockroach:v20.2.4
            - name: RELATED_IMAGE_COCKROACH_v20_2_5
              value: cockroachdb/cockroach:v20.2.5
            - name: RELATED_IMAGE_COCKROACH_v20_2_6
              value: cockroachdb/cockroach:v20.2.6
            - name: RELATED_IMAGE_COCKROACH_v20_2_8
              value: cockroachdb/cockroach:v20.2.8
            - name: RELATED_IMAGE_COCKROACH_v20_2_9
              value: cockroachdb/cockroach:v20.2.9
            - name: RELATED_IMAGE_COCKROACH_v20_2_10
              value: cockroachdb/cockroach:v20.2.10
            - name: RELATED_IMAGE_COCKROACH_v20_2_11
              value: cockroachdb/cockroach:v20.2.11
            - name: RELATED_IMAGE_COCKROACH_v20_2_12
              value: cockroachdb/cockroach:v20.2.12
            - name: RELATED_IMAGE_COCKROACH_v20_2_13
              value: cockroachdb/cockroach:v20.2.13
            - name: RELATED_IMAGE_COCKROACH_v20_2_14
              value: cockroachdb/cockroach:v20.2.14
            - name: RELATED_IMAGE_COCKROACH_v20_2_15
              value: cockroachdb/cockroach:v20.2.15
            - name: RELATED_IMAGE_COCKROACH_v21_1_0
              value: cockroachdb/cockroach:v21.1.0
            - name: RELATED_IMAGE_COCKROACH_v21_1_1
              value: cockroachdb/cockroach:v21.1.1
            - name: RELATED_IMAGE_COCKROACH_v21_1_2
              value: cockroachdb/cockroach:v21.1.2
            - name: RELATED_IMAGE_COCKROACH_v21_1_3
              value: cockroachdb/cockroach:v21.1.3
            - name: RELATED_IMAGE_COCKROACH_v21_1_4
              value: cockroachdb/cockroach:v21.1.4
            - name: RELATED_IMAGE_COCKROACH_v21_1_5
              value: cockroachdb/cockroach:v21.1.5
            - name: RELATED_IMAGE_COCKROACH_v21_1_6
              value: cockroachdb/cockroach:v21.1.6
            - name: RELATED_IMAGE_COCKROACH_v21_1_7
              value: cockroachdb/cockroach:v21.1.7
          resources:
            requests:
              cpu: 10m
              memory: 32Mi

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 28 (6 by maintainers)

Most upvoted comments

Got it figured out, the issue was, indeed, networking related.

For anyone else running into this on GKE, these two links should help you out:

  • https://github.com/kubernetes/kubernetes/issues/79739
  • https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#add_firewall_rules

TL;DR: You need to add an INGRESS firewall rule that targets the k8s nodes and allows traffic originating from the control plane CIDR on port 9443. If you’re using Terraform, you might find it difficult to get the network tag required for the firewall rule. This should help: https://github.com/hashicorp/terraform-provider-google/issues/5939#issuecomment-604411328

Forgot to mention: the operator config I initially posted is broken. The "- feature-gates" line should have been "- -feature-gates". With this change, affinities work just fine on v2.1.0.
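
Applied to the args from the Deployment posted above, the fix would look roughly like this. It is a sketch that changes only the leading dash on the flag and leaves everything else as originally posted:

```yaml
args:
  # The leading dash makes this a flag rather than a positional argument.
  - -feature-gates
  - AffinityRules=true
  # Log level, unchanged from the original config.
  - -zap-log-level
  - info
```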

Yes, affinity, tolerations, and nodeSelector are all working now (as of v2.3.0). Node selectors work out of the box, but affinity and tolerations are behind feature gates, which you’ll need to enable with the operator’s command args, e.g.:

...
args:
  - -feature-gates
  - TolerationRules=true,AffinityRules=true
...
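
Since node selectors are said to work without a feature gate, pinning the pods to the labeled pool might look like the fragment below. This is only a sketch: it assumes the field is exposed as spec.nodeSelector on the CrdbCluster resource, which is worth checking against the CRD for your operator version. The label is the one from the original post.

```yaml
apiVersion: crdb.cockroachlabs.com/v1alpha1
kind: CrdbCluster
metadata:
  name: cockroachdb
spec:
  # Assumed field name; verify it in the CRD for your operator version.
  nodeSelector:
    nodepool_service: cockroachdb
```

Note that a node selector does not get pods past a NoSchedule taint, so on a tainted pool the toleration is still needed.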