cockroach-operator: CrdbCluster tolerations and affinity not working
Description
Hi everyone! I’m trying to control where my cockroachdb pods are scheduled on my GKE cluster. I have 2 node pools and I want one of them to be used exclusively for cockroachdb. I’ve tried tainting the nodes and using tolerations to only allow cockroachdb pods on them. I also added a nodeAffinity on the CrdbCluster so that cockroachdb pods are only scheduled on those nodes.
Neither the affinity nor the tolerations seem to be working. The only tolerations I see on my cockroachdb pods are:
Tolerations:  node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
              node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
And I see that cockroachdb pods are being scheduled on nodes from both of my node pools, so the affinity doesn’t seem to be working.
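For comparison, if the CrdbCluster settings had reached the pods, the pod spec (for example as shown by `kubectl get pod <pod-name> -o yaml`) would be expected to contain something like the fragment below. This is only a sketch, built from the toleration and affinity in the CrdbCluster manifest in this issue plus the two default tolerations listed above. It is not output captured from a real cluster.

```yaml
# Expected fragment of a cockroachdb pod's spec (illustrative, not captured output).
spec:
  tolerations:
    # Toleration from the CrdbCluster manifest; it matches the node pool taint.
    - key: reservation
      operator: Equal
      value: cockroachdb
      effect: NoSchedule
    # Default tolerations that were already present on the pods.
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: nodepool_service
                operator: In
                values:
                  - cockroachdb
```

If only the two `node.kubernetes.io/*` entries show up and there is no `affinity` block, the settings never made it from the custom resource to the pods.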
Steps to reproduce
- Apply the CRD like so:
  kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/master/config/crd/bases/crdb.cockroachlabs.com_crdbclusters.yaml
- Apply the operator (with the AffinityRules feature gate enabled).
- Apply the CrdbCluster custom resource.
Configuration & Manifest dump
Nodes
The nodepool is managed by terraform, the relevant configuration looks like this:
Label
labels = {
  nodepool_service = "cockroachdb"
}
Taints
taint {
  key    = "reservation"
  value  = "cockroachdb"
  effect = "NO_SCHEDULE"
}
Both of the above have been tested and work well.
Manifests
CrdbCluster
apiVersion: crdb.cockroachlabs.com/v1alpha1
kind: CrdbCluster
metadata:
  name: cockroachdb
  annotations:
    crdb.io/restarttype: rolling
spec:
  clientTLSSecret: cockroachdb.client.root
  nodeTLSSecret: cockroachdb.node
  dataStore:
    pvc:
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: "2Gi"
        volumeMode: Filesystem
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
    limits:
      cpu: 2
      memory: 8Gi
  tlsEnabled: true
  image:
    name: cockroachdb/cockroach:v21.1.7
  nodes: 4
  additionalLabels:
    crdb: is-cool
  tolerations:
    - key: "reservation"
      operator: "Equal"
      value: "cockroachdb"
      effect: "NoSchedule"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: nodepool_service
                operator: In
                values:
                  - cockroachdb
Operator
# Copyright 2021 The Cockroach Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Generated, do not edit. Please edit this file instead: config/templates/operator.yaml.in
#
---
apiVersion: v1
kind: Namespace
metadata:
  name: default
  labels:
    cockroach-namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cockroach-database-role
rules:
  - verbs:
      - use
    apiGroups:
      - security.openshift.io
    resources:
      - securitycontextconstraints
    resourceNames:
      - anyuid
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cockroach-database-sa
  namespace: default
  annotations:
  labels:
    app: cockroach-operator
---
# RBAC Definition (ClusterRole, ServiceAccount, and ClusterRoleBinding):
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cockroach-database-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cockroach-database-role
subjects:
  - kind: ServiceAccount
    name: cockroach-database-sa
    namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cockroach-operator-role
rules:
  - apiGroups:
      - "*"
    resources:
      - "*"
    verbs:
      - "*"
---
# RBAC Definition (ClusterRole, ServiceAccount, and ClusterRoleBinding):
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cockroach-operator-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cockroach-operator-role
subjects:
  - kind: ServiceAccount
    name: cockroach-operator-sa
    namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cockroach-operator-role
rules:
  - apiGroups:
      - "*"
    resources:
      - "*"
    verbs:
      - "*"
  - apiGroups:
      - rbac.authorization.k8s.io
    resources:
      - clusterroles
    verbs:
      - get
      - list
      - delete
  - apiGroups:
      - apps
    resources:
      - statefulsets
    verbs:
      - "*"
  - apiGroups:
      - apps
    resources:
      - statefulsets/finalizers
    verbs:
      - "*"
  - apiGroups:
      - apps
    resources:
      - statefulsets/status
    verbs:
      - "*"
  - apiGroups:
      - certificates.k8s.io
    resources:
      - certificatesigningrequests
    verbs:
      - "*"
  - apiGroups:
      - certificates.k8s.io
    resources:
      - certificatesigningrequests/approval
    verbs:
      - "*"
  - apiGroups:
      - certificates.k8s.io
    resources:
      - certificatesigningrequests/status
    verbs:
      - "*"
  - apiGroups:
      - ""
    resources:
      - configmaps
    verbs:
      - "*"
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - "get"
  - apiGroups:
      - ""
    resources:
      - configmaps/status
    verbs:
      - "*"
  - apiGroups:
      - ""
    resources:
      - pods/exec
    verbs:
      - "*"
  - apiGroups:
      - ""
    resources:
      - secrets
    verbs:
      - "*"
  - apiGroups:
      - ""
    resources:
      - services
    verbs:
      - "*"
  - apiGroups:
      - ""
    resources:
      - services/finalizers
    verbs:
      - "*"
  - apiGroups:
      - ""
    resources:
      - services/status
    verbs:
      - "*"
  - apiGroups:
      - crdb.cockroachlabs.com
    resources:
      - crdbclusters
    verbs:
      - "*"
  - apiGroups:
      - crdb.cockroachlabs.com
    resources:
      - crdbclusters/status
    verbs:
      - "*"
  - apiGroups:
      - policy
    resources:
      - poddisruptionbudgets
    verbs:
      - "*"
  - apiGroups:
      - policy
    resources:
      - poddisruptionbudgets/finalizers
    verbs:
      - "*"
  - apiGroups:
      - policy
    resources:
      - poddisruptionbudgets/status
    verbs:
      - "*"
  - verbs:
      - use
    apiGroups:
      - security.openshift.io
    resources:
      - securitycontextconstraints
    resourceNames:
      - nonroot
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cockroach-operator-sa
  namespace: default
  annotations:
  labels:
    app: cockroach-operator
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cockroach-operator-default
  labels:
    app: cockroach-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: cockroach-operator-role
subjects:
  - name: cockroach-operator-sa
    namespace: default
    kind: ServiceAccount
# Operator Deployment Definition:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cockroach-operator
  namespace: default
  labels:
    app: cockroach-operator
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cockroach-operator
  template:
    metadata:
      labels:
        app: cockroach-operator
    spec:
      serviceAccountName: cockroach-operator-sa
      containers:
        - name: cockroach-operator
          image: cockroachdb/cockroach-operator:v2.1.0
          imagePullPolicy: IfNotPresent
          # new alpha features are disabled via feature gates
          # uncomment the feature-gates argument to enable the feature
          args:
            - feature-gates
            # - AutoPrunePVC=true
            - AffinityRules=true
            # the below log level accepts "info" "debug" "warn" or "error"
            - -zap-log-level
            - info
            # - debug
          env:
            - name: WATCH_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: OPERATOR_NAME
              value: cockroachdb
            - name: RELATED_IMAGE_COCKROACH_v20_1_4
              value: cockroachdb/cockroach:v20.1.4
            - name: RELATED_IMAGE_COCKROACH_v20_1_5
              value: cockroachdb/cockroach:v20.1.5
            - name: RELATED_IMAGE_COCKROACH_v20_1_8
              value: cockroachdb/cockroach:v20.1.8
            - name: RELATED_IMAGE_COCKROACH_v20_1_11
              value: cockroachdb/cockroach:v20.1.11
            - name: RELATED_IMAGE_COCKROACH_v20_1_12
              value: cockroachdb/cockroach:v20.1.12
            - name: RELATED_IMAGE_COCKROACH_v20_1_13
              value: cockroachdb/cockroach:v20.1.13
            - name: RELATED_IMAGE_COCKROACH_v20_1_15
              value: cockroachdb/cockroach:v20.1.15
            - name: RELATED_IMAGE_COCKROACH_v20_1_16
              value: cockroachdb/cockroach:v20.1.16
            - name: RELATED_IMAGE_COCKROACH_v20_1_17
              value: cockroachdb/cockroach:v20.1.17
            - name: RELATED_IMAGE_COCKROACH_v20_2_0
              value: cockroachdb/cockroach:v20.2.0
            - name: RELATED_IMAGE_COCKROACH_v20_2_1
              value: cockroachdb/cockroach:v20.2.1
            - name: RELATED_IMAGE_COCKROACH_v20_2_2
              value: cockroachdb/cockroach:v20.2.2
            - name: RELATED_IMAGE_COCKROACH_v20_2_3
              value: cockroachdb/cockroach:v20.2.3
            - name: RELATED_IMAGE_COCKROACH_v20_2_4
              value: cockroachdb/cockroach:v20.2.4
            - name: RELATED_IMAGE_COCKROACH_v20_2_5
              value: cockroachdb/cockroach:v20.2.5
            - name: RELATED_IMAGE_COCKROACH_v20_2_6
              value: cockroachdb/cockroach:v20.2.6
            - name: RELATED_IMAGE_COCKROACH_v20_2_8
              value: cockroachdb/cockroach:v20.2.8
            - name: RELATED_IMAGE_COCKROACH_v20_2_9
              value: cockroachdb/cockroach:v20.2.9
            - name: RELATED_IMAGE_COCKROACH_v20_2_10
              value: cockroachdb/cockroach:v20.2.10
            - name: RELATED_IMAGE_COCKROACH_v20_2_11
              value: cockroachdb/cockroach:v20.2.11
            - name: RELATED_IMAGE_COCKROACH_v20_2_12
              value: cockroachdb/cockroach:v20.2.12
            - name: RELATED_IMAGE_COCKROACH_v20_2_13
              value: cockroachdb/cockroach:v20.2.13
            - name: RELATED_IMAGE_COCKROACH_v20_2_14
              value: cockroachdb/cockroach:v20.2.14
            - name: RELATED_IMAGE_COCKROACH_v20_2_15
              value: cockroachdb/cockroach:v20.2.15
            - name: RELATED_IMAGE_COCKROACH_v21_1_0
              value: cockroachdb/cockroach:v21.1.0
            - name: RELATED_IMAGE_COCKROACH_v21_1_1
              value: cockroachdb/cockroach:v21.1.1
            - name: RELATED_IMAGE_COCKROACH_v21_1_2
              value: cockroachdb/cockroach:v21.1.2
            - name: RELATED_IMAGE_COCKROACH_v21_1_3
              value: cockroachdb/cockroach:v21.1.3
            - name: RELATED_IMAGE_COCKROACH_v21_1_4
              value: cockroachdb/cockroach:v21.1.4
            - name: RELATED_IMAGE_COCKROACH_v21_1_5
              value: cockroachdb/cockroach:v21.1.5
            - name: RELATED_IMAGE_COCKROACH_v21_1_6
              value: cockroachdb/cockroach:v21.1.6
            - name: RELATED_IMAGE_COCKROACH_v21_1_7
              value: cockroachdb/cockroach:v21.1.7
          resources:
            requests:
              cpu: 10m
              memory: 32Mi
About this issue
- State: closed
- Created 3 years ago
- Reactions: 1
- Comments: 28 (6 by maintainers)
Got it figured out, the issue was, indeed, networking related.
For anyone else running into this on GKE, these 2 links should help you out:
- https://github.com/kubernetes/kubernetes/issues/79739
- https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#add_firewall_rules
TL;DR: You need to add an INGRESS firewall rule that targets the k8s nodes and allows traffic originating from the control plane CIDR on port 9443. If you’re using terraform, you might find it difficult to get the network tag required for the firewall rule. This should help: https://github.com/hashicorp/terraform-provider-google/issues/5939#issuecomment-604411328
Forgot to mention, the operator config I initially posted is broken. The feature-gates line should have been - -feature-gates (with a leading dash, so that it is passed as a flag). With this change, affinities work just fine on v2.1.0.
Yes, affinity, tolerations, and nodeSelector are all now working (as of v2.3.0). Node selectors work out of the box, but affinity and tolerations are behind feature gates which you’ll need to enable with the command args. E.g.
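As a rough sketch of that fix, the fragment below shows the container args from the operator Deployment posted in this issue, with the flag corrected to `-feature-gates`. Only `AffinityRules=true` comes from this thread. The name of the gate that enables tolerations is not given here, so it is left as a comment rather than guessed.

```yaml
# Fragment of the cockroach-operator Deployment (spec.template.spec.containers[0]).
# Assumption: same arg layout as the manifest posted in this issue.
args:
  # Leading dash makes this a flag; without it, the value is not treated as a flag.
  - -feature-gates
  # Gate named in this thread.
  - AffinityRules=true
  # The gate for tolerations is not named in this thread. Check the operator
  # documentation for your version before adding it.
  - -zap-log-level
  - info
```

After changing the args, the operator Deployment has to be re-applied so the operator pod restarts with the new flags.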