kubernetes: Rolling upgrade of deployment conflicts with pod anti-affinity policy
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened: I have a kube-dns deployment that creates multiple DNS pods, one per node, using a pod anti-affinity policy to make sure the DNS pods are spread over the cluster nodes.
```yaml
selector:
  matchLabels:
    k8s-app: kube-dns
template:
  metadata:
    labels:
      k8s-app: kube-dns
    annotations:
      scheduler.alpha.kubernetes.io/critical-pod: ''
  spec:
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: k8s-app
              operator: In
              values:
              - kube-dns
          topologyKey: "kubernetes.io/hostname"
    containers:
    - name: kubedns
```
Then I want to upgrade my kube-dns to the latest version, 1.14.4. The new pod stays in the Pending state forever: no node can be selected for it, because every node already runs an old kube-dns pod and the required pod anti-affinity rule excludes them all.
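With no explicit strategy set, the Deployment controller surges a new pod before deleting an old one, which is why the conflict appears. A sketch of what the implicit default amounts to (this block is not in the manifest above; values are the documented Deployment defaults):

```yaml
# Implicit default for a Deployment that does not set spec.strategy:
# the controller may create an extra (surge) pod before removing an old one.
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 25%
    maxUnavailable: 25%
```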
What you expected to happen: The rolling upgrade of the kube-dns deployment should complete as expected.
How to reproduce it (as minimally and precisely as possible):
- Create deployment with pod anti-affinity set.
- Upgrade the deployment (for example, by bumping the image as sketched below).
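For the second step, changing the container image in the pod template is one minimal trigger. The image path below is the commonly used registry path for kube-dns 1.14.4; the surrounding Deployment fields are the ones shown above:

```yaml
# Changing the container image in the pod template creates a new ReplicaSet
# whose pods carry the same required anti-affinity rule.
containers:
- name: kubedns
  image: gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.4
```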
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`): v1.8.3
- Cloud provider or hardware configuration: None
- OS (e.g. from /etc/os-release): Ubuntu 16.04
- Kernel (e.g. `uname -a`):
- Install tools: self-defined
- Others:
About this issue
- State: closed
- Created 7 years ago
- Reactions: 15
- Comments: 21 (5 by maintainers)
It would be good if k8s was aware of the difference between new & old versions when considering preferred scheduling as part of a rolling deployment. It doesn’t do much good now when cluster size ~= number of pod replicas.
I am having the same problem with preferredDuringSchedulingIgnoredDuringExecution. After a rolling update of a deployment, pod anti-affinity is no longer respected. I want the deployment to be spread across many nodes (not necessarily one pod per node) to achieve high availability. A rolling update skews pod anti-affinity in such a way that sometimes I end up with all pods scheduled on one node. This invalidates one of the use cases of pod anti-affinity: high availability with rolling updates.
My deployment (helm chart):
ReplicaCount is a variable and can be changed from deployment to deployment.
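For context, a generic sketch of the kind of soft anti-affinity block being described, templated the way a Helm chart might do it; the value name, chart label, and weight are hypothetical, not the commenter's actual chart:

```yaml
# Sketch only: preferred (soft) anti-affinity with a templated replica count.
replicas: {{ .Values.replicaCount }}
template:
  spec:
    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: {{ .Chart.Name }}
            topologyKey: "kubernetes.io/hostname"
```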
@k82cn I have tried the trick with `maxUnavailable: 1` and `maxSurge: 0`, but it doesn't work as expected; I still end up with unevenly spread pods.
Did you try `strategy.rollingUpdate.maxUnavailable: 1`? Kill the old pod first when doing the rolling upgrade.
/sig apps
/kind feature
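A sketch of those settings in a Deployment spec (field names as in the Deployment API; the rest of the spec is assumed):

```yaml
# Remove an old pod before creating its replacement, so the freed node
# can satisfy the required anti-affinity rule of the new pod.
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 0
```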
This can be worked around by adding another label to the deployment file and using the newly added label for the anti-affinity rule.
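A sketch of that workaround, assuming a hypothetical extra label (here `dns-generation`) whose value is changed on every upgrade, so new pods only repel other new pods and can land on nodes that still run old ones:

```yaml
# Sketch only: key the anti-affinity on a rollout-specific label instead of k8s-app.
template:
  metadata:
    labels:
      k8s-app: kube-dns
      dns-generation: "v2"   # bump this value on every upgrade
  spec:
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              dns-generation: "v2"   # matches only the new pods
          topologyKey: "kubernetes.io/hostname"
```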
@ErikLundJensen you are talking about different regions here… we need to understand what can be done in the simple case of a single region, where the anti-affinity rule conflicts with the rolling upgrade of the deployment.