kubernetes: upgrading from v1.13 to v1.14 causes DaemonSet containers to restart unexpectedly
What happened: I have a Kubernetes cluster at v1.10.2 and want to upgrade it to v1.14.2 step by step, without using cluster lifecycle management tools like kubeadm. When upgrading from v1.13.2 to v1.14.2, some DaemonSet containers restarted unexpectedly.
A new controllerrevision was created at that point; the only difference from the previous revision was the addition of spec.containers[0].securityContext.procMount: Default.
The securityContext.procMount field was added in v1.12 by #64283, so it is odd that it only has an effect when upgrading to v1.14.
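One way to confirm the diff is sketched below; the label selector comes from the manifest in the reproduction steps, and the revision names are placeholders, not values taken from the cluster (adjust the namespace as needed):

# List the controllerrevisions owned by the DaemonSet; a new one appears
# right after the apiserver is upgraded to v1.14.2.
kubectl get controllerrevisions -l name=nvidia-device-plugin-ds

# Dump the two revisions and diff them; the only difference observed was
# securityContext.procMount: Default in the stored pod template.
kubectl get controllerrevision <old-revision> -o yaml > old.yaml
kubectl get controllerrevision <new-revision> -o yaml > new.yaml
diff old.yaml new.yaml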
What you expected to happen: DaemonSet containers should not be restarted when upgrading.
How to reproduce it (as minimally and precisely as possible):
- create a daemonset:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  annotations:
  labels:
    name: nvidia-device-plugin-ds
  name: nvidia-device-plugin-daemonset
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      creationTimestamp: null
      labels:
        name: nvidia-device-plugin-ds
    spec:
      containers:
      - image: nvidia/k8s-device-plugin:1.11
        imagePullPolicy: IfNotPresent
        name: nvidia-device-plugin-ctr
        resources: {}
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/lib/kubelet/device-plugins
          name: device-plugin
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      terminationGracePeriodSeconds: 30
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoSchedule
        key: nvidia.com/gpu
        operator: Exists
      volumes:
      - hostPath:
          path: /var/lib/kubelet/device-plugins
          type: ""
        name: device-plugin
- upgrade the Kubernetes components from v1.10.2 (or v1.11.2) to v1.14.2, one minor version at a time (a reproduction sketch follows below)
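A minimal reproduction sketch, assuming kubectl access before and after the control-plane upgrade; the manifest filename is a placeholder and the label selector comes from the example above:

# Before the upgrade: create the DaemonSet and record the pod start times.
kubectl apply -f nvidia-device-plugin-daemonset.yaml
kubectl get pods -l name=nvidia-device-plugin-ds \
  -o custom-columns=NAME:.metadata.name,STARTED:.status.startTime

# After the control plane is upgraded to v1.14.2: the pods have been
# recreated and a second controllerrevision exists, even though the
# DaemonSet itself was never edited.
kubectl get pods -l name=nvidia-device-plugin-ds \
  -o custom-columns=NAME:.metadata.name,STARTED:.status.startTime
kubectl get controllerrevisions -l name=nvidia-device-plugin-ds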
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version):
- Cloud provider or hardware configuration:
- OS (e.g: cat /etc/os-release):
- Kernel (e.g. uname -a):
- Install tools:
- Network plugin and version (if this is a network-related bug):
- Others:
Fixes opened for 1.12, 1.13, 1.14, and master.
That is true, but the default applies to a child field of non-pod objects as well, so it is conceivable that a ReplicaSet or Deployment could have remained untouched since before 1.12.
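A sketch of how one might look for such objects; the jsonpath query is illustrative (not from the original discussion) and relies on kubectl's default --allow-missing-template-keys=true so that unset fields print as blank:

# Print procMount for every deployment's containers; objects persisted
# before v1.12 and never rewritten since may show an empty value here.
# The same query can be repeated for replicasets, statefulsets, etc.
kubectl get deployments --all-namespaces \
  -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{": "}{.spec.template.spec.containers[*].securityContext.procMount}{"\n"}{end}'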
/reopen
/assign
This was likely triggered by https://github.com/kubernetes/kubernetes/pull/72213/files#r243086103; there was a bug in the way defaulting was applied to the field when it was added in v1.12 in https://github.com/kubernetes/kubernetes/pull/64283.
However, the fact that a new defaulted field triggers a restart of a daemonset is concerning, and indicates the underlying change detection strategy used by the daemonset controller is fragile. It seems similar to https://github.com/kubernetes/kubernetes/issues/57167 in that controllers are assuming they can hash or deepequal compare parent objects and spawned pods. Even if the new field had been correctly defaulted in v1.12, it appears that would have triggered restarts of the DaemonSet pods in that release.
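For reference on the detection strategy being discussed (a sketch using the example DaemonSet above, not part of the original thread): the DaemonSet controller records the pod template in a ControllerRevision and labels its pods with controller-revision-hash, so any change to the serialized template, including a newly defaulted field, yields a new hash and a rolling restart.

# The hash being compared is visible as a label on the pods and in the
# DaemonSet's revision history.
kubectl get pods -l name=nvidia-device-plugin-ds -L controller-revision-hash
kubectl rollout history daemonset/nvidia-device-plugin-daemonset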