kubernetes: upgrading from v1.13 to v1.14 causes DaemonSet containers to restart unexpectedly

What happened: I have a Kubernetes cluster at version v1.10.2 and want to upgrade it to v1.14.2 step by step, without using cluster lifecycle management tools like kubeadm. When upgrading from v1.13.2 to v1.14.2, some DaemonSets' containers restart unexpectedly.

A new ControllerRevision was created at that point; the only difference from the previous one was the addition of spec.containers[0].securityContext.procMount: Default.

The securityContext.procMount field was added in v1.12 by #64283, so it is odd that it only has an impact in v1.14.
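The diff can be confirmed by dumping the DaemonSet's ControllerRevisions and comparing consecutive revisions. Below is a minimal client-go sketch (not part of the original report); the kubeconfig path, namespace, and label selector are assumptions taken from the manifest in the reproduction steps and would need adjusting, and a reasonably recent client-go is assumed.

// Minimal sketch, not from the original report: list the ControllerRevisions
// owned by the DaemonSet and print each revision's data so consecutive
// revisions can be diffed by hand.
package main

import (
	"context"
	"fmt"
	"sort"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Hypothetical kubeconfig path; adjust for your environment.
	cfg, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config")
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)

	// Namespace and label selector assumed from the manifest below.
	revs, err := cs.AppsV1().ControllerRevisions("kube-system").List(context.TODO(),
		metav1.ListOptions{LabelSelector: "name=nvidia-device-plugin-ds"})
	if err != nil {
		panic(err)
	}

	// Print revisions in ascending order; in the case reported here, the diff
	// between the last two revisions was only the added
	// spec.containers[0].securityContext.procMount: Default.
	sort.Slice(revs.Items, func(i, j int) bool {
		return revs.Items[i].Revision < revs.Items[j].Revision
	})
	for _, r := range revs.Items {
		fmt.Printf("--- revision %d (%s) ---\n%s\n", r.Revision, r.Name, string(r.Data.Raw))
	}
}

The same data can also be pulled with kubectl get controllerrevisions -o yaml and diffed by hand.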

What you expected to happen: DaemonSet containers should not restart during the upgrade.

How to reproduce it (as minimally and precisely as possible):

  1. create a daemonset:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  annotations:
  labels:
    name: nvidia-device-plugin-ds
  name: nvidia-device-plugin-daemonset
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      creationTimestamp: null
      labels:
        name: nvidia-device-plugin-ds
    spec:
      containers:
      - image: nvidia/k8s-device-plugin:1.11
        imagePullPolicy: IfNotPresent
        name: nvidia-device-plugin-ctr
        resources: {}
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/lib/kubelet/device-plugins
          name: device-plugin
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      terminationGracePeriodSeconds: 30
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoSchedule
        key: nvidia.com/gpu
        operator: Exists
      volumes:
      - hostPath:
          path: /var/lib/kubelet/device-plugins
          type: ""
        name: device-plugin
  2. upgrade the Kubernetes components from v1.10.2 (or v1.11.2) to v1.14.2

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): v1.13.2, upgrading to v1.14.2 (cluster originally created at v1.10.2)
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 17 (17 by maintainers)

Most upvoted comments

Fixes opened for 1.12, 1.13, 1.14, and master.

"there should be no remaining pods that were originally created against a pre-1.12 API server."

That is true, but the default also applies to the corresponding child field of non-Pod objects, so it is conceivable that a ReplicaSet or Deployment could have remained untouched since before 1.12.

/reopen /assign

This was likely triggered by https://github.com/kubernetes/kubernetes/pull/72213/files#r243086103; there was a bug in the way defaulting was applied to the field when it was added in v1.12 in https://github.com/kubernetes/kubernetes/pull/64283.

However, the fact that a newly defaulted field triggers a restart of a DaemonSet is concerning, and indicates that the underlying change-detection strategy used by the DaemonSet controller is fragile. It is similar to https://github.com/kubernetes/kubernetes/issues/57167 in that controllers are assuming they can hash or deep-equal-compare parent objects and the pods they spawned. Even if the new field had been correctly defaulted in v1.12, it appears that would still have triggered restarts of the DaemonSet pods in that release.
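As a rough illustration (a sketch, not the controller's actual code): any strategy that hashes or deep-compares the full, server-defaulted pod template will see a change as soon as a new field such as procMount acquires a default, even though the user-specified spec is unchanged.

// Sketch only: hashTemplate stands in for the controller's template hashing.
// The real mechanism differs, but anything hashing the serialized template is
// sensitive to newly defaulted fields.
package main

import (
	"encoding/json"
	"fmt"
	"hash/fnv"

	corev1 "k8s.io/api/core/v1"
)

func hashTemplate(tpl *corev1.PodTemplateSpec) uint32 {
	raw, _ := json.Marshal(tpl)
	h := fnv.New32a()
	h.Write(raw)
	return h.Sum32()
}

func main() {
	tpl := corev1.PodTemplateSpec{
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:            "nvidia-device-plugin-ctr",
				Image:           "nvidia/k8s-device-plugin:1.11",
				SecurityContext: &corev1.SecurityContext{},
			}},
		},
	}
	before := hashTemplate(&tpl)

	// After the upgrade the API server defaults procMount, so the stored
	// template (and therefore its hash) changes without any user edit.
	pm := corev1.DefaultProcMount
	tpl.Spec.Containers[0].SecurityContext.ProcMount = &pm
	after := hashTemplate(&tpl)

	fmt.Println(before == after) // false: existing pods now look out of date
}

In this sketch the hash changes purely because of server-side defaulting, which matches the unexpected restarts reported above.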