kubernetes: Windows kubelet degraded after starting pod with too-low limits

Steps to reproduce:

  • Start a k8s cluster with one Windows node. I used the command below to bring up a cluster on GCE; I was able to reproduce the issue with both K8s v1.19.0-rc.4 and v1.17.9.

NUM_NODES=2 NUM_WINDOWS_NODES=1 KUBE_GCE_ENABLE_IP_ALIASES=true \
  KUBERNETES_NODE_PLATFORM=windows LOGGING_STACKDRIVER_RESOURCE_TYPES=new \
  KUBE_UP_AUTOMATIC_CLEANUP=true WINDOWS_NODE_OS_DISTRIBUTION=win2019 \
  ./cluster/kube-up.sh
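
Once kube-up.sh finishes, a quick sanity check (not part of the original repro) is to confirm the Windows node registered and is visible to the scheduler; the label used here is the same one the pod specs below select on:

kubectl get nodes -l kubernetes.io/os=windows -o wide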

  • Prepull an IIS/servercore container onto the Windows node (this will take ~10 minutes); see the yaml files below.

kubectl create -f iis-pod-nolimit.yaml && sleep 10 && \
  time kubectl wait --timeout=-1s --for=condition=Ready pod/iis-nolimit; \
  kubectl delete -f iis-pod-nolimit.yaml

  • Optional, to make the logs easier to parse: log into the node, stop the services, clear the logs, and restart the services (service names and paths are GCE-specific):

Stop-Service kube-proxy; Stop-Service kubelet; Start-Sleep 10
rm C:\etc\kubernetes\logs\kubelet.log; rm C:\etc\kubernetes\logs\kube-proxy.log
Start-Service kubelet; Start-Service kube-proxy
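
If you stay logged into the node, tailing the kubelet log is a convenient way to watch what the kubelet does during the later steps. This is an extra step, not part of the original repro; the path is the GCE-specific one used above:

Get-Content C:\etc\kubernetes\logs\kubelet.log -Tail 50 -Wait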

  • Start a new pod and observe how long it typically takes to become Ready (about ten seconds here):

kubectl create -f iis-pod-nolimit.yaml && sleep 5 && \
  time kubectl wait --timeout=-1s --for=condition=Ready pod/iis-nolimit; \
  kubectl get pods && kubectl delete -f iis-pod-nolimit.yaml

pod/iis-nolimit created
pod/iis-nolimit condition met

real    0m7.316s
user    0m0.229s
sys     0m0.049s
NAME          READY   STATUS    RESTARTS   AGE
iis-nolimit   1/1     Running   0          13s
pod "iis-nolimit" deleted
  • Now start a pod with a tiny resource limit set, e.g. memory: "10Mi". The pod will never become Ready, eventually failing with Error: context deadline exceeded (which is not helpful in describing what went wrong, but that’s not the main issue here):
kubectl create -f iis-pod-tinylimit.yaml && sleep 5 && \
  time kubectl wait --timeout=180s --for=condition=Ready pod/iis-tinylimit; \
  kubectl describe pod iis-tinylimit | tail

Events:
  Type     Reason     Age    From                                         Message
  ----     ------     ----   ----                                         -------
  Normal   Scheduled  3m39s  default-scheduler                            Successfully assigned default/iis-tinylimit to kubernetes-windows-node-group-rn4x
  Normal   Pulling    3m35s  kubelet, kubernetes-windows-node-group-rn4x  Pulling image "mcr.microsoft.com/windows/servercore/iis"
  Normal   Pulled     3m35s  kubelet, kubernetes-windows-node-group-rn4x  Successfully pulled image "mcr.microsoft.com/windows/servercore/iis"
  Normal   Created    3m35s  kubelet, kubernetes-windows-node-group-rn4x  Created container iis-server
  Warning  Failed     95s    kubelet, kubernetes-windows-node-group-rn4x  Error: context deadline exceeded
  Warning  FailedSync  14s (x4 over 43s)  kubelet, kubernetes-windows-node-group-rn4x  error determining status: rpc error: code = DeadlineExceeded desc = context deadline exceeded
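
As an extra debugging step (not part of the original repro), you can also look at the container runtime's view of the stuck container from the node; this assumes the Docker runtime used by these GCE win2019 node images:

docker ps -a | Select-String iis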
  • Now observe the Windows node status: after about 5 minutes it starts flapping between Ready and NotReady, and the NotReady status may be accompanied by a PLEG is not healthy message (a watch command for observing this is sketched after the excerpt below):
kubectl describe node kubernetes-windows-node-group-rn4x

…
  Ready                False   Thu, 06 Aug 2020 16:28:05 -0700   Thu, 06 Aug 2020 16:27:45 -0700   KubeletNotReady              PLEG is not healthy: pleg was last seen active 3m23.6880078s ago; threshold is 3m0s
…
  Normal  NodeNotReady             2m55s                kubelet, kubernetes-windows-node-group-rn4x     Node kubernetes-windows-node-group-rn4x status is now: NodeNotReady
  Normal  NodeReady                 115s    kubelet, kubernetes-windows-node-group-rn4x     Node kubernetes-windows-node-group-rn4x status is now: NodeReady
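
A simple way to watch the flapping from a client is a watch on the node; this is just a sketch, not part of the original repro, so substitute your own node name:

kubectl get node kubernetes-windows-node-group-rn4x -w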
  • Fetch the kubelet log (see attached after-tinylimit-pod-kubelet.log). GCE-specific command:

gcloud beta compute diagnose export-logs kubernetes-windows-node-group-rn4x --zone us-central1-b

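If you are still logged into the node, a quick way to look for the PLEG health messages without exporting the logs (an extra step, not part of the original repro; the path is the GCE-specific one from above):

Select-String -Path C:\etc\kubernetes\logs\kubelet.log -Pattern 'PLEG'
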
  • Now try to create a pod with no limit again: previously it took about 10 seconds; now it takes 4-5 minutes. This is the main issue: the failing pod with the tiny limit is somehow degrading the kubelet's performance and delaying other pods scheduled onto the node. (A way to quantify this with the kubelet's PLEG metrics is sketched after the output below.)

kubectl create -f iis-pod-nolimit.yaml && sleep 5 && \
  time kubectl wait --timeout=-1s --for=condition=Ready pod/iis-nolimit; \
  kubectl get pods && kubectl delete -f iis-pod-nolimit.yaml

pod/iis-nolimit created
pod/iis-nolimit condition met

real    3m12.590s
user    0m0.250s
sys     0m0.040s
NAME            READY   STATUS              RESTARTS   AGE
iis-nolimit     1/1     Running             0          3m18s
iis-tinylimit   0/1     ContainerCreating   0          10m
pod "iis-nolimit" deleted
  • Fetch the kubelet log (see attached after-slow-pod-kubelet.log). GCE-specific command:

gcloud beta compute diagnose export-logs kubernetes-windows-node-group-rn4x --zone us-central1-b

Environment:

  • Kubernetes version (use kubectl version): v1.19.0-rc.4, v1.17.9
  • Cloud provider or hardware configuration: GCE

Workload files: iis-pod-nolimit.yaml

apiVersion: v1
kind: Pod
metadata:
  name: iis-nolimit
  labels:
    app: iis-nolimit
spec:
  nodeSelector:
    kubernetes.io/os: windows
  tolerations:
  - effect: NoSchedule
    key: node.kubernetes.io/os
    operator: Equal
    value: win1809
  containers:
  - name: iis-server
    image: mcr.microsoft.com/windows/servercore/iis
    ports:
    - containerPort: 80

iis-pod-tinylimit.yaml

apiVersion: v1
kind: Pod
metadata:
  name: iis-tinylimit
  labels:
    app: iis-tinylimit
spec:
  nodeSelector:
    kubernetes.io/os: windows
  tolerations:
  - effect: NoSchedule
    key: node.kubernetes.io/os
    operator: Equal
    value: win1809
  containers:
  - name: iis-server
    image: mcr.microsoft.com/windows/servercore/iis
    ports:
    - containerPort: 80
    resources:
      limits:
        memory: "10Mi"
        cpu: "10m"
