kubernetes: Windows Container stuck in Initializing

What happened: This is an AKS (Azure) cluster. A Windows pod has two containers. The first container initializes a volume and completes successfully. The second container is an IIS website, which sits in “Initializing” and never starts. This happens roughly 4 out of 10 times, and the stuck container produces no logs. If we delete the pod, the ReplicaSet creates a new pod, which also fails this way roughly 4 out of 10 times. If that pod is also stuck in “Initializing”, we can delete it again; the third pod always works.
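A minimal sketch for confirming this symptom from the pod's status, assuming the first container is an init container and that the status was dumped with `kubectl get pod <name> -o json`. It reports app containers that are still waiting even though every init container exited with code 0. The field names (`initContainerStatuses`, `containerStatuses`, `state.waiting.reason`, `state.terminated.exitCode`) come from the Kubernetes Pod status API; the function name and the container names `init-volume` and `iis-site` are hypothetical, and the sample status is hand-written, not taken from this cluster.

```python
import json
from typing import Any


def stuck_app_containers(pod: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (name, waiting reason) for app containers that have not started
    even though every init container has exited with code 0.

    `pod` is assumed to be the parsed output of `kubectl get pod <name> -o json`.
    """
    status = pod.get("status", {})
    # Init phase counts as done only if every init container terminated with 0.
    init_ok = all(
        s.get("state", {}).get("terminated", {}).get("exitCode") == 0
        for s in status.get("initContainerStatuses", [])
    )
    if not init_ok:
        return []  # init still running or failed: a different problem
    return [
        (s["name"], s["state"]["waiting"].get("reason", ""))
        for s in status.get("containerStatuses", [])
        if "waiting" in s.get("state", {})
    ]


# Hypothetical sample status, hand-written for illustration only.
sample = json.loads("""
{"status": {
  "initContainerStatuses": [
    {"name": "init-volume", "state": {"terminated": {"exitCode": 0, "reason": "Completed"}}}
  ],
  "containerStatuses": [
    {"name": "iis-site", "state": {"waiting": {"reason": "PodInitializing"}}}
  ]
}}
""")
print(stuck_app_containers(sample))  # [('iis-site', 'PodInitializing')]
```

If this returns a non-empty list, the events shown by `kubectl describe pod <name>` are the next place to look, since the container itself writes no logs.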

What you expected to happen: The second container should start or at least give logs.

How to reproduce it (as minimally and precisely as possible): This only happens in our development cluster. I cannot reproduce it in other clusters. To reproduce, we create a new pod.

Anything else we need to know?: There are plenty of resources; we added more nodes and everything is underutilized. We stopped and then started the cluster (https://docs.microsoft.com/en-us/azure/aks/start-stop-cluster), which also re-creates all of the nodes. We also upgraded the Kubernetes and node versions to the latest.

This issue occurs with a single pod and also when I scale up the number of replicas: if I increase the replicas from 1 to 7, 3 of the pods get stuck in “Initializing” while the other 4 start successfully.

Environment:

  • Kubernetes version (use kubectl version): v1.19.3
  • Cloud provider or hardware configuration: Azure AKS
  • OS (e.g: cat /etc/os-release): Windows Server 2019
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 15 (7 by maintainers)

Most upvoted comments

The Pod id should be: exp-app-site-029836d6-7d50-eb11-a607-0004ffb07b92-7b87445fdgbmk
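The pod name above is exactly 63 characters long, which is the cap the Kubernetes API server's name generator works to when it builds a name from a `generateName` prefix: the prefix (for ReplicaSet pods, the ReplicaSet name plus `-`) is truncated to 58 characters and 5 random characters are appended. A minimal sketch of that decomposition, assuming this pod was created that way: the final `dgbmk` is the random suffix, and because the prefix was truncated, the trailing `7b87445f` may be an incomplete pod-template-hash. The helper name is hypothetical.

```python
MAX_NAME_LENGTH = 63          # cap used when generating names from generateName
RANDOM_SUFFIX_LENGTH = 5      # random characters appended to the prefix
MAX_BASE_LENGTH = MAX_NAME_LENGTH - RANDOM_SUFFIX_LENGTH  # 58


def split_generated_name(pod_name: str) -> tuple[str, str]:
    """Split a generateName-style pod name into (prefix, random suffix).

    Assumes the name was produced by appending 5 random characters to a
    prefix that was first truncated to at most 58 characters.
    """
    if len(pod_name) <= RANDOM_SUFFIX_LENGTH:
        raise ValueError("name too short to contain a generated suffix")
    return pod_name[:-RANDOM_SUFFIX_LENGTH], pod_name[-RANDOM_SUFFIX_LENGTH:]


name = "exp-app-site-029836d6-7d50-eb11-a607-0004ffb07b92-7b87445fdgbmk"
base, suffix = split_generated_name(name)
print(len(name))           # 63
print(base)                # exp-app-site-029836d6-7d50-eb11-a607-0004ffb07b92-7b87445f
print(suffix)              # dgbmk
# An untruncated ReplicaSet prefix would end with "-"; this one does not.
print(base.endswith("-"))  # False
```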