kubernetes: Windows Kubernetes worker node throws BSoD
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened: A BSoD occurs on Windows worker nodes two or three times a week or more, at irregular intervals. The stop code shown on the BSoD is UNEXPECTED_KERNEL_MODE_TRAP, and the module it names is NDIS.sys.
What you expected to happen: Windows worker nodes should run without crashing. No kernel panic occurs when I configure and run multiple Linux Kubernetes worker nodes.
How to reproduce it (as minimally and precisely as possible): We used KOPS to build the Linux nodes, kubenet for the existing Linux cluster, and Flannel for Windows with an L2Bridge configuration for the newly built Windows node.
Anything else we need to know?: The same problem occurred with WinCNI, and it also occurs with Flannel + L2Bridge. I suspect it is triggered when an incorrect configuration request is sent to HNS.
Environment:
- Kubernetes version (use kubectl version): Existing Linux worker nodes are v1.9.4, and Windows worker nodes are v1.10.4
- Cloud provider or hardware configuration: AWS EC2
- OS (e.g. from /etc/os-release): Existing linux worker nodes are ‘Debian GNU/Linux 9.3 (stretch)’, and Windows worker nodes are ‘Windows Server 1803’.
- Kernel (e.g. uname -a): Existing Linux worker nodes are ‘Linux ip-x-y-z 4.9.0-5-amd64 #1 SMP Debian 4.9.65-3+deb9u2 (2018-01-04) x86_64 GNU/Linux’, and Windows worker nodes are ‘10.0.17134.137’.
- Install tools: Existing Linux worker nodes were built with KOPS, and Windows nodes were installed manually.
- Others: I have attached the BSoD screenshot. After restarting the instance, I will collect the memory dump and try to analyze it with WinDBG.

About this issue
- State: closed
- Created 6 years ago
- Comments: 20 (9 by maintainers)
@rkttu The official KB for this issue was delayed internally, as a complete fix requires changes in other critical components (VFP) plus a subsequent HNS patch. However, if you have a Microsoft support engineer and a business justification, we should be able to give you a private hotfix for Windows Server 1803 earlier than October 16th.
This issue will also require a patch on Windows Server 2019, which we are preparing. Windows Server 2019 ships with only one mitigation patch, covering the most common scenario of this issue; eliminating it in all cases requires another patch, which will be out shortly after release.