cluster-api-provider-azure: could not get instance metadata on Windows node

/kind bug


What steps did you take and what happened:

I set up a CAPZ cluster with a Windows Server 2019 Datacenter node and installed the Azure Disk CSI driver on it.

What did you expect to happen:

The CSI driver should be able to query instance metadata (IMDS) from the Windows node, but it could not:

# kubectl get no -o wide
NAME                              STATUS   ROLES                  AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                         KERNEL-VERSION     CONTAINER-RUNTIME
capz-d0un-gqnn8                   Ready    <none>                 23m   v1.22.1   10.1.0.6      <none>        Windows Server 2019 Datacenter   10.0.17763.2237    containerd://1.6.0-beta.0
I0228 11:31:21.570720    3008 utils.go:77] GRPC call: /csi.v1.Node/NodeGetInfo
I0228 11:31:21.570720    3008 utils.go:78] GRPC request: {}
W0228 11:31:42.573798    3008 nodeserver.go:337] get zone(capz-jn2u-8j6rb) failed with: Get "http://169.254.169.254/metadata/instance?api-version=2019-03-11&format=json": dial tcp 169.254.169.254:80: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
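For reference, the failing call is a plain HTTP GET against the IMDS endpoint. The following is a minimal debugging sketch (not code from azuredisk-csi-driver) that can be run on the affected node or inside the affected container to reproduce the timeout; the only assumptions are the documented IMDS endpoint and the `Metadata: true` header:

```go
// imdsprobe.go: minimal probe of the Azure Instance Metadata Service (IMDS).
// Debugging sketch only; run it where the CSI driver runs to reproduce the
// "connectex: A connection attempt failed ..." timeout seen in the logs above.
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

func main() {
	client := &http.Client{Timeout: 10 * time.Second}

	req, err := http.NewRequest("GET",
		"http://169.254.169.254/metadata/instance?api-version=2021-10-01&format=json", nil)
	if err != nil {
		fmt.Fprintln(os.Stderr, "build request:", err)
		os.Exit(1)
	}
	// IMDS requires this header and rejects proxied requests.
	req.Header.Set("Metadata", "true")

	resp, err := client.Do(req)
	if err != nil {
		fmt.Fprintln(os.Stderr, "IMDS unreachable:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println("IMDS status:", resp.Status)
	fmt.Println(string(body))
}
```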

Detailed logs from CI: es-sigs_azuredisk-csi-driver/1054/pull-kubernetes-e2e-capz-azure-disk-windows/1498155530961555456/build-log.txt

Anything else you would like to add:

Environment:

  • cluster-api-provider-azure version:
  • Kubernetes version (use kubectl version): 1.22.1
  • OS (e.g. from /etc/os-release):

About this issue

  • State: open
  • Created 2 years ago
  • Comments: 32 (26 by maintainers)

Most upvoted comments

We worked around this issue in https://github.com/kubernetes-sigs/azuredisk-csi-driver/pull/1200: if IMDS is not available, the driver falls back to reading the instance type from the node labels (see the log excerpt and sketch below), so a HostProcess deployment is not mandatory in this case.

I0617 13:33:09.076571    5940 utils.go:77] GRPC call: /csi.v1.Node/NodeGetInfo
I0617 13:33:09.076571    5940 utils.go:78] GRPC request: {}
W0617 13:33:30.089992    5940 nodeserver.go:382] get instance type(capz-8ken-5z262) failed with: Get "http://169.254.169.254/metadata/instance?api-version=2021-10-01&format=json": dial tcp 169.254.169.254:80: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
W0617 13:33:30.092123    5940 nodeserver.go:385] fall back to get instance type from node labels
I0617 13:33:30.096506    5940 round_trippers.go:553] GET https://10.96.0.1:443/api/v1/nodes/capz-8ken-5z262 200 OK in 4 milliseconds
I0617 13:33:30.098487    5940 nodeserver.go:431] got a matching size in getMaxDataDiskCount, VM Size: STANDARD_D4S_V3, MaxDataDiskCount: 8
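For illustration, the fallback amounts to reading the well-known instance-type labels from the Node object when IMDS is unreachable. Below is a rough client-go sketch of that idea; the function and variable names are illustrative, not the driver's actual API:

```go
// Sketch of the "fall back to node labels" idea, assuming in-cluster
// credentials and the well-known instance-type labels set by the cloud
// provider. Names are illustrative, not azuredisk-csi-driver's actual API.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// instanceTypeFromNodeLabels looks up the VM size from the node's labels,
// for use when IMDS is not reachable.
func instanceTypeFromNodeLabels(ctx context.Context, client kubernetes.Interface, nodeName string) (string, error) {
	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return "", err
	}
	// Current and legacy well-known labels.
	for _, key := range []string{"node.kubernetes.io/instance-type", "beta.kubernetes.io/instance-type"} {
		if v, ok := node.Labels[key]; ok && v != "" {
			return v, nil
		}
	}
	return "", fmt.Errorf("node %s has no instance-type label", nodeName)
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// The node name would normally come from the downward API / driver flags.
	vmSize, err := instanceTypeFromNodeLabels(context.Background(), client, "capz-8ken-5z262")
	if err != nil {
		panic(err)
	}
	fmt.Println("instance type from node labels:", vmSize)
}
```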

I confirmed that containers in aks-engine clusters have access to IMDS. I also confirmed that containers in CAPZ clusters (running both as ContainerUser and ContainerAdministrator) do not. I'll try to figure out why.

HostProcess containers and Windows nodes in CAPZ clusters DO have access to IMDS, so this appears to be a CNI/Calico configuration issue.

Let's keep this open.