flannel: Pods not getting deleted with error: failed to parse netconf: unexpected end of JSON input
Pods not getting deleted with error: “Unknown desc = failed to destroy network for sandbox "ff252b2872f45846e140c96bb096d5a23100dcd13bd25cecbf858655b336093e": plugin type="flannel" failed (delete): failed to parse netconf: unexpected end of JSON input”
Expected Behavior
Expected all pods to get deleted after a reboot of all k8s nodes.
Current Behavior
Pods not getting deleted with error: failed to parse netconf: unexpected end of JSON input
root@k8s-control-983-1665562565:~# kubectl get sc,sts,pvc,pods -owide -n vsan-stretch-4059
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
storageclass.storage.k8s.io/nginx-sc-default csi.vsphere.vmware.com Delete Immediate false 15m
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE VOLUMEMODE
persistentvolumeclaim/pvc-5h64x Bound pvc-0676f481-57e4-49d1-98d0-af84efe81117 2Gi RWO nginx-sc-default 15m Filesystem
persistentvolumeclaim/pvc-77ctb Bound pvc-d20aeb95-a2af-4a0b-a5ec-7a6858f804f2 2Gi RWO nginx-sc-default 15m Filesystem
persistentvolumeclaim/pvc-7nw8m Bound pvc-6a768f2a-97cb-4634-a13f-746905a855da 2Gi RWO nginx-sc-default 15m Filesystem
persistentvolumeclaim/pvc-92w56 Bound pvc-6b3bb9a4-a544-43f5-ab91-bcdf97da07c9 2Gi RWO nginx-sc-default 15m Filesystem
persistentvolumeclaim/pvc-9fzx5 Bound pvc-8a6d736c-bece-459a-9a19-875cb147193c 2Gi RWO nginx-sc-default 15m Filesystem
persistentvolumeclaim/pvc-bjmcs Bound pvc-84a83e16-9673-406b-8511-5f5a39a49fd3 2Gi RWO nginx-sc-default 15m Filesystem
persistentvolumeclaim/pvc-bnnv8 Bound pvc-f8bb9b02-209d-4f4d-808a-60d59ac21f6d 2Gi RWO nginx-sc-default 15m Filesystem
persistentvolumeclaim/pvc-dt6n7 Bound pvc-cdd0b517-8705-4312-9d99-9d72b6b68641 2Gi RWO nginx-sc-default 15m Filesystem
persistentvolumeclaim/pvc-h6rhn Bound pvc-68b84fc6-dd2f-4d68-8a4b-64f3fd697dee 2Gi RWO nginx-sc-default 15m Filesystem
persistentvolumeclaim/pvc-kb6zl Bound pvc-64087b99-9691-4303-af4f-815f55580abc 2Gi RWO nginx-sc-default 15m Filesystem
persistentvolumeclaim/pvc-kpvjk Bound pvc-f1cdd50d-f8f6-4d42-9e50-eaebc5774fc7 2Gi RWO nginx-sc-default 15m Filesystem
persistentvolumeclaim/pvc-mwm5g Bound pvc-f1d6092c-2f9d-4dc8-8c8e-06a71a773307 2Gi RWO nginx-sc-default 15m Filesystem
persistentvolumeclaim/pvc-n6k5p Bound pvc-5de5f4bf-9f6e-453d-8a3c-d060c35e1813 2Gi RWO nginx-sc-default 15m Filesystem
persistentvolumeclaim/pvc-ndw7r Bound pvc-4e95749b-1954-4ee9-abf2-7ac0c62b5bb4 2Gi RWO nginx-sc-default 15m Filesystem
persistentvolumeclaim/pvc-np82z Bound pvc-ac4c75be-c76c-4d8c-a9f0-dc80787814d5 2Gi RWO nginx-sc-default 15m Filesystem
persistentvolumeclaim/pvc-ns9vj Bound pvc-68b99f24-1730-4cd5-9d02-e655cd6e25a3 2Gi RWO nginx-sc-default 15m Filesystem
persistentvolumeclaim/pvc-pd5t7 Bound pvc-2d47bb98-588c-4636-9252-f101ae03faa1 2Gi RWO nginx-sc-default 15m Filesystem
persistentvolumeclaim/pvc-pmxxz Bound pvc-5b20601f-33ab-4a56-92df-bf721fc0df11 2Gi RWO nginx-sc-default 15m Filesystem
persistentvolumeclaim/pvc-pzc59 Bound pvc-029c123f-5c67-42f1-8d28-cd2d30207d5a 2Gi RWO nginx-sc-default 15m Filesystem
persistentvolumeclaim/pvc-qz5w2 Bound pvc-4a4fb493-60e5-4e51-92af-c9468a71687f 2Gi RWO nginx-sc-default 15m Filesystem
persistentvolumeclaim/pvc-rmmpf Bound pvc-72374230-56f8-4286-8f3f-f720762f907c 2Gi RWO nginx-sc-default 15m Filesystem
persistentvolumeclaim/pvc-rrqcw Bound pvc-186cbe1c-3db1-4285-9b83-c5c1b058e6eb 2Gi RWO nginx-sc-default 15m Filesystem
persistentvolumeclaim/pvc-rt9mk Bound pvc-ae297376-3362-41d5-8be6-d1ed33a5caf7 2Gi RWO nginx-sc-default 15m Filesystem
persistentvolumeclaim/pvc-skd2f Bound pvc-9335f2a0-f6d0-4092-92e0-824261cf6a9f 2Gi RWO nginx-sc-default 15m Filesystem
persistentvolumeclaim/pvc-slhbz Bound pvc-5a005474-1fb0-46dc-b4fe-10e83d61ede8 2Gi RWO nginx-sc-default 15m Filesystem
persistentvolumeclaim/pvc-sn5xt Bound pvc-0108a5b6-dcaa-42d5-b788-1681a8d361c9 2Gi RWO nginx-sc-default 15m Filesystem
persistentvolumeclaim/pvc-vlt9z Bound pvc-807c0481-a4b7-4764-97e2-82298925f55d 2Gi RWO nginx-sc-default 15m Filesystem
persistentvolumeclaim/pvc-vxkvk Bound pvc-ed7ae8d6-44b5-4b08-a1de-612dc8ec7513 2Gi RWO nginx-sc-default 15m Filesystem
persistentvolumeclaim/pvc-xd6pz Bound pvc-6493b9ca-d0e9-4bca-b77d-5245c89683a8 2Gi RWO nginx-sc-default 15m Filesystem
persistentvolumeclaim/pvc-xpkbc Bound pvc-ac077678-e7c4-40a7-a92d-c1cb5d5580aa 2Gi RWO nginx-sc-default 15m Filesystem
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/pvc-tester-gk554 0/1 Terminating 0 10m <none> k8s-node-387-1665562607 <none> <none>
pod/pvc-tester-mqwqq 0/1 Terminating 0 10m <none> k8s-node-286-1665562641 <none> <none>
root@k8s-control-983-1665562565:~# kubectl describe pod -n vsan-stretch-4059
Name: pvc-tester-gk554
Namespace: vsan-stretch-4059
Priority: 0
Node: k8s-node-387-1665562607/10.180.206.154
Start Time: Wed, 19 Oct 2022 13:04:00 +0000
Labels: <none>
Annotations: <none>
Status: Terminating (lasts 6m25s)
Termination Grace Period: 30s
IP:
IPs: <none>
Containers:
write-pod:
Container ID: containerd://4c81107fb578b79e00869cf962ecb0cca79dae6198774d594ecda9ba04280293
Image: harbor-repo.vmware.com/csi_ci/busybox:1.35
Image ID: harbor-repo.vmware.com/csi_ci/busybox@sha256:505e5e20edbb5f2ac0abe3622358daf2f4a4c818eea0498445b7248e39db6728
Port: <none>
Host Port: <none>
Command:
/bin/sh
-c
/bin/df -T /mnt/volume1 | /bin/awk 'FNR == 2 {print $2}' > /mnt/volume1/fstype && while true ; do sleep 2 ; done
State: Terminated
Reason: Unknown
Exit Code: 255
Started: Wed, 19 Oct 2022 13:04:08 +0000
Finished: Wed, 19 Oct 2022 13:06:30 +0000
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/mnt/volume1 from volume1 (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gfgg7 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
volume1:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: pvc-sn5xt
ReadOnly: false
kube-api-access-gfgg7:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m16s default-scheduler Successfully assigned vsan-stretch-4059/pvc-tester-gk554 to k8s-node-387-1665562607
Normal SuccessfulAttachVolume 7m14s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-0108a5b6-dcaa-42d5-b788-1681a8d361c9"
Normal Pulled 7m8s kubelet Container image "harbor-repo.vmware.com/csi_ci/busybox:1.35" already present on machine
Normal Created 7m8s kubelet Created container write-pod
Normal Started 7m8s kubelet Started container write-pod
Normal Killing 6m55s kubelet Stopping container write-pod
Warning FailedKillPod 2s (x22 over 4m30s) kubelet error killing pod: failed to "KillPodSandbox" for "0ba0e400-a981-4182-9a41-854260bcfa7a" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"8c760daa9dcba8c41858b14ba2bf4ae3ea47186dc7671b962465075c5fea92ec\": plugin type=\"flannel\" failed (delete): failed to parse netconf: unexpected end of JSON input"
Steps to Reproduce (for bugs)
- Create 30 PVCs and wait for each PVC to bind to a PV.
- Create a pod for each PVC created in step 1.
- Delete all pods and reboot all k8s worker nodes simultaneously.
- Once the k8s worker nodes are back up and Running, all the pods should be deleted.
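The first two steps above can be sketched as a shell loop that emits the PVC and pod manifest pairs. This is a sketch, not the actual test harness; the StorageClass name, busybox image, and mount path are taken from the kubectl output above, while the object names are made up for illustration:

```shell
#!/bin/sh
# Sketch: generate 30 PVC + pod manifest pairs matching the repro steps.
# Pipe the output into `kubectl apply -n <test-namespace> -f -`.
gen_manifests() {
  for i in $(seq 1 30); do
    cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-$i
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: nginx-sc-default
  resources:
    requests:
      storage: 2Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: pvc-tester-$i
spec:
  containers:
  - name: write-pod
    image: harbor-repo.vmware.com/csi_ci/busybox:1.35
    command: ["/bin/sh", "-c", "while true; do sleep 2; done"]
    volumeMounts:
    - mountPath: /mnt/volume1
      name: volume1
  volumes:
  - name: volume1
    persistentVolumeClaim:
      claimName: pvc-$i
---
EOF
  done
}

gen_manifests
```

After applying, the remaining steps are a `kubectl delete pod --all -n <test-namespace>` followed by a simultaneous reboot of the worker nodes.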
Context
Disaster-recovery scenarios with k8s 1.24 in a containerd environment fail with this issue. The same scenarios worked fine on k8s 1.23 with dockershim.
Your Environment
- Flannel version: v0.18.1 (https://github.com/flannel-io/flannel/blob/v0.18.1/Documentation/kube-flannel.yml)
- Etcd version: etcd:3.5.3-0
- Kubernetes version (if used): 1.24
- Operating System and version: Linux
- Link to your project (optional): https://github.com/kubernetes-sigs/vsphere-csi-driver
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 1
- Comments: 15 (4 by maintainers)
Hello @rbrtbnfgl 👋 We found zero-byte files in /var/lib/cni/flannel/. After we deleted them, everything went back to normal. Why those empty files were there is still an open question! Since we deleted them, we haven't hit the issue again.

The issue is actually in the flannel CNI binary and should be fixed in release 1.1.2: https://github.com/flannel-io/cni-plugin/releases/tag/v1.1.2
Could you check which version is used?
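For anyone hitting this before upgrading the CNI plugin, the workaround described in the comment above (removing the zero-byte cached netconf files that flannel fails to parse as JSON) can be scripted. This is a sketch; the `/var/lib/cni/flannel` path comes from the comment, and the helper name is made up here:

```shell
#!/bin/sh
# Workaround sketch: delete only zero-byte files directly under the given
# directory, so flannel stops choking on empty cached netconf files.
clean_empty_netconf() {
  find "$1" -maxdepth 1 -type f -size 0 -print -delete
}

# Usage on an affected node (path from the comment above):
# clean_empty_netconf /var/lib/cni/flannel
```

Non-empty cache files are left untouched, so healthy sandboxes are unaffected; the proper fix remains upgrading to cni-plugin v1.1.2.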