flannel: Pods not getting deleted with error: failed to parse netconf: unexpected end of JSON input

Pods not getting deleted with error: “Unknown desc = failed to destroy network for sandbox "ff252b2872f45846e140c96bb096d5a23100dcd13bd25cecbf858655b336093e": plugin type="flannel" failed (delete): failed to parse netconf: unexpected end of JSON input”

Expected Behavior

Expected all pods to get deleted after a reboot of all k8s nodes.

Current Behavior

Pods not getting deleted with error: failed to parse netconf: unexpected end of JSON input

root@k8s-control-983-1665562565:~# kubectl get sc,sts,pvc,pods -owide -n vsan-stretch-4059 
NAME                                           PROVISIONER              RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
storageclass.storage.k8s.io/nginx-sc-default   csi.vsphere.vmware.com   Delete          Immediate           false                  15m

NAME                              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS       AGE   VOLUMEMODE
persistentvolumeclaim/pvc-5h64x   Bound    pvc-0676f481-57e4-49d1-98d0-af84efe81117   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-77ctb   Bound    pvc-d20aeb95-a2af-4a0b-a5ec-7a6858f804f2   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-7nw8m   Bound    pvc-6a768f2a-97cb-4634-a13f-746905a855da   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-92w56   Bound    pvc-6b3bb9a4-a544-43f5-ab91-bcdf97da07c9   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-9fzx5   Bound    pvc-8a6d736c-bece-459a-9a19-875cb147193c   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-bjmcs   Bound    pvc-84a83e16-9673-406b-8511-5f5a39a49fd3   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-bnnv8   Bound    pvc-f8bb9b02-209d-4f4d-808a-60d59ac21f6d   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-dt6n7   Bound    pvc-cdd0b517-8705-4312-9d99-9d72b6b68641   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-h6rhn   Bound    pvc-68b84fc6-dd2f-4d68-8a4b-64f3fd697dee   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-kb6zl   Bound    pvc-64087b99-9691-4303-af4f-815f55580abc   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-kpvjk   Bound    pvc-f1cdd50d-f8f6-4d42-9e50-eaebc5774fc7   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-mwm5g   Bound    pvc-f1d6092c-2f9d-4dc8-8c8e-06a71a773307   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-n6k5p   Bound    pvc-5de5f4bf-9f6e-453d-8a3c-d060c35e1813   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-ndw7r   Bound    pvc-4e95749b-1954-4ee9-abf2-7ac0c62b5bb4   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-np82z   Bound    pvc-ac4c75be-c76c-4d8c-a9f0-dc80787814d5   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-ns9vj   Bound    pvc-68b99f24-1730-4cd5-9d02-e655cd6e25a3   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-pd5t7   Bound    pvc-2d47bb98-588c-4636-9252-f101ae03faa1   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-pmxxz   Bound    pvc-5b20601f-33ab-4a56-92df-bf721fc0df11   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-pzc59   Bound    pvc-029c123f-5c67-42f1-8d28-cd2d30207d5a   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-qz5w2   Bound    pvc-4a4fb493-60e5-4e51-92af-c9468a71687f   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-rmmpf   Bound    pvc-72374230-56f8-4286-8f3f-f720762f907c   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-rrqcw   Bound    pvc-186cbe1c-3db1-4285-9b83-c5c1b058e6eb   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-rt9mk   Bound    pvc-ae297376-3362-41d5-8be6-d1ed33a5caf7   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-skd2f   Bound    pvc-9335f2a0-f6d0-4092-92e0-824261cf6a9f   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-slhbz   Bound    pvc-5a005474-1fb0-46dc-b4fe-10e83d61ede8   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-sn5xt   Bound    pvc-0108a5b6-dcaa-42d5-b788-1681a8d361c9   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-vlt9z   Bound    pvc-807c0481-a4b7-4764-97e2-82298925f55d   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-vxkvk   Bound    pvc-ed7ae8d6-44b5-4b08-a1de-612dc8ec7513   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-xd6pz   Bound    pvc-6493b9ca-d0e9-4bca-b77d-5245c89683a8   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-xpkbc   Bound    pvc-ac077678-e7c4-40a7-a92d-c1cb5d5580aa   2Gi        RWO            nginx-sc-default   15m   Filesystem

NAME                   READY   STATUS        RESTARTS   AGE   IP       NODE                      NOMINATED NODE   READINESS GATES
pod/pvc-tester-gk554   0/1     Terminating   0          10m   <none>   k8s-node-387-1665562607   <none>           <none>
pod/pvc-tester-mqwqq   0/1     Terminating   0          10m   <none>   k8s-node-286-1665562641   <none>           <none>

root@k8s-control-983-1665562565:~# kubectl describe pod -n vsan-stretch-4059 
Name:                      pvc-tester-gk554
Namespace:                 vsan-stretch-4059
Priority:                  0
Node:                      k8s-node-387-1665562607/10.180.206.154
Start Time:                Wed, 19 Oct 2022 13:04:00 +0000
Labels:                    <none>
Annotations:               <none>
Status:                    Terminating (lasts 6m25s)
Termination Grace Period:  30s
IP:                        
IPs:                       <none>
Containers:
  write-pod:
    Container ID:  containerd://4c81107fb578b79e00869cf962ecb0cca79dae6198774d594ecda9ba04280293
    Image:         harbor-repo.vmware.com/csi_ci/busybox:1.35
    Image ID:      harbor-repo.vmware.com/csi_ci/busybox@sha256:505e5e20edbb5f2ac0abe3622358daf2f4a4c818eea0498445b7248e39db6728
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
      -c
      /bin/df -T /mnt/volume1 | /bin/awk 'FNR == 2 {print $2}' > /mnt/volume1/fstype && while true ; do sleep 2 ; done
    State:          Terminated
      Reason:       Unknown
      Exit Code:    255
      Started:      Wed, 19 Oct 2022 13:04:08 +0000
      Finished:     Wed, 19 Oct 2022 13:06:30 +0000
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /mnt/volume1 from volume1 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gfgg7 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  volume1:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pvc-sn5xt
    ReadOnly:   false
  kube-api-access-gfgg7:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                  From                     Message
  ----     ------                  ----                 ----                     -------
  Normal   Scheduled               7m16s                default-scheduler        Successfully assigned vsan-stretch-4059/pvc-tester-gk554 to k8s-node-387-1665562607
  Normal   SuccessfulAttachVolume  7m14s                attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-0108a5b6-dcaa-42d5-b788-1681a8d361c9"
  Normal   Pulled                  7m8s                 kubelet                  Container image "harbor-repo.vmware.com/csi_ci/busybox:1.35" already present on machine
  Normal   Created                 7m8s                 kubelet                  Created container write-pod
  Normal   Started                 7m8s                 kubelet                  Started container write-pod
  Normal   Killing                 6m55s                kubelet                  Stopping container write-pod
  Warning  FailedKillPod           2s (x22 over 4m30s)  kubelet                  error killing pod: failed to "KillPodSandbox" for "0ba0e400-a981-4182-9a41-854260bcfa7a" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"8c760daa9dcba8c41858b14ba2bf4ae3ea47186dc7671b962465075c5fea92ec\": plugin type=\"flannel\" failed (delete): failed to parse netconf: unexpected end of JSON input"

Steps to Reproduce (for bugs)

  1. Create 30 PVCs and wait for each PVC to bind to a PV.
  2. Create a pod with each PVC created in step 1.
  3. Delete all pods and reboot all k8s worker nodes simultaneously.
  4. Once the k8s worker nodes are up and running, all the pods should be deleted.

Context

Disaster recovery scenarios with k8s 1.24 in a containerd environment are failing with this issue. This was working fine in k8s 1.23 with dockershim.

Your Environment

  • Flannel version: v0.18.1 (https://github.com/flannel-io/flannel/blob/v0.18.1/Documentation/kube-flannel.yml)
  • Etcd version: etcd:3.5.3-0
  • Kubernetes version (if used): 1.24
  • Operating System and version: linux
  • Link to your project (optional): https://github.com/kubernetes-sigs/vsphere-csi-driver

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 15 (4 by maintainers)

Most upvoted comments

Hello @rbrtbnfgl 👋 We found 0-byte files in /var/lib/cni/flannel/. Once we deleted them, things went back to normal. Why those empty files were there is still an open question! Since deleting them, we haven't hit the issue again.
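
For anyone wanting to check a node for the same symptom, here is a minimal sketch that only lists zero-byte files under the directory mentioned above. The function name `list_empty_cni_files` is made up for illustration, and the default path is simply the one from this comment; it may differ on your nodes. It deliberately does not delete anything.

```shell
#!/bin/sh
# Hypothetical helper (not part of flannel): list zero-byte files in the
# flannel CNI state directory. The first argument overrides the directory;
# the default is the path reported in this thread.
list_empty_cni_files() {
    dir="${1:-/var/lib/cni/flannel}"
    # -type f : regular files only
    # -size 0 : files that occupy zero blocks, i.e. empty files
    find "$dir" -type f -size 0 -print
}
```

Run `list_empty_cni_files` as root on the affected worker node and review the output before removing anything by hand. An empty file here would be consistent with the "unexpected end of JSON input" error, since there is nothing for the plugin to parse.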

The issue is actually in the flannel CNI binary and should be fixed in release 1.1.2: https://github.com/flannel-io/cni-plugin/releases/tag/v1.1.2

Could you check which version is used?