k8s-bigip-ctlr: CIS NodePoller stops working and stops processing ARPs if the VtepMAC of any one node in the cluster cannot be obtained

Setup Details

CIS Version: 2.7.0
Build: f5networks/k8s-bigip-ctlr:latest
BIGIP Version: BIG-IP 15.1.4 Build 0.0.47 Final
AS3 Version: none
Agent Mode: CCCL
Orchestration: K8S
Orchestration Version: kubernetes v1.21.5
Pool Mode: Cluster
Additional Setup details:

Platform: CentOS Linux release 8.4.2105
Kernel: 4.18.0-305.19.1.el8_4.x86_64
CNI Plugins: flannel

Description

Because the VtepMAC of one node in the cluster cannot be obtained, the CIS NodePoller stops working and ARP entries are no longer processed.

Steps To Reproduce

  1. To simulate a node losing its VtepMAC, edit the node object and remove the annotation flannel.alpha.coreos.com/backend-data: '{"VtepMAC":"5a:de:e9:80:38:7e"}', which flannel inserts automatically. Run kubectl edit node cluster1-w1, delete the annotation, and save.
  2. Scale a deployment that CIS watches, then wait long enough to see whether the VE has refreshed its configuration.
  3. View the CIS log. The normal worker node's CIDR is 10.42.0.0/24. The abnormal worker node's CIDR is 10.42.1.0/24.
2022/01/07 01:25:00 [INFO] [INIT] Starting: Container Ingress Services - Version: 2.7.0, BuildInfo: azure-1697-0dd06d23f0761fd29b1f614a52ed4b3695653cdd 
2022/01/07 01:25:01 [INFO] ConfigWriter started: 0xc000369020 
2022/01/07 01:25:01 [INFO] Started config driver sub-process at pid: 17 
2022/01/07 01:25:01 [INFO] [INIT] Creating Agent for cccl 
2022/01/07 01:25:01 [INFO] [CCCL] Initializing CCCL Agent 
2022/01/07 01:25:01 [INFO] [CCCL] Removing Partition p1_AS3 
 
2022/01/07 01:25:02 [INFO] [CORE] NodePoller (0xc0002645a0) registering new listener: 0x17a6700 
2022/01/07 01:25:02 [INFO] [CORE] NodePoller (0xc0002645a0) registering new listener: 0x1757a40 
2022/01/07 01:25:02 [INFO] [CORE] NodePoller started: (0xc0002645a0) 
2022/01/07 01:25:02 [INFO] [CORE] Not watching Ingress resources. 
2022/01/07 01:25:02 [INFO] [CORE] Watching ConfigMap resources. 
2022/01/07 01:25:02 [INFO] [CORE] Handling ConfigMap resource events. 
2022/01/07 01:25:02 [INFO] [CORE] Not handling Ingress resource events. 
2022/01/07 01:25:02 [INFO] [CORE] Registered BigIP Metrics 
2022/01/07 01:25:03 [INFO] [2022-01-07 01:25:03,585 __main__ INFO] entering inotify loop to watch /tmp/k8s-bigip-ctlr.config334657854/config.json 
2022/01/07 01:25:05 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs 
2022/01/07 01:25:06 [INFO] [2022-01-07 01:25:06,589 f5_cccl.resource.resource INFO] Deleting IcrArp: /Common/k8s-10.42.1.4 
2022/01/07 01:25:06 [INFO] [2022-01-07 01:25:06,731 f5_cccl.resource.resource INFO] Deleting IcrArp: /Common/k8s-10.42.1.5 
2022/01/07 01:25:07 [INFO] [2022-01-07 01:25:07,664 f5_cccl.resource.resource INFO] Creating ApiArp: /Common/k8s-10.42.1.5 
2022/01/07 01:25:07 [INFO] [2022-01-07 01:25:07,737 f5_cccl.resource.resource INFO] Creating ApiArp: /Common/k8s-10.42.1.4 
2022/01/07 01:29:06 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs 
2022/01/07 01:29:06 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.4's node. 
2022/01/07 01:29:06 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs 
2022/01/07 01:29:06 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.5's node. 
2022/01/07 01:29:06 [INFO] [2022-01-07 01:29:06,480 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea 
2022/01/07 01:29:07 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs 
2022/01/07 01:29:07 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.5's node. 
2022/01/07 01:29:07 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs 
2022/01/07 01:29:07 [INFO] [2022-01-07 01:29:07,723 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_cafevs1 
2022/01/07 01:29:07 [INFO] [2022-01-07 01:29:07,952 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea 
2022/01/07 01:29:08 [INFO] [2022-01-07 01:29:08,342 f5_cccl.resource.resource INFO] Deleting IcrNode: /p1/10.42.1.4%0 
2022/01/07 01:29:08 [INFO] [2022-01-07 01:29:08,420 f5_cccl.resource.resource INFO] Deleting IcrNode: /p1/10.42.1.5%0 
2022/01/07 01:29:08 [INFO] [2022-01-07 01:29:08,702 f5_cccl.resource.resource INFO] Creating ApiArp: /Common/k8s-10.42.0.247 
2022/01/07 01:29:08 [INFO] [2022-01-07 01:29:08,767 f5_cccl.resource.resource INFO] Creating ApiArp: /Common/k8s-10.42.0.246 
2022/01/07 01:29:08 [INFO] [2022-01-07 01:29:08,826 f5_cccl.resource.resource INFO] Deleting IcrArp: /Common/k8s-10.42.1.4 
2022/01/07 01:29:08 [INFO] [2022-01-07 01:29:08,894 f5_cccl.resource.resource INFO] Deleting IcrArp: /Common/k8s-10.42.1.5 
2022/01/07 01:29:57 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs 
2022/01/07 01:29:57 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.6's node. 
2022/01/07 01:29:57 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs 
2022/01/07 01:29:57 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.6's node. 
2022/01/07 01:29:57 [INFO] [2022-01-07 01:29:57,478 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea 
2022/01/07 01:29:58 [INFO] [2022-01-07 01:29:58,489 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea 
2022/01/07 01:29:59 [INFO] [2022-01-07 01:29:59,025 f5_cccl.resource.resource INFO] Deleting IcrNode: /p1/10.42.0.246%0 
2022/01/07 01:30:02 [INFO] [2022-01-07 01:30:02,954 f5_cccl.resource.resource INFO] Updating ApiFDBTunnel: /Common/flannel_vxlan 
2022/01/07 01:30:06 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs 
2022/01/07 01:30:06 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.7's node. 
2022/01/07 01:30:06 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs 
2022/01/07 01:30:06 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.7's node. 
2022/01/07 01:30:06 [INFO] [2022-01-07 01:30:06,774 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_cafevs1 
2022/01/07 01:30:07 [INFO] [2022-01-07 01:30:07,722 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_cafevs1 
2022/01/07 01:30:08 [INFO] [2022-01-07 01:30:08,130 f5_cccl.resource.resource INFO] Deleting IcrNode: /p1/10.42.0.247%0 
2022/01/07 01:31:36 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs 
2022/01/07 01:31:37 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.7's node. 
2022/01/07 01:31:37 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs 
2022/01/07 01:31:37 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.6's node. 
2022/01/07 01:31:37 [INFO] [2022-01-07 01:31:37,345 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_cafevs1 
2022/01/07 01:31:38 [INFO] [2022-01-07 01:31:38,490 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_cafevs1 
2022/01/07 01:31:39 [INFO] [2022-01-07 01:31:39,007 f5_cccl.resource.resource INFO] Deleting IcrNode: /p1/10.42.1.7%0 
2022/01/07 01:31:50 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs 
2022/01/07 01:31:50 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.6's node. 
2022/01/07 01:31:50 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs 
2022/01/07 01:31:50 [INFO] [2022-01-07 01:31:50,416 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea 
2022/01/07 01:31:51 [INFO] [2022-01-07 01:31:51,401 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea 
2022/01/07 01:31:51 [INFO] [2022-01-07 01:31:51,765 f5_cccl.resource.resource INFO] Deleting IcrNode: /p1/10.42.1.6%0 
2022/01/07 01:31:52 [INFO] [2022-01-07 01:31:52,003 f5_cccl.resource.resource INFO] Creating ApiArp: /Common/k8s-10.42.0.249 
2022/01/07 01:31:52 [INFO] [2022-01-07 01:31:52,053 f5_cccl.resource.resource INFO] Creating ApiArp: /Common/k8s-10.42.0.248 
2022/01/07 01:31:52 [INFO] [2022-01-07 01:31:52,105 f5_cccl.resource.resource INFO] Deleting IcrArp: /Common/k8s-10.42.0.247 
2022/01/07 01:31:52 [INFO] [2022-01-07 01:31:52,167 f5_cccl.resource.resource INFO] Deleting IcrArp: /Common/k8s-10.42.0.246 
2022/01/07 01:32:05 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs 
2022/01/07 01:32:05 [INFO] [2022-01-07 01:32:05,723 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea 
2022/01/07 01:32:07 [INFO] [2022-01-07 01:32:07,175 f5_cccl.resource.resource INFO] Creating ApiArp: /Common/k8s-10.42.0.250 
2022/01/07 01:36:53 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs 
2022/01/07 01:36:54 [INFO] [2022-01-07 01:36:54,201 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea 
2022/01/07 01:36:55 [INFO] [2022-01-07 01:36:55,496 f5_cccl.resource.resource INFO] Creating ApiArp: /Common/k8s-10.42.0.251 
2022/01/07 01:37:05 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs 
2022/01/07 01:37:05 [INFO] [2022-01-07 01:37:05,406 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea 
2022/01/07 01:37:06 [INFO] [2022-01-07 01:37:06,698 f5_cccl.resource.resource INFO] Creating ApiArp: /Common/k8s-10.42.0.252 
2022/01/07 01:37:16 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs 
2022/01/07 01:37:16 [INFO] [2022-01-07 01:37:16,493 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea 
2022/01/07 01:37:17 [INFO] [2022-01-07 01:37:17,999 f5_cccl.resource.resource INFO] Creating ApiArp: /Common/k8s-10.42.0.253 
2022/01/07 01:37:27 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs 
2022/01/07 01:37:28 [INFO] [2022-01-07 01:37:28,080 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea 
2022/01/07 01:37:29 [INFO] [2022-01-07 01:37:29,410 f5_cccl.resource.resource INFO] Creating ApiArp: /Common/k8s-10.42.0.254 
2022/01/07 01:38:35 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs 
2022/01/07 01:38:35 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.8's node. 
2022/01/07 01:38:36 [INFO] [2022-01-07 01:38:36,166 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_cafevs1 
2022/01/07 01:38:38 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs 
2022/01/07 01:38:38 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.8's node. 
2022/01/07 01:38:39 [INFO] [2022-01-07 01:38:39,086 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_cafevs1 
2022/01/07 01:38:42 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs 
2022/01/07 01:38:42 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.10's node. 
2022/01/07 01:38:42 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs 
2022/01/07 01:38:42 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.10's node. 
2022/01/07 01:38:43 [INFO] [2022-01-07 01:38:43,260 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_cafevs1 
2022/01/07 01:38:44 [INFO] [2022-01-07 01:38:44,198 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_cafevs1 
2022/01/07 01:38:44 [INFO] [2022-01-07 01:38:44,605 f5_cccl.resource.resource INFO] Deleting IcrNode: /p1/10.42.0.248%0 
2022/01/07 01:40:24 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs 
2022/01/07 01:40:24 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.10's node. 
2022/01/07 01:40:24 [INFO] [2022-01-07 01:40:24,964 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea 
2022/01/07 01:40:25 [INFO] [2022-01-07 01:40:25,728 f5_cccl.resource.resource INFO] Deleting IcrNode: /p1/10.42.0.254%0 
2022/01/07 01:40:35 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs 
2022/01/07 01:40:35 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.10's node. 
2022/01/07 01:40:35 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs 
2022/01/07 01:40:35 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.10's node. 
2022/01/07 01:40:35 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs 
2022/01/07 01:40:36 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.10's node. 
2022/01/07 01:40:36 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs 
2022/01/07 01:40:36 [INFO] [2022-01-07 01:40:36,280 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea 
2022/01/07 01:40:36 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.10's node. 
2022/01/07 01:40:38 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs 
2022/01/07 01:40:38 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.10's node. 
2022/01/07 01:40:38 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs 
2022/01/07 01:40:38 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.10's node. 
2022/01/07 01:40:38 [INFO] [2022-01-07 01:40:38,897 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea 
2022/01/07 01:40:39 [INFO] [2022-01-07 01:40:39,492 f5_cccl.resource.resource INFO] Deleting IcrNode: /p1/10.42.0.252%0 
2022/01/07 01:40:39 [INFO] [2022-01-07 01:40:39,577 f5_cccl.resource.resource INFO] Deleting IcrNode: /p1/10.42.0.253%0 
2022/01/07 01:40:40 [INFO] [2022-01-07 01:40:40,201 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea 
2022/01/07 01:40:40 [INFO] [2022-01-07 01:40:40,580 f5_cccl.resource.resource INFO] Deleting IcrNode: /p1/10.42.0.251%0 
2022/01/07 01:40:44 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs 
2022/01/07 01:40:45 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.10's node. 
2022/01/07 01:40:45 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs 
2022/01/07 01:40:45 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.10's node. 
2022/01/07 01:40:45 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs 
2022/01/07 01:40:45 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.10's node. 
2022/01/07 01:40:45 [INFO] [2022-01-07 01:40:45,531 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea 
2022/01/07 01:40:46 [INFO] [2022-01-07 01:40:46,813 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea 
2022/01/07 01:40:47 [INFO] [CCCL] Wrote 0 Virtual Server and 2 IApp configs 
2022/01/07 01:40:47 [INFO] [2022-01-07 01:40:47,417 f5_cccl.resource.resource INFO] Deleting IcrNode: /p1/10.42.0.249%0 
2022/01/07 01:40:47 [ERROR] [VxLAN] Vxlan manager could not get VtepMac for 10.42.1.10's node. 
2022/01/07 01:40:47 [INFO] [2022-01-07 01:40:47,539 f5_cccl.resource.resource INFO] Deleting IcrNode: /p1/10.42.0.250%0 
2022/01/07 01:40:47 [INFO] [2022-01-07 01:40:47,976 f5_cccl.resource.resource INFO] Updating ApiApplicationService: /p1/default_tea 
  4. View the pod IPs.
[root@cluster1-m1 1]# kubectl get pod  -o wide
NAME                     READY   STATUS    RESTARTS   AGE     IP           NODE          NOMINATED NODE   READINESS GATES
coffee-87b9987b4-lzch9   1/1     Running   0          3m58s   10.42.1.10   cluster1-w1   <none>           <none>
coffee-87b9987b4-nmsk2   1/1     Running   0          4m6s    10.42.1.8    cluster1-w1   <none>           <none>
coffee-87b9987b4-q778v   1/1     Running   0          4m2s    10.42.1.9    cluster1-w1   <none>           <none>
tea-67977d68b-4qbcz      1/1     Running   0          117s    10.42.0.6    cluster1-m1   <none>           <none>
tea-67977d68b-664f9      1/1     Running   0          116s    10.42.0.7    cluster1-m1   <none>           <none>
tea-67977d68b-8xtwl      1/1     Running   0          2m8s    10.42.0.5    cluster1-m1   <none>           <none>
tea-67977d68b-j2lb8      1/1     Running   0          114s    10.42.0.8    cluster1-m1   <none>           <none>
tea-67977d68b-rxwkl      1/1     Running   0          2m8s    10.42.0.3    cluster1-m1   <none>           <none>
tea-67977d68b-tsc6k      1/1     Running   0          2m8s    10.42.0.2    cluster1-m1   <none>           <none>
  5. View the VE ARP list. The new pod IPs have not been added, even for pods running on the normal, healthy node.
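The effect of step 1 can be pictured with a small Go sketch. It is not CIS code: vtepMACFromAnnotations is a hypothetical helper, and only the annotation key and its JSON payload come from the steps above. It shows that once the annotation is gone, a VtepMAC lookup for that node can only return an error.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Annotation key that flannel writes on each node (taken from step 1).
const flannelBackendData = "flannel.alpha.coreos.com/backend-data"

// backendData mirrors the JSON payload stored in the annotation.
type backendData struct {
	VtepMAC string `json:"VtepMAC"`
}

// vtepMACFromAnnotations is a hypothetical helper (not part of CIS) that
// extracts the VtepMAC from a node's annotations.
func vtepMACFromAnnotations(annotations map[string]string) (string, error) {
	raw, ok := annotations[flannelBackendData]
	if !ok {
		return "", errors.New("annotation " + flannelBackendData + " not found")
	}
	var data backendData
	if err := json.Unmarshal([]byte(raw), &data); err != nil {
		return "", fmt.Errorf("parsing %s: %w", flannelBackendData, err)
	}
	if data.VtepMAC == "" {
		return "", errors.New("VtepMAC is empty")
	}
	return data.VtepMAC, nil
}

func main() {
	// A node as flannel annotates it, and the same node after step 1.
	healthy := map[string]string{flannelBackendData: `{"VtepMAC":"5a:de:e9:80:38:7e"}`}
	edited := map[string]string{}

	mac, err := vtepMACFromAnnotations(healthy)
	fmt.Println("healthy node:", mac, err)

	mac, err = vtepMACFromAnnotations(edited)
	fmt.Println("edited node:", mac, err)
}
```
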

Expected Result

CIS logs the error, but the NodePoller and ARP processing keep working.

Actual Result

CIS logs the error, and the NodePoller and ARP processing stop working.

Diagnostic Information

The CIS yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: k8s-bigip-ctlr1
  name: cc-k8s-to-bigip1
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: k8s-bigip-ctlr1
  template:
    metadata:
      labels:
        app: k8s-bigip-ctlr1
      name: k8s-bigip-ctlr1
    spec:
      containers:
      - args:
        - --bigip-username=$(BIGIP_USERNAME)
        - --bigip-password=$(BIGIP_PASSWORD)
        - --manage-ingress=false
        - --bigip-partition=partition1
        - --bigip-url=https://10.1.20.252
        - --pool-member-type=cluster
        - --flannel-name=/Common/flannel_vxlan
        - --insecure=true
        - --agent=cccl
        command:
        - /app/bin/k8s-bigip-ctlr
        env:
        - name: BIGIP_USERNAME
          valueFrom:
            secretKeyRef:
              key: username
              name: bigip-login1
              optional: false
        - name: BIGIP_PASSWORD
          valueFrom:
            secretKeyRef:
              key: password
              name: bigip-login1
              optional: false
        image: f5networks/k8s-bigip-ctlr:2.7.0
        imagePullPolicy: Always
        name: k8s-bigip-ctlr1
      serviceAccount: bigip-ctlr
      serviceAccountName: bigip-ctlr

Observations (if any)

It may be related to the flannel bug "Flannel Annotations 'flannel.alpha.coreos.com' issue" (#1122).

About this issue

  • State: open
  • Created 2 years ago
  • Comments: 15 (10 by maintainers)

Most upvoted comments

@myf5

I guess a quick hack/fix is to remove the log.Errorf call and the return, and use log.Infof("[VxLAN] %v", err) instead. See https://github.com/F5Networks/k8s-bigip-ctlr/blob/master/pkg/vxlan/vxlanMgr.go#L229-L233

    var mac string
    mac, err = getVtepMac(pod, kubePods, kubeNodes)
    if nil != err {
        log.Errorf("[VxLAN] %v", err)
        return
    }
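
To make the effect of that return concrete, here is a minimal, self-contained Go sketch. It is not the CIS implementation: pod, lookupVtepMAC, and buildARPs are hypothetical stand-ins, the MAC for cluster1-m1 is a made-up placeholder, and the sketch assumes the ARP list is only applied after the loop over all pods has finished. It also uses continue to skip the failing pod, which is a variation on the suggestion above rather than the exact change proposed.

```go
package main

import "fmt"

// pod is a simplified stand-in for a Kubernetes pod (hypothetical type).
type pod struct {
	IP   string
	Node string
}

// lookupVtepMAC is a hypothetical stand-in for getVtepMac: it fails when
// the pod's node has no known VtepMAC.
func lookupVtepMAC(p pod, nodeMACs map[string]string) (string, error) {
	mac, ok := nodeMACs[p.Node]
	if !ok {
		return "", fmt.Errorf("Vxlan manager could not get VtepMac for %s's node", p.IP)
	}
	return mac, nil
}

// buildARPs maps pod IPs to VtepMACs. With abortOnError set it mimics the
// early return in the snippet above; otherwise it skips the failing pod.
func buildARPs(pods []pod, nodeMACs map[string]string, abortOnError bool) map[string]string {
	arps := make(map[string]string)
	for _, p := range pods {
		mac, err := lookupVtepMAC(p, nodeMACs)
		if err != nil {
			fmt.Printf("[VxLAN] %v\n", err)
			if abortOnError {
				return nil // entries for healthy nodes are dropped as well
			}
			continue // skip only the pod whose node has no VtepMAC
		}
		arps[p.IP] = mac
	}
	return arps
}

func main() {
	// Pod IPs and node names come from the kubectl output in the report;
	// the MAC for cluster1-m1 is a placeholder. cluster1-w1 has lost its
	// annotation, so it has no entry here.
	nodeMACs := map[string]string{"cluster1-m1": "0a:00:00:00:00:01"}
	pods := []pod{
		{IP: "10.42.1.10", Node: "cluster1-w1"},
		{IP: "10.42.0.6", Node: "cluster1-m1"},
	}

	fmt.Println("abort on error:", len(buildARPs(pods, nodeMACs, true)), "ARP entries")
	fmt.Println("skip on error: ", len(buildARPs(pods, nodeMACs, false)), "ARP entries")
}
```

With the skip variant, the pod on the healthy node still gets its ARP entry while the error for the other node is logged, which matches the expected result described in the report.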