calico: Liveness probe failed: calico/node is not ready: Felix is not live: liveness probe reporting 503

I am getting the following error for the calico-node pod:

Liveness probe failed: calico/node is not ready: Felix is not live: liveness probe reporting 503

Steps to Reproduce (for bugs)

I am deploying the Calico CNI in a 2-node Kubernetes kind (https://github.com/kubernetes-sigs/kind) cluster. I keep seeing liveness probe failures, with the following logs:

2021-05-12 08:53:53.213 [WARNING][53] felix/health.go 66: Report timed out name="int_dataplane"
2021-05-12 08:53:53.213 [WARNING][53] felix/health.go 184: Reporter is not live. name="int_dataplane"
2021-05-12 08:53:53.213 [WARNING][53] felix/health.go 55: Report timed out name="int_dataplane"
2021-05-12 08:53:53.213 [WARNING][53] felix/health.go 188: Reporter is not ready. name="int_dataplane"
2021-05-12 08:53:53.213 [INFO][53] felix/health.go 196: Overall health status changed newStatus=&health.HealthReport{Live:false, Ready:false}
2021-05-12 08:53:53.213 [WARNING][53] felix/health.go 165: Health: not live
2021-05-12 08:53:54.565 [WARNING][53] felix/health.go 66: Report timed out name="int_dataplane"
2021-05-12 08:53:54.565 [WARNING][53] felix/health.go 184: Reporter is not live. name="int_dataplane"
2021-05-12 08:53:54.565 [WARNING][53] felix/health.go 55: Report timed out name="int_dataplane"
2021-05-12 08:53:54.565 [WARNING][53] felix/health.go 188: Reporter is not ready. name="int_dataplane"
2021-05-12 08:53:54.565 [WARNING][53] felix/health.go 154: Health: not ready
2021-05-12 08:54:00.455 [INFO][56] monitor-addresses/startup.go 768: Using autodetected IPv4 address on interface eth0: 10.245.2.131/25
2021-05-12 08:54:03.223 [WARNING][53] felix/health.go 66: Report timed out name="int_dataplane"
2021-05-12 08:54:03.223 [WARNING][53] felix/health.go 184: Reporter is not live. name="int_dataplane"
2021-05-12 08:54:03.223 [WARNING][53] felix/health.go 55: Report timed out name="int_dataplane"
2021-05-12 08:54:03.223 [WARNING][53] felix/health.go 188: Reporter is not ready. name="int_dataplane"
2021-05-12 08:54:03.223 [WARNING][53] felix/health.go 165: Health: not live
2021-05-12 08:54:04.557 [WARNING][53] felix/health.go 66: Report timed out name="int_dataplane"
2021-05-12 08:54:04.558 [WARNING][53] felix/health.go 184: Reporter is not live. name="int_dataplane"
2021-05-12 08:54:04.558 [WARNING][53] felix/health.go 55: Report timed out name="int_dataplane"
2021-05-12 08:54:04.558 [WARNING][53] felix/health.go 188: Reporter is not ready. name="int_dataplane"
2021-05-12 08:54:04.558 [WARNING][53] felix/health.go 154: Health: not ready
2021-05-12 08:54:13.187 [WARNING][53] felix/health.go 66: Report timed out name="int_dataplane"
2021-05-12 08:54:13.187 [WARNING][53] felix/health.go 184: Reporter is not live. name="int_dataplane"
2021-05-12 08:54:13.187 [WARNING][53] felix/health.go 55: Report timed out name="int_dataplane"
2021-05-12 08:54:13.187 [WARNING][53] felix/health.go 188: Reporter is not ready. name="int_dataplane"
2021-05-12 08:54:13.187 [WARNING][53] felix/health.go 165: Health: not live
2021-05-12 08:54:14.537 [WARNING][53] felix/health.go 66: Report timed out name="int_dataplane"
2021-05-12 08:54:14.537 [WARNING][53] felix/health.go 184: Reporter is not live. name="int_dataplane"
2021-05-12 08:54:14.537 [WARNING][53] felix/health.go 55: Report timed out name="int_dataplane"
2021-05-12 08:54:14.537 [WARNING][53] felix/health.go 188: Reporter is not ready. name="int_dataplane"
2021-05-12 08:54:14.537 [WARNING][53] felix/health.go 154: Health: not ready
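Every warning in the log above names the same health reporter, int_dataplane. On a longer log it can be easier to tally the reporters than to read them by eye. The following is a minimal Python sketch of that idea, assuming the log line format shown in this issue; the function name summarize_reporters and the embedded sample are illustrative and not part of Calico.

```python
import re
from collections import Counter

# Matches the Felix health warnings quoted in this issue, e.g.
#   ... felix/health.go 184: Reporter is not live. name="int_dataplane"
# \s+ tolerates a line break between the message and the name field.
REPORTER_RE = re.compile(r'Reporter is not (live|ready)\.\s+name="([^"]+)"')


def summarize_reporters(log_text):
    """Count 'not live' / 'not ready' warnings per Felix health reporter."""
    counts = Counter()
    for state, name in REPORTER_RE.findall(log_text):
        counts[(name, state)] += 1
    return counts


# Three lines copied from the log in this issue.
sample = '''\
2021-05-12 08:53:53.213 [WARNING][53] felix/health.go 184: Reporter is not live. name="int_dataplane"
2021-05-12 08:53:53.213 [WARNING][53] felix/health.go 188: Reporter is not ready. name="int_dataplane"
2021-05-12 08:54:03.223 [WARNING][53] felix/health.go 184: Reporter is not live. name="int_dataplane"
'''

for (name, state), n in sorted(summarize_reporters(sample).items()):
    print(f"{name}: not {state} x{n}")
```

In practice the text would come from the calico-node container log (for example, saved with kubectl logs) rather than from an embedded sample.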

Your Environment

Can someone please help?

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 4
  • Comments: 26 (10 by maintainers)

Most upvoted comments

Seems that nobody cares about this issue…

@caseydavenport why did you close the ticket?

A month is not that long. Maybe he caught COVID or is on vacation. Let’s try to nudge him…

@pagarwal-tibco knock knock!

The same problem on a k8s node (Ubuntu 18.04.5 LTS/5.4.0-60-generic):

Events:
  Type     Reason     Age                   From     Message
  ----     ------     ----                  ----     -------
  Warning  Unhealthy  11m (x8338 over 10d)  kubelet  (combined from similar events): Readiness probe failed: 2021-10-09 06:49:07.655 [INFO][27506] confd/health.go 180: Number of node(s) with BGP peering established = 76
calico/node is not ready: felix is not ready: readiness probe reporting 503
  Warning  Unhealthy  5m17s (x6281 over 20d)  kubelet  Liveness probe failed: calico/node is not ready: Felix is not live: liveness probe reporting 503

I have seen these symptoms in a system that was starved of CPU. It might be worth trying this on a machine with more CPU?

Upgrading the Calico version resolved my problem; see https://github.com/kubesphere/kubekey/issues/1282

I’ve been struggling with this issue for the past few days and managed to fix it by editing a ClusterRole resource. I have an RKE-based cluster (version 1.21.10), and the health check issue cropped up after I upgraded the Calico images to 3.21.5. Make sure you have the proper ClusterRole manifest, as follows (copied from the Calico website):

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: calico-node
rules:
  # The CNI plugin needs to get pods, nodes, and namespaces.
  - apiGroups: [""]
    resources:
      - pods
      - nodes
      - namespaces
    verbs:
      - get
  # EndpointSlices are used for Service-based network policy rule
  # enforcement.
  - apiGroups: ["discovery.k8s.io"]
    resources:
      - endpointslices
    verbs:
      - watch
      - list
  - apiGroups: [""]
    resources:
      - endpoints
      - services
    verbs:
      # Used to discover service IPs for advertisement.
      - watch
      - list
      # Used to discover Typhas.
      - get
  # Pod CIDR auto-detection on kubeadm needs access to config maps.
  - apiGroups: [""]
    resources:
      - configmaps
    verbs:
      - get
  - apiGroups: [""]
    resources:
      - nodes/status
    verbs:
      # Needed for clearing NodeNetworkUnavailable flag.
      - patch
      # Calico stores some configuration information in node annotations.
      - update
  # Watch for changes to Kubernetes NetworkPolicies.
  - apiGroups: ["networking.k8s.io"]
    resources:
      - networkpolicies
    verbs:
      - watch
      - list
  # Used by Calico for policy information.
  - apiGroups: [""]
    resources:
      - pods
      - namespaces
      - serviceaccounts
    verbs:
      - list
      - watch
  # The CNI plugin patches pods/status.
  - apiGroups: [""]
    resources:
      - pods/status
    verbs:
      - patch
  # Calico monitors various CRDs for config.
  - apiGroups: ["crd.projectcalico.org"]
    resources:
      - globalfelixconfigs
      - felixconfigurations
      - bgppeers
      - globalbgpconfigs
      - bgpconfigurations
      - ippools
      - ipamblocks
      - globalnetworkpolicies
      - globalnetworksets
      - networkpolicies
      - networksets
      - clusterinformations
      - hostendpoints
      - blockaffinities
      - caliconodestatuses
    verbs:
      - get
      - list
      - watch
  # Calico must create and update some CRDs on startup.
  - apiGroups: ["crd.projectcalico.org"]
    resources:
      - ippools
      - felixconfigurations
      - clusterinformations
    verbs:
      - create
      - update
  # Calico stores some configuration information on the node.
  - apiGroups: [""]
    resources:
      - nodes
    verbs:
      - get
      - list
      - watch
  # These permissions are required for Calico CNI to perform IPAM allocations.
  - apiGroups: ["crd.projectcalico.org"]
    resources:
      - blockaffinities
      - ipamblocks
      - ipamhandles
    verbs:
      - get
      - list
      - create
      - update
      - delete
  - apiGroups: ["crd.projectcalico.org"]
    resources:
      - ipamconfigs
    verbs:
      - get
  # Block affinities must also be watchable by confd for route aggregation.
  - apiGroups: ["crd.projectcalico.org"]
    resources:
      - blockaffinities
    verbs:
      - watch

Hopefully it helps.
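To compare a cluster's existing calico-node ClusterRole against the manifest above, one option is to spot-check individual permissions. The following is a rough Python sketch, assuming the role has been exported as JSON (kubectl get clusterrole calico-node -o json returns the rules under a "rules" key). The helper role_allows and the wanted list are hypothetical names for illustration. The check is deliberately simplified: it ignores resourceNames and aggregated roles.

```python
import json


def role_allows(rules, api_group, resource, verb):
    """Return True if any rule grants `verb` on `resource` in `api_group`.

    Simplified check: honours "*" wildcards but ignores resourceNames.
    """
    for rule in rules:
        groups = rule.get("apiGroups", [])
        resources = rule.get("resources", [])
        verbs = rule.get("verbs", [])
        if ((api_group in groups or "*" in groups)
                and (resource in resources or "*" in resources)
                and (verb in verbs or "*" in verbs)):
            return True
    return False


# A two-rule excerpt of the manifest above, in the JSON shape of a
# ClusterRole's "rules" field. A real check would load the full export.
role_json = '''
{"rules": [
  {"apiGroups": ["crd.projectcalico.org"],
   "resources": ["felixconfigurations", "caliconodestatuses"],
   "verbs": ["get", "list", "watch"]},
  {"apiGroups": [""],
   "resources": ["nodes/status"],
   "verbs": ["patch", "update"]}
]}
'''
rules = json.loads(role_json)["rules"]

# Hypothetical list of permissions to spot-check; extend as needed.
wanted = [
    ("crd.projectcalico.org", "caliconodestatuses", "watch"),
    ("", "nodes/status", "patch"),
    ("discovery.k8s.io", "endpointslices", "list"),
]
for group, resource, verb in wanted:
    status = "ok" if role_allows(rules, group, resource, verb) else "MISSING"
    print(f"{status:8} {verb} {resource} ({group or 'core'})")
```

With the excerpt above, the endpointslices check is reported as MISSING because that rule was left out of the sample on purpose; against the full manifest it would pass.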

Sorry for the late reply, I was away. I upgraded Docker for Mac to 3.6.0 and can confirm that it works now. So it seems the issue was caused by Docker for Mac.

Thanks for all the help.