kops: arp_cache: neighbor table overflow!

What kops version are you running? The command kops version will display this information.
1.8

What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.
1.8.6

What cloud provider are you using?
aws

I’m seeing a lot of arp_cache neighbor table overflow messages in my production cluster. This blog post about large clusters, https://blog.openai.com/scaling-kubernetes-to-2500-nodes/, says the solution is to increase the maximum size of the ARP cache table. Can I configure the sysctl options:

net.ipv4.neigh.default.gc_thresh1
net.ipv4.neigh.default.gc_thresh2
net.ipv4.neigh.default.gc_thresh3

using kops?

thanks!
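
For context, the three thresholds named in the question can be inspected on a node before changing anything. Below is a minimal Python sketch, assuming a Linux host that exposes the values under /proc/sys/net/ipv4/neigh/default/. The function name read_gc_thresholds and its base parameter are hypothetical, introduced only for illustration; this reads the current values and does not show how kops applies them.

```python
from pathlib import Path

# Standard procfs location of the IPv4 neighbor-table sysctls on Linux.
DEFAULT_BASE = Path("/proc/sys/net/ipv4/neigh/default")


def read_gc_thresholds(base=DEFAULT_BASE):
    """Return {name: int} for gc_thresh1..3, skipping files that are missing.

    `base` is a hypothetical parameter so the function can be pointed at
    another directory (for example in a test).
    """
    values = {}
    for name in ("gc_thresh1", "gc_thresh2", "gc_thresh3"):
        path = Path(base) / name
        if path.is_file():
            values[name] = int(path.read_text().strip())
    return values


if __name__ == "__main__":
    for name, value in read_gc_thresholds().items():
        print(f"net.ipv4.neigh.default.{name} = {value}")
```

Running this on a node before and after a change is one way to confirm that new values actually took effect.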

About this issue

State: closed
Created 6 years ago
Comments: 19 (16 by maintainers)

Commits related to this issue

[Calico] Activate node controller in calico-kube-controllers and add CALICO_K8S_NODE_REF in calico-node, this commit fixes #3224 and #4533 — committed to felipejfc/kops by felipejfc 6 years ago
[Calico] Activate node controller in calico-kube-controllers and add CALICO_K8S_NODE_REF in calico-node, this commit fixes #3224 and #4533 — committed to vendrov/kops by felipejfc 6 years ago
[Calico] Activate node controller in calico-kube-controllers and add CALICO_K8S_NODE_REF in calico-node, this commit fixes #3224 and #4533 — committed to rdrgmnzs/kops by felipejfc 6 years ago

Most upvoted comments

guess every node will also have an ARP entry for each of the pods running on other nodes as well, right? At least the ones they communicate with?

No, it shouldn’t have one for every pod because the nodes themselves are the next hops for traffic, not individual pod IPs. Instead, you’ll get an ARP entry for each node in the cluster. So, a given node’s ARP cache should roughly be num_pods_on_that_node + num_nodes_in_cluster.

caseydavenport on Mar 2, 2018
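
The rough estimate in the comment above can be turned into a quick sanity check. The sketch below is an illustration only: the function names are hypothetical, the formula is the approximate one from the comment, and the limit of 1024 is the usual Linux default for gc_thresh3 (an assumption; check the actual value on your nodes).

```python
def estimate_arp_entries(pods_on_node, nodes_in_cluster):
    """Approximate ARP cache size: local pods plus one entry per node."""
    return pods_on_node + nodes_in_cluster


def would_overflow(pods_on_node, nodes_in_cluster, gc_thresh3=1024):
    """True if the estimate exceeds the hard limit (gc_thresh3).

    1024 is assumed here as the common Linux default for gc_thresh3.
    """
    return estimate_arp_entries(pods_on_node, nodes_in_cluster) > gc_thresh3


if __name__ == "__main__":
    # Hypothetical cluster sizes, chosen only to show both outcomes.
    for pods, nodes in ((30, 100), (30, 2500)):
        print(pods, nodes, estimate_arp_entries(pods, nodes),
              would_overflow(pods, nodes))
```

Under these assumptions, a small cluster stays well below the limit, while a cluster with thousands of nodes exceeds it unless the thresholds are raised.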