rancher: [BUG] - Abnormally High Load Average on Kubernetes 1.25.9

Rancher Server Setup

  • Rancher version: 2.7.3
  • Installation option (Docker install/Helm Chart): Helm Chart (on an RKE1 cluster)
  • Proxy/Cert Details:
helm get values rancher -n cattle-system
USER-SUPPLIED VALUES:
hostname: rancher.example.com.br
ingress:
  tls:
    source: secret
privateCA: true

Information about the Cluster

  • Kubernetes version: v1.25.9 for Custom Clusters and v1.24.10 for the Imported cluster
  • Cluster Type (Local/Downstream): Downstream
    • If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider): The problem happens in both Custom and Imported clusters. Hosts are created from templates in vSphere.

User Information

  • What is the role of the user logged in? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom): Admin

Describe the bug Our worker and master nodes freeze sporadically throughout the week, and the symptoms are the same across all our clusters. Whenever the hosts (workers and masters) come under demand for CPU, memory, or IOPS, the server freezes. It is still possible to access the server, but the load average is so high that any command takes minutes to complete. When a server freezes, the only way to recover it is to reset it in vSphere (hard stop); the guest OS stops responding.

To Reproduce This issue occurs sporadically throughout the week, particularly when there is a high demand for resources.

Result The servers freeze under high resource demand, causing a significant delay in command execution.

Expected Result The servers should be able to handle high resource demand without freezing or causing significant delays in command execution.

Screenshots The screenshots below show output from the top and iotop commands, captured on a host where the problem occurred.

Additional context This issue is happening in both our Custom and Imported clusters. The cluster configurations are provided above. Our KubeletConfiguration for all clusters is as follows:

#File generated via Ansible.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
#https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/
#address is the IP address for the Kubelet to serve on (set to 0.0.0.0 for all interfaces). Default: "0.0.0.0"
address: "10.0.142.42"
#serializeImagePulls when enabled, tells the Kubelet to pull images one at a time.
serializeImagePulls: false
#runtimeRequestTimeout is the timeout for all runtime requests except long running requests - pull, logs, exec and attach. Default: "2m"
runtimeRequestTimeout: "30m"
#evictionHard is a map of signal names to quantities that defines hard eviction thresholds.
evictionHard:
    memory.available: "100Mi"
    nodefs.available: "3%"
    nodefs.inodesFree: "3%"
    imagefs.available: "8%"
#evictionMaxPodGracePeriod is the maximum allowed grace period (in seconds) to use when terminating pods in response to a soft eviction threshold being met.
evictionMaxPodGracePeriod: 60
#failSwapOn tells the Kubelet to fail to start if swap is enabled on the node. Default: true
failSwapOn: true
#containerLogMaxSize is a quantity defining the maximum size of the container log file before it is rotated. Default: "10Mi"
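
For reference, a quick way to confirm that the kubelet on a node is actually running with this file (a minimal check, assuming the RKE1 bind mount shown in the cluster configurations below; <node-name> is a placeholder):

docker inspect kubelet | grep -- '--config'
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/configz"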

OS Version and Docker Version

docker --version
Docker version 20.10.12, build e91ed57

uname -a
Linux paranagua 3.10.0-1160.76.1.el7.x86_64 #1 SMP Wed Aug 10 16:21:17 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

Imported Cluster configuration

#RKE configuration file for Rancher-Ouze-PRD RKE1.
#Check the names at https://rancher-ouze.example.com.br/dashboard/c/local/explorer/node. They must match the existing host names.
nodes:
- address: anchieta
  port: "22"
  internal_address: ""
  role:
  - controlplane
  - etcd
  hostname_override: ""
  user: rancher
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: aracruz
  port: "22"
  internal_address: ""
  role:
  - controlplane
  - etcd
  hostname_override: ""
  user: rancher
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: piuma
  port: "22"
  internal_address: ""
  role:
  - controlplane
  - etcd
  hostname_override: ""
  user: rancher
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: vilavelha
  port: "22"
  internal_address: ""
  role:
  - worker
  hostname_override: ""
  user: rancher
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: saomateus
  port: "22"
  internal_address: ""
  role:
  - worker
  hostname_override: ""
  user: rancher
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: marataizes
  port: "22"
  internal_address: ""
  role:
  - worker
  hostname_override: ""
  user: rancher
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: saovicente
  port: "22"
  internal_address: ""
  role:
  - worker
  hostname_override: ""
  user: rancher
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: praiadasdunas
  port: "22"
  internal_address: ""
  role:
  - worker
  hostname_override: ""
  user: rancher
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: praiadoespelho
  port: "22"
  internal_address: ""
  role:
  - worker
  hostname_override: ""
  user: rancher
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: praiadocalhau
  port: "22"
  internal_address: ""
  role:
  - worker
  hostname_override: ""
  user: rancher
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: praiadacosta
  port: "22"
  internal_address: ""
  role:
  - worker
  hostname_override: ""
  user: rancher
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
services:
  etcd:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
    external_urls: []
    ca_cert: ""
    cert: ""
    key: ""
    path: ""
    uid: 0
    gid: 0
    retention: ""
    creation: ""
  kube-api:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
    service_cluster_ip_range: 10.43.0.0/16
    service_node_port_range: ""
    pod_security_policy: false
    always_pull_images: false
  kube-controller:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
    cluster_cidr: 10.42.0.0/16
    service_cluster_ip_range: 10.43.0.0/16
  scheduler:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
  kubelet:
    extra_args:
      config: /var/lib/kubelet/kubelet-config.yml
      v: '1'
    extra_binds:
      - >-
        /var/lib/kubelet/kubelet-config.yml:/var/lib/kubelet/kubelet-config.yml
    fail_swap_on: false
    generate_serving_certificate: false
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
    cluster_domain: cluster.local
    infra_container_image: ""
    cluster_dns_server: 10.43.0.10
  kubeproxy:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
network:
  plugin: canal
  options: {}
  mtu: 0
  node_selector: {}
  tolerations: []
authentication:
  strategy: x509
  sans: []
addons: ""
addons_include: []
ssh_key_path: ~/.ssh/id_rsa
ssh_cert_path: ""
ssh_agent_auth: false
authorization:
  mode: rbac
  options: {}
kubernetes_version: "v1.24.10-rancher4-1"
private_registries: []
ingress:
  provider: ""
  options: {}
  node_selector: {}
  extra_args: {}
  dns_policy: ""
  extra_envs: []
  extra_volumes: []
  extra_volume_mounts: []
  http_port: 0
  https_port: 0
  network_mode: ""
  tolerations: []
  default_http_backend_priority_class_name: ""
  nginx_ingress_controller_priority_class_name: ""
cluster_name: ""
cloud_provider:
  name: ""
prefix_path: ""
win_prefix_path: ""
addon_job_timeout: 0
bastion_host:
  address: ""
  port: ""
  user: ""
  ssh_key: ""
  ssh_key_path: ""
  ssh_cert: ""
  ssh_cert_path: ""
  ignore_proxy_env_vars: false
restore:
  restore: false
  snapshot_name: ""
rotate_encryption_key: false

Custom Cluster Configuration

answers: {}
docker_root_dir: /var/lib/docker
enable_cluster_alerting: false
enable_cluster_monitoring: false
enable_network_policy: false
fleet_workspace_name: fleet-default
local_cluster_auth_endpoint:
  ca_certs: |-
    -----BEGIN CERTIFICATE-----
    -----END CERTIFICATE-----
    -----BEGIN CERTIFICATE-----
    -----END CERTIFICATE-----
  enabled: false
  fqdn: proxy-devops.example.com.br
name: devops
rancher_kubernetes_engine_config:
  addon_job_timeout: 45
  authentication:
    strategy: x509
  authorization: {}
  bastion_host:
    ignore_proxy_env_vars: false
    ssh_agent_auth: false
  cloud_provider: {}
  dns:
    linear_autoscaler_params:
      cores_per_replica: 128
      max: 0
      min: 1
      nodes_per_replica: 4
      prevent_single_point_failure: true
    node_selector: null
    nodelocal:
      node_selector: null
      update_strategy:
        rolling_update: {}
    options: null
    reversecidrs: null
    stubdomains: null
    tolerations: null
    update_strategy:
      rolling_update: {}
    upstreamnameservers: null
  enable_cri_dockerd: true
  ignore_docker_version: false
  ingress:
    default_backend: false
    default_ingress_class: true
    http_port: 0
    https_port: 0
    provider: nginx
  kubernetes_version: v1.25.9-rancher2-1
  monitoring:
    provider: metrics-server
    replicas: 1
  network:
    mtu: 0
    options:
      flannel_backend_type: vxlan
    plugin: canal
  restore:
    restore: false
  rotate_encryption_key: false
  services:
    etcd:
      backup_config:
        enabled: true
        interval_hours: 12
        retention: 6
        s3_backup_config:
          access_key: ACCESS_KEY
          bucket_name: backup
          endpoint: s3
          folder: cluster
          region: sa
        safe_timestamp: false
        timeout: 300
      creation: 12h
      extra_args:
        election-timeout: '5000'
        heartbeat-interval: '500'
      gid: 0
      retention: 72h
      snapshot: false
      uid: 0
    kube-api:
      always_pull_images: false
      pod_security_policy: false
      secrets_encryption_config:
        enabled: false
      service_node_port_range: 30000-32767
    kube-controller: {}
    kubelet:
      extra_args:
        config: /var/lib/kubelet/kubelet-config.yml
        v: '1'
      extra_binds:
        - >-
          /var/lib/kubelet/kubelet-config.yml:/var/lib/kubelet/kubelet-config.yml
      fail_swap_on: true
      generate_serving_certificate: false
    kubeproxy: {}
    scheduler: {}
  ssh_agent_auth: false
  upgrade_strategy:
    drain: false
    max_unavailable_controlplane: '2'
    max_unavailable_worker: 10%
    node_drain_input:
      delete_local_data: false
      force: false
      grace_period: -1
      ignore_daemon_sets: true
      timeout: 1800

top output from a degraded master node:

top - 14:26:32 up 4 days, 4 min,  1 user,  load average: 71.90, 68.14, 56.26
Tasks: 266 total,   5 running, 261 sleeping,   0 stopped,   0 zombie
%Cpu(s):  4.5 us, 48.8 sy,  0.0 ni,  1.3 id, 40.7 wa,  0.0 hi,  4.7 si,  0.0 st
KiB Mem :  8008932 total,   126876 free,  7646316 used,   235740 buff/cache
KiB Swap:        0 total,        0 free,        0 used.    91368 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
   45 root      20   0       0      0      0 R  68.7  0.0  16:05.64 kswapd0
 2015 root      20   0   11.0g 237468      0 S  26.3  3.0 145:49.37 etcd
 2904 root      20   0 2234468  58024      0 S  14.6  0.7  92:46.55 kubelet
23396 root      20   0  824904  16996      0 D  13.4  0.2   3:44.04 kube-controller
 4186 root      20   0 2115420  28424      0 S  12.9  0.4  47:25.29 calico-node
 4528 root      20   0 6608308   5.5g      0 S  12.9 71.8 160:26.56 agent
23395 root      20   0  760160  23908      0 S  12.1  0.3   3:01.25 kube-scheduler
 1997 root      20   0  754228  17648      0 R  10.2  0.2   3:35.35 kube-proxy
 1299 root      20   0 1573760  49636      0 S   6.1  0.6  46:00.94 dockerd
 3389 root      20   0  821892  38044      0 S   5.2  0.5   2:07.18 agent
 4187 root      20   0 1303600  19288      0 S   4.4  0.2   0:25.61 calico-node
    6 root      20   0       0      0      0 S   2.3  0.0   1:15.08 ksoftirqd/0
 4679 prometh+  20   0  713644  11760      0 S   2.3  0.1   0:31.56 pushprox-client
 1125 root      20   0 1118140  32900      0 S   2.1  0.4   6:43.15 containerd
 4120 root      20   0 1482884  18792      0 D   2.1  0.2   3:52.02 flanneld
 1106 prometh+  20   0  727004  12928      0 D   1.7  0.2   5:01.89 node_exporter
 4384 root      20   0  712432  10564      0 S   1.7  0.1   0:27.45 containerd-shim
27606 postfix   20   0   92120   1256    184 D   1.7  0.0   0:00.50 local
   14 root      20   0       0      0      0 S   1.5  0.0   0:42.46 ksoftirqd/1
 2019 root      20   0 1677868 685256      0 S   1.5  8.6 137:46.18 kube-apiserver
27607 root      20   0   92020   1208    192 D   1.3  0.0   0:00.50 pickup
  823 root      20   0  425060   1616     84 D   1.0  0.0   4:24.60 vmtoolsd
 2526 root      20   0  742088  18136      0 S   1.0  0.2  40:30.95 cri-dockerd
 4564 prometh+  20   0  713644  11780      0 S   1.0  0.1   0:29.65 pushprox-client
   19 root      20   0       0      0      0 S   0.8  0.0   0:42.79 ksoftirqd/2
 1119 root      20   0  574284  15424      0 S   0.8  0.2   0:30.85 tuned
 4824 prometh+  20   0  713644  10768      0 S   0.8  0.1   0:28.58 pushprox-client
 4840 prometh+  20   0  713644  10116      0 S   0.8  0.1   0:28.02 pushprox-client
25910 root      20   0  743384   6584      0 D   0.8  0.1   0:12.26 calico
27624 root      20   0   28268    340    108 D   0.8  0.0   0:00.04 iptables-legacy
27324 root      20   0  245160   1088     96 R   0.6  0.0   0:03.11 sssd_be
27621 root      20   0   28268    340    108 D   0.6  0.0   0:00.03 iptables-legacy
    1 root      20   0  191420   1860     84 D   0.4  0.0   3:06.07 systemd
 1123 root      20   0  251272  18084  16648 S   0.4  0.2   0:14.04 rsyslogd
 4806 root      20   0  712176  10768      0 S   0.4  0.1   0:29.73 containerd-shim
27539 root      20   0  172944    996    100 R   0.4  0.0   0:01.34 top
    9 root      20   0       0      0      0 S   0.2  0.0   1:51.52 rcu_sched
   24 root      20   0       0      0      0 S   0.2  0.0   0:19.55 ksoftirqd/3
  544 root      20   0       0      0      0 S   0.2  0.0   1:24.18 xfsaild/dm-0
  627 root      20   0   72976  32096  31740 D   0.2  0.4   0:06.22 systemd-journal
  818 dbus      20   0   60308    712      0 S   0.2  0.0   0:27.90 dbus-daemon
 1137 zabbix    20   0   31988    616    404 S   0.2  0.0   0:53.24 zabbix_agentd
 1832 root      20   0  712432   9212      0 S   0.2  0.1   0:27.42 containerd-shim
 3289 root      20   0    2500    108      0 S   0.2  0.0   0:07.62 tini
 3792 root      20   0    2500    108      0 S   0.2  0.0   0:07.00 tini
 4788 root      20   0  712432  10320      0 S   0.2  0.1   0:32.79 containerd-shim
27512 root      20   0       0      0      0 R   0.2  0.0   0:00.08 kworker/3:2
27622 root      20   0    1412    144     56 D   0.2  0.0   0:00.01 iptables
27623 root      20   0    1412    100     12 D   0.2  0.0   0:00.01 iptables
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.03 kthreadd
    4 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H

About this issue

  • Original URL
  • State: open
  • Created a year ago
  • Comments: 24 (6 by maintainers)

Most upvoted comments

@gmanera ,

To set CATTLE_REQUEST_CACHE_DISABLED on rancher:

kubectl -n cattle-system set env deploy/rancher CATTLE_REQUEST_CACHE_DISABLED=true

On cattle-cluster-agent on the downstream clusters:

kubectl -n cattle-system set env deploy/cattle-cluster-agent CATTLE_REQUEST_CACHE_DISABLED=true

If you have the Rancher CLI, do:

$ rancher server switch
$ for x in `rancher cluster ls --format '{{.Cluster.Name}}'`;
do
  kubectl set env deploy/cattle-cluster-agent -n cattle-system CATTLE_REQUEST_CACHE_DISABLED=true --context $x
done

(Based on this comment)

Note that setting the env var triggers a rollout of the deployment (new pods are created), which may briefly interrupt the service if only one replica is running.
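
For example, one way to confirm the rollout finished and the new pods carry the variable (standard kubectl commands; adjust the namespace/deployment name if yours differ):

kubectl -n cattle-system rollout status deploy/rancher
kubectl -n cattle-system get deploy rancher -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="CATTLE_REQUEST_CACHE_DISABLED")].value}'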

Also, this comment provides steps for evaluating whether the fix will work for you.

Please report back!

Yes I just did it 😃 https://github.com/rancher/rancher/issues/41906

Hi @jiaqiluo ,

Thank you for your prompt response, my friend. You’re truly awesome for our community.

Here are our answers to your queries:

  1. The problem occurs across all four of our clusters. We recently upgraded both the Rancher and Kubernetes versions, though it is worth mentioning that we upgrade frequently: we started our clusters on Rancher 2.6.0 and are currently on 2.7.3, and whenever a new Rancher version is released we upgrade it along with the Kubernetes version. A few months ago we began with Kubernetes 1.22, and we are now on 1.24.10, with Longhorn 1.4.1 deployed across all clusters.

  2. Yes, the problem happens on both node types, masters and workers. It is hard to give a precise pattern, but we have seen frozen nodes running RabbitMQ (with Longhorn managing the RabbitMQ volumes), nodes running resource-intensive pods that consume significant CPU/memory (without any associated volumes/IOPS), and masters freezing while running Helm commands, such as upgrading kube-prometheus-stack. The problem has occurred roughly 30 times across different nodes, clusters, and node types.

  3. Following your suggestion, I updated all nodes across all four clusters yesterday to address the kernel version issue. The current kernel version is:

uname -r
3.10.0-1160.90.1.el7.x86_64

Prior to the upgrade, the kernel version was: 3.10.0-1160.76.1.el7.x86_64

  4. Following your recommendations, I executed the provided command on all nodes in each cluster and will continue monitoring the kswapd process, as you and @vinibodruch suggested.

echo 1 > /proc/sys/vm/drop_caches
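
To correlate the freezes with memory pressure, this is a minimal monitoring sketch that could be left running on a node (plain shell on a standard CentOS 7 host; the 10-second interval is arbitrary):

# Log kswapd0 CPU usage and page-reclaim counters every 10 seconds.
while true; do
  date
  ps -o pid,comm,pcpu,time -C kswapd0
  grep -E 'pgscan|pgsteal' /proc/vmstat
  sleep 10
done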

Thank you again for your assistance and guidance. We truly appreciate your help.