kubernetes: Node status stays Ready even when a disk I/O error forces the filesystem read-only.

What happened:

A disk I/O error on a worker node's disk caused the partition to be remounted read-only. However, the Node's status remained Ready, so Pods kept being scheduled onto it and then waited indefinitely in Pending.

Likewise, node2, a control-plane node, has gone read-only due to a disk I/O error, and its status is also still Ready.

node5 is the worker node where the disk I/O error is currently occurring:

NAME    STATUS   ROLES                  AGE    VERSION
node1   Ready    control-plane,master   2d7h   v1.20.6
node2   Ready    control-plane,master   2d7h   v1.20.6
node3   Ready    control-plane,master   2d7h   v1.20.6
node4   Ready    <none>                 2d7h   v1.20.6
node5   Ready    <none>                 2d7h   v1.20.6
node6   Ready    <none>                 2d7h   v1.20.6
NAME                           READY   STATUS        RESTARTS   AGE    IP             NODE    NOMINATED NODE   READINESS GATES
hello-world-568d5cb84-7k8wx    1/1     Running       0          102m   10.233.108.5   node6   <none>           <none>
hello-world-568d5cb84-f77nw    0/1     Terminating   0          96m    <none>         node5   <none>           <none>
hello-world-568d5cb84-t8qm4    0/1     Terminating   0          96m    <none>         node5   <none>           <none>
hello-world-845c956754-47vv7   1/1     Running       0          20m    10.233.105.4   node4   <none>           <none>
hello-world-845c956754-j4ttf   0/1     Pending       0          20m    <none>         node5   <none>           <none>
hello-world-845c956754-kdxlc   0/1     Pending       0          20m    <none>         node5   <none>           <none>
hello-world-845c956754-rglq7   1/1     Running       0          20m    10.233.108.7   node6   <none>           <none>
hello-world-845c956754-tnsb2   1/1     Running       0          20m    10.233.105.5   node4   <none>           <none>
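
For reference, the node conditions behind the Ready status can be inspected directly; the kubelet keeps posting Ready as long as its own health checks (container runtime, PLEG, pressure conditions) pass, and none of those necessarily fail just because the root filesystem turns read-only. Assuming the node names from the listing above:

kubectl describe node node5 | grep -A 10 'Conditions:'
kubectl get node node5 -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}'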

node2 mount information (note that / is mounted ro)

root@node2:~# mount | more
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,nosuid,noexec,relatime,size=1970892k,nr_inodes=492723,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,noexec,relatime,size=403056k,mode=755)
/dev/mapper/ubuntu--vg-ubuntu--lv on / type ext4 (ro,relatime)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
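
A quick way to confirm the read-only state from a shell on the affected node (any write should fail with EROFS):

root@node2:~# grep ' / ' /proc/mounts    # mount options for / should include "ro"
root@node2:~# touch /var/tmp/rw-test     # expected to fail: "Read-only file system"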

What you expected to happen:

If a partition becomes read-only due to a disk I/O error, I think the node should be marked NotReady.

Similarly, node2 (a control-plane node) is in a read-only state due to a disk I/O error, just like node5, yet it still reports Ready. Since its filesystem is read-only, it too should be marked NotReady and taken out of service.
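
Until the kubelet (or a sidecar such as node-problem-detector, whose kernel monitor can raise a ReadonlyFilesystem condition) surfaces this, a manual workaround is to take the affected node out of scheduling. A hedged sketch, assuming the node names above:

kubectl cordon node5                      # stop new Pods from landing on the node
kubectl drain node5 --ignore-daemonsets   # evict the Pods already stuck there

Note that node-problem-detector only reports the condition; something else (an operator, or a manual cordon as above) still has to act on it.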

How to reproduce it (as minimally and precisely as possible):

The disk I/O error occurred spontaneously after the nodes were set up, so it is hard to trigger on demand. However, if a partition can be forced into a read-only state without rebooting, that should be enough to reproduce the problem, as sketched below.
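
If that assumption holds, a minimal reproduction sketch (run only on a disposable test node, as root) would be:

# Option 1: remount the root filesystem read-only in place
# (may fail with "mount point is busy" if files are open for writing)
mount -o remount,ro /

# Option 2: have the kernel emergency-remount all filesystems read-only (magic SysRq "u")
echo u > /proc/sysrq-trigger

Either way the node should end up in the same state as node2/node5 above without a reboot, and the kubelet's reaction can then be observed.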

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): v1.20.6
  • Cloud provider or hardware configuration: On-premise
  • OS (e.g: cat /etc/os-release): Ubuntu 20.04.2 LTS
  • Kernel (e.g. uname -a): Linux node1 5.4.0-72-generic #80-Ubuntu SMP Mon Apr 12 17:35:00 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: kubespray
  • Network plugin and version (if this is a network-related bug): calico
  • Others:

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 18 (8 by maintainers)

Most upvoted comments

This seems to be a feature request.

It would expand the implications of the node's Ready status.