kubernetes: Node status remains Ready even when a disk I/O error leaves the node read-only.
What happened:
A disk I/O error on a worker node caused its partition to be remounted read-only. However, the Node's status remained Ready, so Pods kept being scheduled to it and then waited indefinitely in Pending.
Likewise, node2, a control-plane node, has been remounted read-only because of a disk I/O error, yet its status is also Ready.
node5 is the server where the disk I/O error is currently occurring:
NAME    STATUS   ROLES                  AGE    VERSION
node1   Ready    control-plane,master   2d7h   v1.20.6
node2   Ready    control-plane,master   2d7h   v1.20.6
node3   Ready    control-plane,master   2d7h   v1.20.6
node4   Ready    <none>                 2d7h   v1.20.6
node5   Ready    <none>                 2d7h   v1.20.6
node6   Ready    <none>                 2d7h   v1.20.6
NAME                           READY   STATUS        RESTARTS   AGE    IP             NODE    NOMINATED NODE   READINESS GATES
hello-world-568d5cb84-7k8wx    1/1     Running       0          102m   10.233.108.5   node6   <none>           <none>
hello-world-568d5cb84-f77nw    0/1     Terminating   0          96m    <none>         node5   <none>           <none>
hello-world-568d5cb84-t8qm4    0/1     Terminating   0          96m    <none>         node5   <none>           <none>
hello-world-845c956754-47vv7   1/1     Running       0          20m    10.233.105.4   node4   <none>           <none>
hello-world-845c956754-j4ttf   0/1     Pending       0          20m    <none>         node5   <none>           <none>
hello-world-845c956754-kdxlc   0/1     Pending       0          20m    <none>         node5   <none>           <none>
hello-world-845c956754-rglq7   1/1     Running       0          20m    10.233.108.7   node6   <none>           <none>
hello-world-845c956754-tnsb2   1/1     Running       0          20m    10.233.105.5   node4   <none>           <none>
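For reference, the conditions the kubelet is still reporting for node5 can be inspected directly; both commands below are standard kubectl, and nothing here is specific to this cluster beyond the node name:

# Full node description, including conditions (Ready, DiskPressure, PIDPressure, ...):
kubectl describe node node5
# Or just the condition types and their statuses:
kubectl get node node5 -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'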
node2 mount information
root@node2:~# mount | more
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,nosuid,noexec,relatime,size=1970892k,nr_inodes=492723,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,noexec,relatime,size=403056k,mode=755)
/dev/mapper/ubuntu--vg-ubuntu--lv on / type ext4 (ro,relatime)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
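For a quick check, the read-only root can also be spotted without paging through mount output; a minimal sketch using standard util-linux and awk (generic commands, not captured from the affected nodes):

# Show the mount options of the root filesystem only:
findmnt -n -o TARGET,OPTIONS /
# List every mountpoint whose options include ro:
awk '$4 ~ /(^|,)ro(,|$)/ {print $2}' /proc/mounts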
What you expected to happen:
If a partition becomes read-only due to a disk I/O error, I think the node should become NotReady.
The same goes for node2: like node5, this control-plane node is in a read-only state due to a disk I/O error, yet its status is Ready. Since the node is read-only, I think it should be marked NotReady and taken out of service.
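Until the kubelet reflects this in the Ready condition, something along these lines could at least keep new Pods off such a node. This is only an illustrative sketch: running it periodically, the NODE_NAME variable, and kubectl credentials on the node are all assumptions of mine, not an existing mechanism. node-problem-detector can also surface a ReadonlyFilesystem condition from kernel logs, but as far as I know it does not flip the node to NotReady by itself.

#!/bin/sh
# Illustrative workaround only: cordon this node when / is read-only, so the
# scheduler stops placing new Pods on it. NODE_NAME is a placeholder, and
# kubectl access from the node is assumed.
NODE_NAME="${NODE_NAME:-$(hostname)}"
if findmnt -n -o OPTIONS / | grep -qE '(^|,)ro(,|$)'; then
  kubectl cordon "$NODE_NAME"
fi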
How to reproduce it (as minimally and precisely as possible):
The disk I/O error occurred after the node configuration was complete, and such an error is hard to trigger on demand. If the partition can be forced read-only without a reboot, the problem should be reproducible, for example as sketched below.
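One approach that might work on a disposable test node (both commands are standard Linux; I have not verified that they reproduce the exact kubelet behaviour seen here):

# Simplest approximation: remount the root filesystem read-only by hand.
mount -o remount,ro /
# Closer to the real failure: magic-sysrq 'u' emergency-remounts all
# filesystems read-only, similar to what ext4 mounted with
# errors=remount-ro does when it hits an I/O error.
echo u > /proc/sysrq-trigger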
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version): v1.20.6
- Cloud provider or hardware configuration: On-premise
- OS (e.g. cat /etc/os-release): Ubuntu 20.04.2 LTS
- Kernel (e.g. uname -a): Linux node1 5.4.0-72-generic #80-Ubuntu SMP Mon Apr 12 17:35:00 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
- Install tools: kubespray
- Network plugin and version (if this is a network-related bug): calico
- Others:
This seems to be a feature request.
It would expand the implications of the node's Ready status.