kubernetes: When I change the hugepage size from 2Mi to 1Gi, kubelet can't update node status
What happened:
- When I change the default hugepage size from 2Mi to 1Gi and reboot, kubelet can't update the node status
[root@worker-01 ~]# cat /etc/default/grub
GRUB_TIMEOUT=1
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="crashkernel=auto default_hugepagesz=1GB rhgb quiet idle=halt biosdevname=0 net.ifnames=0 console=tty0 console=ttyS0,115200n8 noibrs"
GRUB_DISABLE_RECOVERY="true"
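After the reboot it can help to confirm that the running kernel actually received the hugepage setting. The following is a minimal sketch, not part of the original report: `kernel_params` is a hypothetical helper, and the string is copied from the GRUB_CMDLINE_LINUX line above (on a live node it would be read from /proc/cmdline instead).

```python
import shlex

# Copied from GRUB_CMDLINE_LINUX above; on a running node, read /proc/cmdline.
CMDLINE = (
    "crashkernel=auto default_hugepagesz=1GB rhgb quiet idle=halt "
    "biosdevname=0 net.ifnames=0 console=tty0 console=ttyS0,115200n8 noibrs"
)


def kernel_params(cmdline: str) -> dict:
    """Map each kernel parameter name to the list of values it was given.

    Bare flags such as "quiet" map to [""]; repeated parameters such as
    "console" keep every value in order.
    """
    params = {}
    for token in shlex.split(cmdline):
        name, _, value = token.partition("=")
        params.setdefault(name, []).append(value)
    return params


if __name__ == "__main__":
    params = kernel_params(CMDLINE)
    print("default_hugepagesz:", params.get("default_hugepagesz"))
    # This command line sets no page count ("hugepages="), so the 16 pages
    # seen on the node must have been configured some other way.
    print("hugepages:", params.get("hugepages"))
```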
E0723 03:23:36.757554 2183 kubelet_node_status.go:385] Error updating node status, will retry: failed to patch status "{\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"MemoryPressure\"},{\"type\":\"DiskPressure\"},{\"type\":\"PIDPressure\"},{\"type\":\"Ready\"}],\"allocatable\":{\"devices.kubevirt.io/tun\":\"110\",\"devices.kubevirt.io/vhost-net\":\"110\",\"hugepages-1Gi\":\"16Gi\",\"memory\":\"0\"},\"capacity\":{\"devices.kubevirt.io/kvm\":\"110\",\"devices.kubevirt.io/tun\":\"110\",\"devices.kubevirt.io/vhost-net\":\"110\",\"hugepages-1Gi\":\"16Gi\"},\"conditions\":[{\"lastHeartbeatTime\":\"2019-07-23T03:23:36Z\",\"lastTransitionTime\":\"2019-07-23T03:23:36Z\",\"message\":\"kubelet has sufficient memory available\",\"reason\":\"KubeletHasSufficientMemory\",\"status\":\"False\",\"type\":\"MemoryPressure\"},{\"lastHeartbeatTime\":\"2019-07-23T03:23:36Z\",\"lastTransitionTime\":\"2019-07-23T03:23:36Z\",\"message\":\"kubelet has no disk pressure\",\"reason\":\"KubeletHasNoDiskPressure\",\"status\":\"False\",\"type\":\"DiskPressure\"},{\"lastHeartbeatTime\":\"2019-07-23T03:23:36Z\",\"lastTransitionTime\":\"2019-07-23T03:23:36Z\",\"message\":\"kubelet has sufficient PID available\",\"reason\":\"KubeletHasSufficientPID\",\"status\":\"False\",\"type\":\"PIDPressure\"},{\"lastHeartbeatTime\":\"2019-07-23T03:23:36Z\",\"lastTransitionTime\":\"2019-07-23T03:23:36Z\",\"message\":\"kubelet is posting ready status\",\"reason\":\"KubeletReady\",\"status\":\"True\",\"type\":\"Ready\"}],\"nodeInfo\":{\"bootID\":\"35150190-4de9-46d8-b1f5-05dd0cface2b\"},\"volumesInUse\":null}}" for node "worker-01": Node "worker-01" is invalid: [status.capacity.hugepages-1Gi: Invalid value: resource.Quantity{i:resource.int64Amount{value:17179869184, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"", Format:"BinarySI"}: may not have pre-allocated hugepages for multiple page sizes, status.capacity.pods: Invalid value: resource.Quantity{i:resource.int64Amount{value:250, scale:0}, 
d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"250", Format:"DecimalSI"}: may not have pre-allocated hugepages for multiple page sizes, status.capacity.devices.kubevirt.io/kvm: Invalid value: resource.Quantity{i:resource.int64Amount{value:110, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"110", Format:"DecimalSI"}: may not have pre-allocated hugepages for multiple page sizes, status.capacity.cpu: Invalid value: resource.Quantity{i:resource.int64Amount{value:12, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"12", Format:"DecimalSI"}: may not have pre-allocated hugepages for multiple page sizes, status.allocatable.hugepages-2Mi: Invalid value: resource.Quantity{i:resource.int64Amount{value:16030629888, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"", Format:"BinarySI"}: may not have pre-allocated hugepages for multiple page sizes, status.allocatable.devices.kubevirt.io/tun: Invalid value: resource.Quantity{i:resource.int64Amount{value:110, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"110", Format:"DecimalSI"}: may not have pre-allocated hugepages for multiple page sizes]
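The patch in the log adds hugepages-1Gi but never mentions hugepages-2Mi, while the rejection names status.allocatable.hugepages-2Mi. That suggests the node object stored by the API server still carries the old 2Mi entry, so the merged status ends up with two pre-allocated page sizes. The sketch below is a simplified model of that situation, not the Kubernetes validation code: `apply_patch` and `preallocated_hugepage_sizes` are hypothetical names, and the byte values are taken from the log.

```python
HUGEPAGE_PREFIX = "hugepages-"


def apply_patch(resources: dict, patch: dict) -> dict:
    """Merge a patch into a resource map.

    Keys in the patch are added or overwritten, a None value deletes the
    key, and keys the patch does not mention are left untouched.
    """
    merged = dict(resources)
    for key, value in patch.items():
        if value is None:
            merged.pop(key, None)
        else:
            merged[key] = value
    return merged


def preallocated_hugepage_sizes(resources: dict) -> list:
    """Return the hugepage resource names that have a non-zero quantity."""
    return sorted(
        name
        for name, quantity in resources.items()
        if name.startswith(HUGEPAGE_PREFIX) and quantity != 0
    )


if __name__ == "__main__":
    # Assumed state before the reboot: only the 2Mi entry exists (bytes).
    stored = {"hugepages-2Mi": 16030629888}
    # What kubelet sends after the reboot, per the log: only the 1Gi entry.
    patch = {"hugepages-1Gi": 17179869184}

    merged = apply_patch(stored, patch)
    sizes = preallocated_hugepage_sizes(merged)
    print(sizes, "-> rejected" if len(sizes) > 1 else "-> accepted")

    # Starting from a fresh node object (no stale entry) leaves one size.
    fresh = apply_patch({}, patch)
    print(preallocated_hugepage_sizes(fresh))
```

In this model the update is accepted only when the stale entry is gone, either because the node object was recreated or because the patch explicitly deletes the old key.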
What you expected to happen:
- kubelet updates the node info and the node status becomes Ready
How to reproduce it (as minimally and precisely as possible):
- change the node condition validate
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.0", GitCommit:"641856db18352033a0d96dbc99153fa3b27298e5", GitTreeState:"clean", BuildDate:"2019-03-25T15:53:57Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-06T01:36:19Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration:
hyperkube
- OS (e.g. cat /etc/os-release):
CentOS 7.6
- Kernel (e.g. uname -a):
[root@worker-01 ~]# uname -a
Linux worker-01 3.10.0-957.21.3.el7.x86_64 #1 SMP Tue Jun 18 16:35:19 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
- Install tools:
kubevirt (it registers the kvm and tun devices with kubelet, but I don't think it is the cause)
- Network plugin and version (if this is a network-related bug):
canal
- Others:
[root@worker-01 ~]# cat /proc/meminfo
MemTotal: 24521676 kB
MemFree: 5402000 kB
MemAvailable: 6477760 kB
Buffers: 118828 kB
Cached: 1218632 kB
SwapCached: 0 kB
Active: 916956 kB
Inactive: 1043992 kB
Active(anon): 628436 kB
Inactive(anon): 1184 kB
Active(file): 288520 kB
Inactive(file): 1042808 kB
Unevictable: 13912 kB
Mlocked: 13912 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 992 kB
Writeback: 0 kB
AnonPages: 637432 kB
Mapped: 235740 kB
Shmem: 2052 kB
Slab: 120568 kB
SReclaimable: 61004 kB
SUnreclaim: 59564 kB
KernelStack: 19392 kB
PageTables: 17148 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 3872228 kB
Committed_AS: 7299148 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 64008 kB
VmallocChunk: 34359569928 kB
HardwareCorrupted: 0 kB
AnonHugePages: 319488 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 16
HugePages_Free: 16
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
DirectMap4k: 145256 kB
DirectMap2M: 5097472 kB
DirectMap1G: 22020096 kB
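The hugepage figures in this output line up with the quantity in the kubelet log: 16 pages of 1048576 kB each is 17179869184 bytes, i.e. the "hugepages-1Gi": "16Gi" that kubelet tries to report. The sketch below reproduces that arithmetic; `hugepage_resource` and `format_size` are hypothetical helpers written for illustration, not kubelet code, and the sample contains only a few lines from the output above.

```python
# A few lines copied from the /proc/meminfo output above.
SAMPLE = """\
MemTotal: 24521676 kB
HugePages_Total: 16
HugePages_Free: 16
Hugepagesize: 1048576 kB
"""


def format_size(kib: int) -> str:
    """Render a size given in KiB with the largest exact binary suffix."""
    if kib % (1024 * 1024) == 0:
        return f"{kib // (1024 * 1024)}Gi"
    if kib % 1024 == 0:
        return f"{kib // 1024}Mi"
    return f"{kib}Ki"


def hugepage_resource(meminfo_text: str) -> tuple:
    """Return (resource name, capacity in bytes) for the default page size."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])
    page_kib = fields["Hugepagesize"]
    total_pages = fields["HugePages_Total"]
    return f"hugepages-{format_size(page_kib)}", total_pages * page_kib * 1024


if __name__ == "__main__":
    print(hugepage_resource(SAMPLE))  # ('hugepages-1Gi', 17179869184)
```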
About this issue
- State: closed
- Created 5 years ago
- Comments: 17 (8 by maintainers)
I am hitting the same problem. I used tuned to change the default hugepage size from 2Mi to 1Gi. After the reboot the node is NotReady and cannot rejoin, giving the error:
Node "test-worker-0" is invalid: [status.capacity.hugepages-2Mi: Invalid value: resource.Quantity{i:resource.int64Amount{value:20971520, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"20Mi", Format:"BinarySI"}: may not have pre-allocated hugepages for multiple page sizes, status.allocatable.hugepages-2Mi: Invalid value: resource.Quantity{i:resource.int64Amount{value:20971520, scale:0},
My system now reports information only for 1Gi, not for 2Mi anymore. Removing the node and rebooting fixed it: the node rejoined, reporting the right capacity. But this is not a solution for us; changing sizes should be supported.