kubernetes: Kubelet stops reporting node status
Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see http://kubernetes.io/docs/troubleshooting/.):
No
What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.):
healthy node notready controller manager kubelet node status
Is this a BUG REPORT or FEATURE REQUEST? (choose one):
BUG REPORT
Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.4", GitCommit:"7243c69eb523aa4377bce883e7c0dd76b84709a1", GitTreeState:"clean", BuildDate:"2017-03-08T02:50:34Z", GoVersion:"go1.8", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.4+coreos.0", GitCommit:"97c11b097b1a2b194f1eddca8ce5468fcc83331c", GitTreeState:"clean", BuildDate:"2017-03-08T23:54:21Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Environment:
- Cloud provider or hardware configuration: Azure Standard_A1
- OS (e.g. from /etc/os-release): NAME="Container Linux by CoreOS" ID=coreos VERSION=1298.6.0 VERSION_ID=1298.6.0 BUILD_ID=2017-03-14-2119 PRETTY_NAME="Container Linux by CoreOS 1298.6.0 (Ladybug)" ANSI_COLOR="38;5;75" HOME_URL="https://coreos.com/" BUG_REPORT_URL="https://github.com/coreos/bugs/issues"
- Kernel (e.g. uname -a): Linux master-0-vm 4.9.9-coreos-r1 #1 SMP Tue Mar 14 21:09:42 UTC 2017 x86_64 Intel® Xeon® CPU E5-2673 v3 @ 2.40GHz GenuineIntel GNU/Linux
- Install tools: https://github.com/edevil/kubernetes-deployment
- Others:
What happened:
Nodes are incorrectly marked as NotReady even though their kubelets are running and posting status.
What you expected to happen:
Nodes should stay Ready as long as their kubelets keep posting status.
How to reproduce it (as minimally and precisely as possible):
Set up a cluster using the install tools linked above and wait.
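The flapping can be observed with plain kubectl (generic commands, nothing specific to this setup; node-4-vm is just one of the affected nodes from the logs below):

```sh
# Watch nodes flip between Ready and NotReady.
kubectl get nodes -w

# Inspect the Ready condition, its reason and message, on an affected node.
kubectl describe node node-4-vm
```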
Anything else we need to know:
Controller manager log at -v=4:
I0322 02:35:39.439687 1 nodecontroller.go:713] Node node-1-vm ReadyCondition updated. Updating timestamp.
I0322 02:35:39.439935 1 nodecontroller.go:713] Node node-2-vm ReadyCondition updated. Updating timestamp.
I0322 02:35:39.440085 1 nodecontroller.go:713] Node node-4-vm ReadyCondition updated. Updating timestamp.
I0322 02:35:39.440167 1 nodecontroller.go:713] Node master-1-vm ReadyCondition updated. Updating timestamp.
I0322 02:35:39.540323 1 attach_detach_controller.go:540] processVolumesInUse for node "node-0-vm"
I0322 02:35:40.869608 1 leaderelection.go:203] succesfully renewed lease kube-system/kube-controller-manager
I0322 02:35:41.014154 1 attach_detach_controller.go:540] processVolumesInUse for node "node-3-vm"
I0322 02:35:41.058731 1 reflector.go:392] pkg/controller/informers/factory.go:89: Watch close - *extensions.DaemonSet total 0 items received
I0322 02:35:42.935111 1 leaderelection.go:203] succesfully renewed lease kube-system/kube-controller-manager
I0322 02:35:43.414353 1 reflector.go:392] pkg/controller/garbagecollector/garbagecollector.go:768: Watch close - <nil> total 0 items received
I0322 02:35:43.901573 1 attach_detach_controller.go:540] processVolumesInUse for node "master-0-vm"
I0322 02:35:44.478340 1 nodecontroller.go:713] Node master-0-vm ReadyCondition updated. Updating timestamp.
I0322 02:35:44.478613 1 nodecontroller.go:713] Node node-0-vm ReadyCondition updated. Updating timestamp.
I0322 02:35:44.478780 1 nodecontroller.go:713] Node node-3-vm ReadyCondition updated. Updating timestamp.
I0322 02:35:44.961529 1 leaderelection.go:203] succesfully renewed lease kube-system/kube-controller-manager
I0322 02:35:46.988408 1 leaderelection.go:203] succesfully renewed lease kube-system/kube-controller-manager
I0322 02:35:49.064564 1 leaderelection.go:203] succesfully renewed lease kube-system/kube-controller-manager
I0322 02:35:51.103435 1 leaderelection.go:203] succesfully renewed lease kube-system/kube-controller-manager
I0322 02:35:52.769445 1 reflector.go:273] pkg/controller/resourcequota/resource_quota_controller.go:232: forcing resync
I0322 02:35:53.178768 1 leaderelection.go:203] succesfully renewed lease kube-system/kube-controller-manager
I0322 02:35:54.043888 1 reflector.go:273] pkg/controller/volume/persistentvolume/pv_controller_base.go:455: forcing resync
I0322 02:35:54.043958 1 reflector.go:273] pkg/controller/volume/persistentvolume/pv_controller_base.go:454: forcing resync
I0322 02:35:54.044629 1 reflector.go:273] pkg/controller/volume/persistentvolume/pv_controller_base.go:159: forcing resync
I0322 02:35:54.339298 1 reflector.go:392] pkg/controller/garbagecollector/garbagecollector.go:768: Watch close - <nil> total 0 items received
I0322 02:35:55.236004 1 leaderelection.go:203] succesfully renewed lease kube-system/kube-controller-manager
I0322 02:35:56.013491 1 reflector.go:273] pkg/controller/replication/replication_controller.go:220: forcing resync
I0322 02:35:57.093825 1 reflector.go:273] pkg/controller/endpoint/endpoints_controller.go:164: forcing resync
I0322 02:35:57.098093 1 endpoints_controller.go:338] Finished syncing service "trendex/trendex" endpoints. (3.858837ms)
I0322 02:35:57.098333 1 endpoints_controller.go:338] Finished syncing service "worten-ac-simulator/worten-ac-simulator" endpoints. (4.338892ms)
I0322 02:35:57.104371 1 endpoints_controller.go:338] Finished syncing service "bitshopping/bitshopping" endpoints. (9.914215ms)
I0322 02:35:57.105145 1 endpoints_controller.go:338] Finished syncing service "tv-directory/tv-directory" endpoints. (10.493105ms)
I0322 02:35:57.106587 1 endpoints_controller.go:338] Finished syncing service "roamersapp/roamersapp" endpoints. (2.172777ms)
I0322 02:35:57.107370 1 endpoints_controller.go:338] Finished syncing service "kube-lego/kube-lego-nginx" endpoints. (12.568538ms)
I0322 02:35:57.114755 1 endpoints_controller.go:338] Finished syncing service "tvifttt/tviftttapp" endpoints. (8.124249ms)
I0322 02:35:57.115146 1 endpoints_controller.go:338] Finished syncing service "pixelscamp/pixelscamp" endpoints. (16.788809ms)
I0322 02:35:57.115471 1 endpoints_controller.go:338] Finished syncing service "kube-system/kubernetes-dashboard" endpoints. (10.285392ms)
I0322 02:35:57.115757 1 endpoints_controller.go:338] Finished syncing service "gobrpxio/gobrpxio" endpoints. (17.644516ms)
I0322 02:35:57.123492 1 endpoints_controller.go:338] Finished syncing service "tvifttt/tvifttt" endpoints. (15.700531ms)
I0322 02:35:57.123792 1 endpoints_controller.go:338] Finished syncing service "nginx-ingress/nginx" endpoints. (8.58929ms)
I0322 02:35:57.124067 1 endpoints_controller.go:338] Finished syncing service "raster/raster" endpoints. (8.272004ms)
I0322 02:35:57.125199 1 endpoints_controller.go:338] Finished syncing service "kube-system/kube-dns" endpoints. (10.372793ms)
I0322 02:35:57.125961 1 endpoints_controller.go:338] Finished syncing service "probelyapp-staging/probelyapp" endpoints. (10.440307ms)
I0322 02:35:57.126616 1 endpoints_controller.go:338] Finished syncing service "nginx-ingress/default-http-backend" endpoints. (3.044294ms)
I0322 02:35:57.126671 1 endpoints_controller.go:338] Finished syncing service "default/kubernetes" endpoints. (1.293µs)
I0322 02:35:57.127139 1 endpoints_controller.go:338] Finished syncing service "nosslack/nosslack" endpoints. (3.027092ms)
I0322 02:35:57.133203 1 endpoints_controller.go:338] Finished syncing service "raster/redis-master" endpoints. (9.374898ms)
I0322 02:35:57.133581 1 endpoints_controller.go:338] Finished syncing service "probelyapp/probelyapp" endpoints. (8.277871ms)
I0322 02:35:57.133920 1 endpoints_controller.go:338] Finished syncing service "cathode/cathode" endpoints. (7.910074ms)
I0322 02:35:57.265600 1 leaderelection.go:203] succesfully renewed lease kube-system/kube-controller-manager
I0322 02:35:57.797416 1 gc_controller.go:175] GC'ing orphaned
I0322 02:35:57.797532 1 gc_controller.go:195] GC'ing unscheduled pods which are terminating.
I0322 02:35:59.345887 1 leaderelection.go:203] succesfully renewed lease kube-system/kube-controller-manager
I0322 02:36:01.379773 1 leaderelection.go:203] succesfully renewed lease kube-system/kube-controller-manager
I0322 02:36:03.463614 1 leaderelection.go:203] succesfully renewed lease kube-system/kube-controller-manager
I0322 02:36:05.595912 1 leaderelection.go:203] succesfully renewed lease kube-system/kube-controller-manager
I0322 02:36:06.176893 1 reflector.go:273] pkg/controller/resourcequota/resource_quota_controller.go:229: forcing resync
I0322 02:36:06.184813 1 resource_quota_controller.go:153] Resource quota controller queued all resource quota for full calculation of usage
I0322 02:36:06.485604 1 reflector.go:273] pkg/controller/namespace/namespace_controller.go:212: forcing resync
I0322 02:36:07.013565 1 reflector.go:273] pkg/controller/service/servicecontroller.go:174: forcing resync
I0322 02:36:07.246974 1 reflector.go:273] pkg/controller/disruption/disruption.go:326: forcing resync
I0322 02:36:07.247006 1 reflector.go:273] pkg/controller/podautoscaler/horizontal.go:133: forcing resync
I0322 02:36:07.247013 1 reflector.go:273] pkg/controller/disruption/disruption.go:324: forcing resync
I0322 02:36:07.565950 1 reflector.go:273] pkg/controller/petset/pet_set.go:148: forcing resync
I0322 02:36:07.734745 1 leaderelection.go:203] succesfully renewed lease kube-system/kube-controller-manager
I0322 02:36:07.913692 1 reflector.go:273] pkg/controller/informers/factory.go:89: forcing resync
I0322 02:36:07.914049 1 deployment_controller.go:154] Updating deployment tvifttt-interface
I0322 02:36:07.914095 1 deployment_controller.go:154] Updating deployment nosslack
I0322 02:36:07.916338 1 deployment_controller.go:313] Finished syncing deployment "tvifttt/tvifttt-interface" (2.208096ms)
I0322 02:36:07.918029 1 deployment_controller.go:313] Finished syncing deployment "nosslack/nosslack" (1.633283ms)
I0322 02:36:07.918077 1 deployment_controller.go:154] Updating deployment tv-directory
I0322 02:36:07.918099 1 deployment_controller.go:154] Updating deployment kube-dns-v20
I0322 02:36:07.918904 1 deployment_controller.go:313] Finished syncing deployment "tv-directory/tv-directory" (785.707µs)
I0322 02:36:07.920108 1 deployment_controller.go:313] Finished syncing deployment "kube-system/kube-dns-v20" (1.15902ms)
I0322 02:36:07.920160 1 deployment_controller.go:154] Updating deployment pixelscamp
I0322 02:36:07.920183 1 deployment_controller.go:154] Updating deployment nosslackbot
I0322 02:36:07.920998 1 deployment_controller.go:313] Finished syncing deployment "pixelscamp/pixelscamp" (785.307µs)
I0322 02:36:07.927939 1 deployment_controller.go:313] Finished syncing deployment "nosslack/nosslackbot" (6.89615ms)
I0322 02:36:07.928370 1 deployment_controller.go:154] Updating deployment roamersapp
I0322 02:36:07.928412 1 deployment_controller.go:154] Updating deployment probelyapp
I0322 02:36:07.930166 1 deployment_controller.go:313] Finished syncing deployment "roamersapp/roamersapp" (1.732833ms)
I0322 02:36:07.931101 1 deployment_controller.go:313] Finished syncing deployment "probelyapp/probelyapp" (869.765µs)
I0322 02:36:07.931145 1 deployment_controller.go:154] Updating deployment tvifttt
I0322 02:36:07.931165 1 deployment_controller.go:154] Updating deployment cathode
I0322 02:36:07.958935 1 deployment_controller.go:313] Finished syncing deployment "cathode/cathode" (11.43338ms)
I0322 02:36:07.959052 1 deployment_controller.go:154] Updating deployment gustave
I0322 02:36:07.959138 1 deployment_controller.go:154] Updating deployment default-http-backend
I0322 02:36:07.967064 1 deployment_controller.go:313] Finished syncing deployment "gustave/gustave" (7.875261ms)
I0322 02:36:07.967862 1 deployment_controller.go:313] Finished syncing deployment "nginx-ingress/default-http-backend" (564.318µs)
I0322 02:36:07.967928 1 deployment_controller.go:154] Updating deployment redis-master
I0322 02:36:07.967951 1 deployment_controller.go:154] Updating deployment trendex
I0322 02:36:07.968577 1 deployment_controller.go:313] Finished syncing deployment "raster/redis-master" (593.303µs)
I0322 02:36:07.969519 1 deployment_controller.go:313] Finished syncing deployment "trendex/trendex" (897.351µs)
I0322 02:36:07.969568 1 deployment_controller.go:154] Updating deployment kubernetes-dashboard
I0322 02:36:07.969590 1 deployment_controller.go:154] Updating deployment tviftttapp
I0322 02:36:07.970190 1 deployment_controller.go:313] Finished syncing deployment "kube-system/kubernetes-dashboard" (580.909µs)
I0322 02:36:07.987186 1 deployment_controller.go:313] Finished syncing deployment "tvifttt/tviftttapp" (16.95312ms)
I0322 02:36:07.987283 1 deployment_controller.go:154] Updating deployment elpixel
I0322 02:36:07.987335 1 deployment_controller.go:154] Updating deployment nginx
I0322 02:36:07.988266 1 deployment_controller.go:313] Finished syncing deployment "elpixel/elpixel" (900.75µs)
I0322 02:36:07.988941 1 deployment_controller.go:313] Finished syncing deployment "nginx-ingress/nginx" (629.385µs)
I0322 02:36:07.988995 1 deployment_controller.go:154] Updating deployment gobrpxio
I0322 02:36:07.989014 1 deployment_controller.go:154] Updating deployment probelyapp
I0322 02:36:07.990094 1 deployment_controller.go:313] Finished syncing deployment "gobrpxio/gobrpxio" (1.063468ms)
I0322 02:36:08.005891 1 deployment_controller.go:313] Finished syncing deployment "probelyapp-staging/probelyapp" (15.75485ms)
I0322 02:36:08.005946 1 deployment_controller.go:154] Updating deployment tvifttt-delay
I0322 02:36:08.005969 1 deployment_controller.go:154] Updating deployment bitshopping
I0322 02:36:08.007987 1 deployment_controller.go:313] Finished syncing deployment "tvifttt/tvifttt-delay" (1.986384ms)
I0322 02:36:08.008999 1 deployment_controller.go:313] Finished syncing deployment "bitshopping/bitshopping" (964.6µs)
I0322 02:36:08.009048 1 deployment_controller.go:154] Updating deployment worten-ac-simulator
I0322 02:36:08.009070 1 deployment_controller.go:154] Updating deployment raster
I0322 02:36:08.016803 1 deployment_controller.go:313] Finished syncing deployment "worten-ac-simulator/worten-ac-simulator" (7.713193ms)
I0322 02:36:08.018476 1 deployment_controller.go:313] Finished syncing deployment "raster/raster" (1.619805ms)
I0322 02:36:08.018542 1 deployment_controller.go:154] Updating deployment kube-lego
I0322 02:36:08.019279 1 deployment_controller.go:313] Finished syncing deployment "kube-lego/kube-lego" (699.001µs)
I0322 02:36:08.047248 1 deployment_controller.go:313] Finished syncing deployment "tvifttt/tvifttt" (116.059963ms)
I0322 02:36:08.309727 1 reflector.go:273] pkg/controller/disruption/disruption.go:328: forcing resync
I0322 02:36:08.310259 1 reflector.go:273] pkg/controller/disruption/disruption.go:329: forcing resync
I0322 02:36:09.093723 1 reflector.go:273] pkg/controller/disruption/disruption.go:327: forcing resync
I0322 02:36:09.149389 1 reflector.go:273] pkg/controller/volume/persistentvolume/pv_controller_base.go:454: forcing resync
I0322 02:36:09.149436 1 reflector.go:273] pkg/controller/volume/persistentvolume/pv_controller_base.go:455: forcing resync
I0322 02:36:09.150118 1 reflector.go:273] pkg/controller/volume/persistentvolume/pv_controller_base.go:159: forcing resync
I0322 02:36:09.758741 1 leaderelection.go:203] succesfully renewed lease kube-system/kube-controller-manager
I0322 02:36:10.898701 1 reflector.go:392] pkg/controller/volume/persistentvolume/pv_controller_base.go:455: Watch close - *api.PersistentVolumeClaim total 0 items received
I0322 02:36:11.521060 1 namespace_controller.go:206] Finished syncing namespace "default" (596ns)
I0322 02:36:11.521174 1 namespace_controller.go:206] Finished syncing namespace "roamersapp" (298ns)
I0322 02:36:11.521213 1 namespace_controller.go:206] Finished syncing namespace "worten-ac-simulator" (198ns)
I0322 02:36:11.521233 1 namespace_controller.go:206] Finished syncing namespace "gustave" (199ns)
I0322 02:36:11.521269 1 namespace_controller.go:206] Finished syncing namespace "kube-lego" (199ns)
I0322 02:36:11.521289 1 namespace_controller.go:206] Finished syncing namespace "tv-directory" (198ns)
I0322 02:36:11.521306 1 namespace_controller.go:206] Finished syncing namespace "pixelscamp" (198ns)
I0322 02:36:11.521322 1 namespace_controller.go:206] Finished syncing namespace "bitshopping" (198ns)
I0322 02:36:11.521356 1 namespace_controller.go:206] Finished syncing namespace "trendex" (100ns)
I0322 02:36:11.521375 1 namespace_controller.go:206] Finished syncing namespace "tvifttt" (199ns)
I0322 02:36:11.521391 1 namespace_controller.go:206] Finished syncing namespace "kube-system" (99ns)
I0322 02:36:11.521421 1 namespace_controller.go:206] Finished syncing namespace "elpixel" (199ns)
I0322 02:36:11.521439 1 namespace_controller.go:206] Finished syncing namespace "probelyapp-staging" (199ns)
I0322 02:36:11.521456 1 namespace_controller.go:206] Finished syncing namespace "nginx-ingress" (198ns)
I0322 02:36:11.521472 1 namespace_controller.go:206] Finished syncing namespace "raster" (199ns)
I0322 02:36:11.521501 1 namespace_controller.go:206] Finished syncing namespace "cathode" (199ns)
I0322 02:36:11.521518 1 namespace_controller.go:206] Finished syncing namespace "gobrpxio" (199ns)
I0322 02:36:11.521534 1 namespace_controller.go:206] Finished syncing namespace "probelyapp" (199ns)
I0322 02:36:11.521550 1 namespace_controller.go:206] Finished syncing namespace "nosslack" (199ns)
I0322 02:36:11.809629 1 leaderelection.go:203] succesfully renewed lease kube-system/kube-controller-manager
I0322 02:36:13.899391 1 leaderelection.go:203] succesfully renewed lease kube-system/kube-controller-manager
I0322 02:36:15.927454 1 leaderelection.go:203] succesfully renewed lease kube-system/kube-controller-manager
I0322 02:36:17.371577 1 attach_detach_controller.go:540] processVolumesInUse for node "node-3-vm"
I0322 02:36:17.938641 1 gc_controller.go:175] GC'ing orphaned
I0322 02:36:17.938682 1 gc_controller.go:195] GC'ing unscheduled pods which are terminating.
I0322 02:36:17.985692 1 leaderelection.go:203] succesfully renewed lease kube-system/kube-controller-manager
I0322 02:36:18.106790 1 attach_detach_controller.go:540] processVolumesInUse for node "node-0-vm"
I0322 02:36:19.006722 1 reflector.go:392] pkg/controller/volume/persistentvolume/pv_controller_base.go:159: Watch close - *storage.StorageClass total 0 items received
I0322 02:36:19.525498 1 attach_detach_controller.go:540] processVolumesInUse for node "master-0-vm"
I0322 02:36:19.738858 1 nodecontroller.go:738] node node-4-vm hasn't been updated for 40.298744291s. Last ready condition is: {Type:Ready Status:True LastHeartbeatTime:2017-03-22 02:35:34 +0000 UTC LastTransitionTime:2017-03-20 09:52:57 +0000 UTC Reason:KubeletReady Message:kubelet is posting ready status}
I0322 02:36:19.738976 1 nodecontroller.go:765] node node-4-vm hasn't been updated for 40.29886588s. Last out of disk condition is: &{Type:OutOfDisk Status:False LastHeartbeatTime:2017-03-22 02:35:34 +0000 UTC LastTransitionTime:2017-03-20 09:52:57 +0000 UTC Reason:KubeletHasSufficientDisk Message:kubelet has sufficient disk space available}
Relevant example lines:
I0322 02:35:39.440085 1 nodecontroller.go:713] Node node-4-vm ReadyCondition updated. Updating timestamp.
I0322 02:36:19.738858 1 nodecontroller.go:738] node node-4-vm hasn't been updated for 40.298744291s. Last ready condition is: {Type:Ready Status:True LastHeartbeatTime:2017-03-22 02:35:34 +0000 UTC LastTransitionTime:2017-03-20 09:52:57 +0000 UTC Reason:KubeletReady Message:kubelet is posting ready status}
Kubelet log on node-4-vm:
Mar 22 02:35:40 node-4-vm kubelet-wrapper[996]: I0322 02:35:40.349624 996 operation_executor.go:917] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/99a2bd45-0da7-11e7-9a8a-000d3a2709aa-default-token-k195j" (spec.Name: "default-token-k195j") pod "99a2bd45-0da7-11e7-9a8a-000d3a2709aa" (UID: "99a2bd45-0da7-11e7-9a8a-000d3a2709aa").
Mar 22 02:35:40 node-4-vm kubelet-wrapper[996]: I0322 02:35:40.349755 996 operation_executor.go:917] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/d90c7bb4-0dab-11e7-9a8a-000d3a2709aa-default-token-2bkr0" (spec.Name: "default-token-2bkr0") pod "d90c7bb4-0dab-11e7-9a8a-000d3a2709aa" (UID: "d90c7bb4-0dab-11e7-9a8a-000d3a2709aa").
Mar 22 02:35:42 node-4-vm kubelet-wrapper[996]: I0322 02:35:42.356358 996 operation_executor.go:917] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/fdfc2d54-d0f2-11e6-b156-000d3a2709aa-default-token-7ffjg" (spec.Name: "default-token-7ffjg") pod "fdfc2d54-d0f2-11e6-b156-000d3a2709aa" (UID: "fdfc2d54-d0f2-11e6-b156-000d3a2709aa").
Mar 22 02:35:42 node-4-vm kubelet-wrapper[996]: I0322 02:35:42.358097 996 operation_executor.go:917] MountVolume.SetUp succeeded for volume "kubernetes.io/configmap/fdfc2d54-d0f2-11e6-b156-000d3a2709aa-config-volume" (spec.Name: "config-volume") pod "fdfc2d54-d0f2-11e6-b156-000d3a2709aa" (UID: "fdfc2d54-d0f2-11e6-b156-000d3a2709aa").
Mar 22 02:36:39 node-4-vm kubelet-wrapper[996]: E0322 02:36:39.085953 996 kubelet_node_status.go:302] Error updating node status, will retry: Operation cannot be fulfilled on nodes "node-4-vm": the object has been modified; please apply your changes to the latest version and try again
That kubelet error seems to indicate that the kubelet had been updating the node status correctly, but that another client (the controller manager) modified the node object in the meantime, so the update conflicted. I don't see any other relevant information in the logs of the kubelets, API server, or etcd nodes. Since several nodes were marked NotReady at the same time, the problem appears to be in the controller manager itself.
I did not change the defaults: the kubelets post their node status every 10s (--node-status-update-frequency) and the controller manager waits 40s without an update (--node-monitor-grace-period) before marking a node NotReady.
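For reference, these are the two settings involved (illustrative command-line excerpts with the default values, not my exact unit files):

```sh
# Kubelet: how often the node status is posted (default 10s).
kubelet --node-status-update-frequency=10s

# Controller manager: how long to wait without a status update before
# marking the node NotReady (default 40s).
kube-controller-manager --node-monitor-grace-period=40s
```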
About this issue
- State: closed
- Created 7 years ago
- Reactions: 4
- Comments: 57 (22 by maintainers)
@edevil Not exclusive to Azure - we see it on AWS.
As discussed in https://github.com/Azure/acs-engine/issues/863#issuecomment-338576088, the bug has been gone since Saturday night (France time). Can anyone confirm here?
Can anyone tell me a good way to work around this issue? So far I have only seen one person suggesting a solution:
But I am not really sure about the negative effects of setting those two parameters.
@petergardfjall … this clearly sounds like an Azure problem; we have the same problem on multiple clusters in the same datacenter, starting at the same time … @colemickens @brendanburns, do you have any updates or info on this?
I am opening a ticket with Azure support right now. Stay tuned.
In our case the problem was overloaded nodes and pods without resource limits:
What worked for us was setting limits on every single pod; no more issues like that since.
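(Illustrative sketch only; the pod name, image and values below are made up, but this is the shape of "limits on every single pod".)

```sh
# Hypothetical pod with explicit requests/limits so it cannot starve the node.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: example-with-limits
spec:
  containers:
  - name: app
    image: nginx:1.11
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 500m
        memory: 256Mi
EOF
```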
Yes, but this is also seen at scale in situations where the cloud provider isn't the problem, right? So it's a general problem that the heartbeat is hard to reason about; and definitely, error handling for cloud providers is part of that.
We have the same problem on Azure.
After setting the node controller logs to -v=4, these messages started showing up:
Reason:NodeStatusUnknown Message:Kubelet stopped posting node status.
All our pods are being killed, sometimes several times a day…