kubernetes: Service endpoints status is wrong / not updated
What happened: We had a major network issue on our cluster where nodes were not able to reach each other or the master (API server, scheduler). Service endpoints were cleared, but when the network came back online some service endpoints remained blank even though pods for those services were running. I would expect the controller to find these pods and update the service endpoints accordingly, but that did not happen; our workaround was to restart the pods, after which the service endpoints were updated (a sketch of that restart is shown below). We are running 1.15.0.
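For reference, the restart workaround was roughly the following (the deployment name is inferred from the pod name shown further down, so treat these commands as an example rather than an exact transcript):

kubectl rollout restart deployment/islanding-redisha-slave -n ee

or, equivalently, delete the affected pod and let the ReplicaSet recreate it:

kubectl delete pod islanding-redisha-slave-869b4c64c9-7ng49 -n ee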
What you expected to happen: Service endpoints are repopulated with the IPs of the pods matching the selector when a node comes back online after being unreachable.
Anything else we need to know?:
kubectl describe ep islanding-redisha-slave -n ee
Name: islanding-redisha-slave
Namespace: ee
Labels: app=islanding-redisha
chart=islanding-redisha-3.6.0
heritage=Tiller
release=islanding-redisha
Annotations: endpoints.kubernetes.io/last-change-trigger-time: 2019-09-09T11:08:28Z
Subsets:
Addresses: <none>
NotReadyAddresses: 10.233.26.26
Ports:
Name Port Protocol
---- ---- --------
redis 6379 TCP
Events: <none>
kubectl get po -n ee -owide|grep slave
islanding-redisha-slave-869b4c64c9-7ng49 1/1 Running 0 4d15h 10.233.26.26 dzr-k8s-10 <none> <none>
The address is listed under NotReadyAddresses while the pod is running fine. I still have that pod and service in this state for investigation if you need logs. Thanks.
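One more data point that may help: the endpoints controller parks an address under NotReadyAddresses whenever the pod's Ready condition is False, so inspecting the pod conditions directly should show where the stale state lives. A sketch (what I would expect to see, not a capture from the cluster):

kubectl get pod islanding-redisha-slave-869b4c64c9-7ng49 -n ee \
  -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\n"}{end}'

Here Ready should come back as False even though the container is running, which is exactly why the address stays in NotReadyAddresses.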
Environment:
- Kubernetes version (use kubectl version): 1.15.0
- Cloud provider or hardware configuration: bare metal
- OS (e.g.: cat /etc/os-release): Debian stretch
- Kernel (e.g. uname -a): Linux 4.9.0-7-amd64 #1 SMP Debian 4.9.110-1 (2018-07-05) x86_64 GNU/Linux
- Install tools:
- Network plugin and version (if this is a network-related bug): flannel 0.11
- Others:
About this issue
- State: closed
- Created 5 years ago
- Reactions: 7
- Comments: 25 (17 by maintainers)
We just hit this because of a short period of network instability and it took down a zone. I think it's related to the issue mentioned in https://github.com/kubernetes/kubernetes/pull/17741#issuecomment-161129024, when the feature to mark pods as not ready once the node is no longer ready was introduced.
Basically, the kubelet expects pod conditions to be managed by itself, but the NodeController in the control plane marks all pods on a node as not ready once the node times out (by default after 40s). If the network is down long enough, all pods get rescheduled and their conditions no longer matter. But if it is not, the conditions stay broken: the kubelet would need to resync the state, yet it doesn't know about the external change. This was supposedly fixed in https://github.com/kubernetes/kubernetes/pull/18410 back in '16, but I suspect that fix was not complete.
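To make the mismatch visible (using the pod and node from the original report as an example; this is a sketch of what I would expect, not actual output): once the network is back, the node's Ready condition recovers, while the pod's Ready condition, which was flipped externally by the control plane, keeps the stale value and transition time from the outage. The 40s mentioned above corresponds to kube-controller-manager's --node-monitor-grace-period default.

kubectl get node dzr-k8s-10 -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
# expected: True again once connectivity is restored

kubectl get pod islanding-redisha-slave-869b4c64c9-7ng49 -n ee \
  -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}{" "}{.status.conditions[?(@.type=="Ready")].lastTransitionTime}'
# expected: False, with a lastTransitionTime stuck at the time of the outage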
It’s very hard to follow the path that Pod updates take in kubelet, so for the time being I had to stop debugging there.
There is https://github.com/kubernetes/kubernetes/pull/83455 as a possible solution, but I think resyncing just because the last state update is too old is a bit ugly. IMO it would be better to have the kubelet use its watcher on its own pods to be notified about state changes, and have the reconciliation evaluate the difference between the watcher cache and the expected state instead of using its internal view (which doesn't appear to receive the state update from the control plane). This would incur minimal overhead and react quickly.