kubernetes: Endpoints are not reconciled in some cases from an instance that is shutting down
What happened?
Hi,
During the shutdown process of a control-plane instance, the apiserver attempts to remove its master lease object
from etcd. It may not find the lease object in storage if the lease has already expired by the time the apiserver attempts to delete it from storage here; in such cases it will error out here with the following error, depending on the k8s version it's on:
controller.go:184] StorageError: key not found, Code: 1, Key: /registry/masterleases//172.18.100.37,
because this line here errors out when deleting from storage.
As a result, the endpoint is not reconciled during the shutdown process and the instance stays behind the Kubernetes service backend. Some clients may therefore keep attempting to connect to the old instance for some period of time, since kube-proxy won't update the endpoints until the controller reconciler loop on another instance kicks in and removes the stale endpoint.
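To make the failure mode concrete, here is a minimal, self-contained Go sketch of the control flow as I understand it; the type and function names are illustrative stand-ins, not the actual apiserver code:

```go
package main

import (
	"errors"
	"fmt"
)

// Illustrative stand-in for the error returned when the lease key is
// already gone from etcd by the time we try to delete it.
var errKeyNotFound = errors.New("StorageError: key not found, Code: 1")

// leaseStorage is a hypothetical stand-in for the apiserver's master-lease
// storage backed by etcd.
type leaseStorage struct{ hasLease bool }

// RemoveLease fails with a not-found error if the lease already expired.
func (s *leaseStorage) RemoveLease(ip string) error {
	if !s.hasLease {
		return errKeyNotFound
	}
	s.hasLease = false
	return nil
}

// reconcileEndpoints stands in for the logic that rewrites the endpoints
// of the kubernetes service from the remaining leases.
func reconcileEndpoints() error {
	fmt.Println("endpoints reconciled: shutting-down instance removed from the kubernetes service")
	return nil
}

// removeEndpointsToday mirrors today's behaviour: if deleting the lease
// fails, we return early and never reach the endpoint reconciliation.
func removeEndpointsToday(s *leaseStorage, ip string) error {
	if err := s.RemoveLease(ip); err != nil {
		return err // shutdown path stops here, the stale endpoint stays
	}
	return reconcileEndpoints()
}

func main() {
	// The lease has already expired from etcd by the time shutdown runs.
	s := &leaseStorage{hasLease: false}
	if err := removeEndpointsToday(s, "172.18.100.37"); err != nil {
		fmt.Println("shutdown reconcile aborted:", err)
	}
}
```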
Example of connection refused errors that client applications experience because of the above issue:
no more retries error: unable to recognize \"/tmp/manifest.yaml\": Get \"https://10.100.0.1:443/api?timeout=32s\": dial tcp 10.100.0.1:443: connect: connection refused","object":{"apiVersion":"v1","count":1,"eventTime":null,"firstTimestamp":"2022-11-09T22:25:21Z","involvedObject":
What did you expect to happen?
Given there can be cases where the shutdown takes longer or doesn't end gracefully, our shutdown process could be more resilient, IIUC: tolerate errors from storage and continue with reconciling anyway. So I propose a fix here: basically, swallow/log this error when deleting the lease object from storage fails, and continue with reconciling to update the endpoints object.
^^^ What this gives us: in cases where the master lease object in etcd has not expired, this is a no-op and everything continues to work the way it does today, i.e. the lease is removed from storage successfully and then the endpoints are reconciled. But in cases where the master lease object in etcd has expired, the delete will fail to find the key in etcd, like in this case; because of this potential fix (swallowing and logging the error), it will continue to reconcile the endpoints, and this code here will have the up-to-date master endpoints from etcd at reconciliation time anyway. That keeps the endpoints behind the Kubernetes service up to date during the shutdown process, without having to wait until the next periodic reconciler run or another instance's controller reconciler loop kicks in.
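A rough sketch of the proposed control flow, reusing the illustrative stand-ins from the sketch in the previous section (again, an assumption-laden illustration, not the actual upstream code):

```go
// removeEndpointsProposed logs the lease-delete failure instead of
// returning it, then reconciles the endpoints anyway.
func removeEndpointsProposed(s *leaseStorage, ip string) error {
	if err := s.RemoveLease(ip); err != nil {
		// The lease may have already expired from etcd; swallow/log the
		// error so this instance's endpoint is still removed from the
		// kubernetes service during shutdown.
		fmt.Printf("warning: failed to remove master lease for %s: %v; continuing to reconcile\n", ip, err)
	}
	return reconcileEndpoints()
}
```

With this, the not-found error becomes a warning and the stale endpoint is still removed during shutdown; when the lease is still present, the behaviour is unchanged.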
If you like the above proposed solution, I can submit a PR for it. Please let me know. Thank you.
How can we reproduce it (as minimally and precisely as possible)?
- Let clients continue to make new connections every second to the instance you are about to terminate (a minimal client sketch is included after this list).
- Delay the shutdown process until the master-lease object in etcd expires for the given instance.
- Let the APIServer continue with the shutdown process so that it fails with the error
controller.go:184] StorageError: key not found, Code: 1, Key: /registry/masterleases//172.18.100.37,
- Check for errors in client logs saying connection refused because the client was trying to connect to the old instance (a tcpdump may help here to see clearly that the client was making a connection to the IP/endpoint of the old instance and failing with a connection refused error at the network layer).
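For step 1, a minimal client loop like the following can be used to observe the window; the service IP is taken from the example log above, and the endpoint, timeout, and interval are just assumptions to adjust for your cluster:

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

func main() {
	client := &http.Client{
		Timeout: 2 * time.Second,
		Transport: &http.Transport{
			// This probe only cares whether the TCP connection succeeds,
			// so certificate verification is skipped for simplicity.
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
	}
	for {
		resp, err := client.Get("https://10.100.0.1:443/api?timeout=32s")
		if err != nil {
			// During the window described above this prints
			// "connect: connection refused" until the endpoints are reconciled.
			fmt.Println(time.Now().Format(time.RFC3339), "error:", err)
		} else {
			fmt.Println(time.Now().Format(time.RFC3339), "status:", resp.Status)
			resp.Body.Close()
		}
		time.Sleep(1 * time.Second)
	}
}
```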
Anything else we need to know?
No response
Kubernetes version
But I think this may happen on any version, as this part of the code hasn't been updated on master, IIUC.
Cloud provider
OS version
# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here
# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, …) and versions (if applicable)
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 16 (16 by maintainers)
Actually, all the credit must go to @hakuna-matatah. Great report, and he was the one who identified the PR and the double slash problem.
🤔 The other apiservers will recycle the endpoints too, and 5 seconds for the lease to expire while you clean up the endpoints sounds like too much… I don't know, I don't like the idea of having an apiserver with network problems rewriting the endpoints…
Maybe I'm overly cautious about these things, but I would not be in favor of making changes like this without a test reproducing the problem or showing the improvement…
I will defer to others; I've expressed my opinion, but it would be good to hear from others.