cilium: endpoint regeneration stuck on key allocation
Is there an existing issue for this?
- I have searched the existing issues
What happened?
Endpoints get stuck in the waiting-to-regenerate state because they cannot allocate a key.
Restarting the Cilium agent fixes the issue.
Cilium Version
1.12.3 1c466d2 2022-10-12T11:33:37+01:00 go version go1.18.6 linux/amd64
Kernel Version
5.15.0-1017-aws
Kubernetes Version
Server Version: version.Info{Major:"1", Minor:"22+", GitVersion:"v1.22.13-eks-15b7512", GitCommit:"94138dfbea757d7aaf3b205419578ef186dd5efb", GitTreeState:"clean", BuildDate:"2022-08-31T19:15:48Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
Sysdump
No response
Relevant log output
level=warning msg="Key allocation attempt failed" attempt=0 error="unable to allocate ID 70517 for key [..labelSet...]: ciliumidentities.cilium.io \"70517\" already exists" key="[...labelSet...]" subsys=allocator
level=warning msg="Key allocation attempt failed" attempt=1 error="slave key creation failed '...labelSet...': identity (id:\"96965\",key:\"[...labelSet...]\") does not exist" key="[...labelSet...]" subsys=allocator
Anything else?
No response
Code of Conduct
- I agree to follow this project's Code of Conduct
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 15 (11 by maintainers)
I’m wondering whether the collection of deadlocks fixed in recent Cilium v1.12.x releases (1.12.8 and 1.12.9) reared their heads in this issue and manifested in a different way. Either way, it is worth trying a later version to see whether this issue still exists. A gops stack dump would be useful if it occurs again.
Though checking now, we do get a steady stream of:
level=warning msg="Key allocation attempt failed" attempt=0 error="slave key creation failed
Just the stuck endpoints part doesn’t happen too often.
I can try. It may take a bit because it doesn’t happen often and eventually fixes itself, so I have to catch it.