calico: Assigned ipBlocks are not released
Assigned blocks of IP addresses (etcd location /calico/ipam/v2/assignment/ipv4/block/<ip-block>) are not released even when they are no longer assigned to any node.
Expected Behavior
When a block is no longer assigned to any node (there is no entry for it at /calico/ipam/v2/host/<host>/<block>), I expect it to be released from /calico/ipam/v2/assignment/ipv4/block/<ip-block>.
Current Behavior
Many blocks that are not assigned to any node are still kept in assignments.
Possible Solution
When a block is not assigned to any host, release it from assignments.
Steps to Reproduce (for bugs)
- Create a Kubernetes cluster with Calico as the CNI plugin.
- Create a number of deployments/replicasets/jobs that spawn enough pods to have as many blocks assigned to hosts as possible. To make this easier, set up a small IPAM pool (e.g. /27) with a small block size (e.g. /29).
Context
In an old cluster with IPAM network size /18 and subnet block size /26 (the default), we got into a situation where 16 subnets were assigned to hosts (/calico/ipam/v2/host/<host>/<block>), but ~240 subnets were present in assignments (/calico/ipam/v2/assignment/). That led us into issues like:
Warning FailedCreatePodSandBox 11m (x815 over 44h) kubelet, worker-x53vn-7f9964b764-2rlb6 (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "495cffc4ae7e0e717a62ca01f8249a42928997e9d37e604e2bf2440d108ff0f4" network for pod "test-report-service-7c7f8486bd-qkbl8": NetworkPlugin cni failed to set up pod "test-report-service-7c7f8486bd-qkbl8_default" network: failed to request 1 IPv4 addresses. IPAM allocated only 0
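The arithmetic behind this exhaustion can be sketched as follows. This is only an illustration, assuming the pool is carved into equal-size blocks; `blocks_in_pool` is a hypothetical helper name, and 240 is the approximate count reported above, not an exact figure.

```shell
# Number of fixed-size blocks that fit into a pool:
# 2^(block_prefix - pool_prefix)
blocks_in_pool() {
  local pool_prefix=$1 block_prefix=$2
  echo $(( 1 << (block_prefix - pool_prefix) ))
}

total=$(blocks_in_pool 18 26)   # /18 pool split into /26 blocks
in_use=16                       # blocks with a host affinity
stale=240                       # approximate number of leftover blocks
echo "total=$total free=$(( total - in_use - stale ))"
# prints: total=256 free=0
```

With roughly 240 leftover blocks plus the 16 in use, the 256 blocks of the pool are all taken, which would be consistent with IPAM being unable to allocate a single address.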
After manually cleaning etcd of all the subnets that were not assigned to hosts, the issues were resolved. E.g.
# get ipblocks marked as assigned
etcdctl get /calico/ipam/v2/assignment/ipv4/block --prefix --keys-only | grep block | awk -F "/" '{print $NF}' > assigned-by-blocks
# get ipblocks actually used by nodes
etcdctl get /calico/ipam/v2/host/ --prefix --keys-only | grep block | awk -F "/" '{print $NF}' > assigned-by-nodes
# delete the blocks that are marked as assigned but not used by any node
# (set difference of the two lists above; -x matches whole lines only)
for block in $(grep -Fxvf assigned-by-nodes assigned-by-blocks); do etcdctl del "/calico/ipam/v2/assignment/ipv4/block/${block}"; done
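Since deleting keys from etcd cannot be undone, it may be worth previewing the deletions before running them. The following is a minimal dry-run sketch, assuming the two list files were produced as above; the function name `print_stale_block_deletes` is hypothetical and it only prints the commands instead of executing them.

```shell
# Print (do not run) one delete command per block that appears in the
# assignment list but not in the node list.
# $1: file with block names taken from the assignment prefix
# $2: file with block names taken from the host prefix
print_stale_block_deletes() {
  local assigned=$1 by_nodes=$2
  # -F fixed strings, -x whole-line match, -v invert, -f patterns from file
  grep -Fxvf "$by_nodes" "$assigned" | while IFS= read -r block; do
    printf 'etcdctl del /calico/ipam/v2/assignment/ipv4/block/%s\n' "$block"
  done
}

# Usage (review the output before executing anything):
# print_stale_block_deletes assigned-by-blocks assigned-by-nodes
```

Matching whole lines avoids treating one block name as a substring of another, which a plain fixed-string match could do.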
Your Environment
- Calico version: 3.7.2
- Orchestrator version (e.g. kubernetes, mesos, rkt): Kubernetes 1.14.3
- Operating System and version: CoreOS 2191.0.0
- Link to your project (optional):
About this issue
- State: closed
- Created 5 years ago
- Reactions: 10
- Comments: 22 (14 by maintainers)
We are having the same issue. We are using Calico 3.3 on Kubernetes 1.11.
Containers even get stuck in the creation phase because of these unreleased affinities. In our case, this happens because the IPAM host cannot be retrieved when calico-kube-controllers tries to delete the node. https://github.com/projectcalico/libcalico-go/blob/4346117ce592eedcc83269c09fbc4a1e652d0b76/lib/ipam/ipam.go#L1081
After a while our etcd is full of this kind of data:
{"cidr":"100.100.10.0/26","affinity":null,"strictAffinity":false,"allocations":[0,null,null,null,null,0,0,null,null,0,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,0,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null],"unallocated":[18,20,19,24,21,12,26,23,28,27,30,31,34,32,33,4,36,39,62,35,37,38,41,42,40,44,47,45,46,49,48,51,50,52,55,53,54,2,56,58,57,60,43,22,63,1,3,61,7,11,13,10,14,16,15,17,59,8,29],"attributes":[{"handle_id":null,"secondary":null}]}
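To see whether a block like this still holds allocations, one can count the non-null entries in its `allocations` array. The following is a rough sketch using only sed, tr and grep, assuming the block value is single-line JSON with plain double quotes, as etcd stores it; `count_allocated` is a hypothetical helper name, and a JSON-aware tool would be more robust than this text matching.

```shell
# Read one block value (single-line JSON) on stdin and print how many
# slots of its "allocations" array are in use (i.e. hold a number, not null).
count_allocated() {
  # 1. isolate the contents of the allocations array
  # 2. put each entry on its own line
  # 3. count numeric entries; grep -c exits non-zero when the count is 0,
  #    so "|| true" keeps the function's exit status clean
  sed -n 's/.*"allocations":\[\([^]]*\)].*/\1/p' \
    | tr ',' '\n' \
    | grep -c '^[0-9]\{1,\}$' || true
}

# Usage against a live cluster (not run here):
# etcdctl get /calico/ipam/v2/assignment/ipv4/block/<ip-block> --print-value-only | count_allocated
```

A block that reports a non-zero count but has "affinity":null still has addresses marked as allocated even though no node owns it.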
Here is the content of an unreleased block (there is not even any information about the node): https://gist.github.com/corest/5863287f36f59ac80a36f57aad42b62a
The corresponding handles are not cleaned up from /calico/ipam/v2/handle/<handle-id> either.