calico: Assigned ipBlocks are not released

Assigned blocks of IP addresses (etcd location /calico/ipam/v2/assignment/ipv4/block/<ip-block>) are not released even when they are no longer assigned to any node.

Expected Behavior

When a block is not assigned to any node (there is no entry for it at /calico/ipam/v2/host/<host>/<block>), I expect it to be released from /calico/ipam/v2/assignment/ipv4/block/<ip-block>.

Current Behavior

A bunch of blocks, not assigned to nodes, are still kept in assignments.

Possible Solution

When a block is not assigned to any host, release it from assignments.

Steps to Reproduce (for bugs)

  1. Create a Kubernetes cluster with Calico as the CNI plugin.
  2. Create a bunch of deployments/replicasets/jobs that spawn enough pods to have as many blocks assigned to hosts as possible. To make this easier, set up a small IPAM network size (e.g. /27) with a small subnet block size (e.g. /29).

Context

In an old cluster with IPAM network size /18 and subnet block size /26 (the default), we got into a situation where 16 subnets were assigned to hosts (/calico/ipam/v2/host/<host>/<block>), but ~240 subnets were in assignments (/calico/ipam/v2/assignment/). That led us to errors like:

  Warning  FailedCreatePodSandBox  11m (x815 over 44h)    kubelet, worker-x53vn-7f9964b764-2rlb6  (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "495cffc4ae7e0e717a62ca01f8249a42928997e9d37e604e2bf2440d108ff0f4" network for pod "test-report-service-7c7f8486bd-qkbl8": NetworkPlugin cni failed to set up pod "test-report-service-7c7f8486bd-qkbl8_default" network: failed to request 1 IPv4 addresses. IPAM allocated only 0
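
To put the numbers from the context above in perspective, here is a quick sketch of the pool arithmetic. The helper name `blocks_in_pool` is made up for illustration. A /18 pool split into /26 blocks holds 2^(26-18) = 256 blocks in total, so ~240 blocks sitting in assignments means the pool is nearly exhausted even though only 16 blocks are actually in use by hosts.

```shell
# Hypothetical helper: how many blocks of prefix length $2 fit
# into a pool of prefix length $1.
blocks_in_pool() {
  pool_len=$1
  block_len=$2
  echo $(( 1 << (block_len - pool_len) ))
}

blocks_in_pool 18 26   # /18 pool with /26 blocks, prints 256
```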

After manually cleaning etcd of all the subnets that were not assigned to hosts, the issues were resolved. E.g.

# get ipblocks, marked as assigned
etcdctl get /calico/ipam/v2/assignment/ipv4/block --prefix --keys-only | grep block | awk -F "/" '{print $NF}' > assigned-by-blocks
# get ipblocks, actually used by nodes
etcdctl get /calico/ipam/v2/host/ --prefix --keys-only | grep block | awk -F "/" '{print $NF}' > assigned-by-nodes
# delete blocks that are marked as assigned but not used by any node
for block in `grep -Fvf assigned-by-nodes assigned-by-blocks`; do etcdctl del /calico/ipam/v2/assignment/ipv4/block/${block}; done
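
Since the cleanup above deletes keys directly, a dry run may be useful first. The following is only a sketch: the helper names `orphaned_blocks` and `print_delete_commands` are made up, and it assumes the two list files were produced as shown above, with one block name per line. It prints the `etcdctl del` commands instead of running them, and adds `-x` to `grep` so that only whole-line matches count (without it, one block name that is a substring of another would also match).

```shell
# Hypothetical helper: print block names listed in file $1
# that do not appear in file $2.
# -F fixed strings, -x whole-line match, -v invert, -f patterns from file
orphaned_blocks() {
  grep -Fxvf "$2" "$1"
}

# Hypothetical helper: dry run that prints the delete commands
# instead of executing them.
print_delete_commands() {
  orphaned_blocks "$1" "$2" | while read -r block; do
    echo "etcdctl del /calico/ipam/v2/assignment/ipv4/block/${block}"
  done
}
```

Running `print_delete_commands assigned-by-blocks assigned-by-nodes` lists what would be removed, so the output can be reviewed (for example, for blocks that still hold allocations) before anything is piped to a shell.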

Your Environment

  • Calico version: 3.7.2
  • Orchestrator version (e.g. kubernetes, mesos, rkt): Kubernetes 1.14.3
  • Operating System and version: CoreOS 2191.0.0
  • Link to your project (optional):

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 10
  • Comments: 22 (14 by maintainers)

Most upvoted comments

We are having the same issue. We are using Calico 3.3 on Kubernetes 1.11.

Containers even get stuck in the creation phase because of these unreleased affinities. In our case, it happens because the IPAM host cannot be retrieved when calico-kube-controllers tries to delete the node. https://github.com/projectcalico/libcalico-go/blob/4346117ce592eedcc83269c09fbc4a1e652d0b76/lib/ipam/ipam.go#L1081

After a while our etcd is full of this kind of data:

{"cidr":"100.100.10.0/26","affinity":null,"strictAffinity":false,"allocations":[0,null,null,null,null,0,0,null,null,0,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,0,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null],"unallocated":[18,20,19,24,21,12,26,23,28,27,30,31,34,32,33,4,36,39,62,35,37,38,41,42,40,44,47,45,46,49,48,51,50,52,55,53,54,2,56,58,57,60,43,22,63,1,3,61,7,11,13,10,14,16,15,17,59,8,29],"attributes":[{"handle_id":null,"secondary":null}]}
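
Blocks in this state (with `"affinity":null`) could be listed with a filter like the sketch below. It rests on two assumptions: that `etcdctl get --prefix` prints each key and its value on alternating lines, and that each block value is single-line JSON without whitespace around the colon, as in the sample above. The helper name `blocks_without_affinity` is made up for illustration.

```shell
# Hypothetical helper: read alternating key/value lines on stdin and
# print the keys whose value contains "affinity":null.
blocks_without_affinity() {
  awk 'NR % 2 == 1 { key = $0; next }
       /"affinity":null/ { print key }'
}

# Example usage against a live etcd (not executed here):
# etcdctl get /calico/ipam/v2/assignment/ipv4/block --prefix | blocks_without_affinity
```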

Here is the content of an unreleased block (it does not even contain any information about the node): https://gist.github.com/corest/5863287f36f59ac80a36f57aad42b62a

None of those handles are cleaned up from /calico/ipam/v2/handle/<handle-id>.