calico: Cannot upgrade to v3.23.2

With a typha-based setup, upgrading calico from 3.22.1 to 3.23.2 is not possible (at least not without degrading the CNI).

Expected Behavior

  1. Upgrade everything except calico-node to 3.23.2
  2. Cluster is healthy and calico-node 3.22.1 continues to function

Current Behavior

  1. Upgrade everything except calico-node (crds, calico-typha, etc) to 3.23.2
  2. Calico-node goes unready
  3. Networking for newly created pods is not functional

Speculation

Looking into this a bit, part of the problem seems to be incompatible changes in libcalico-go, such that typha and calico-node have different views of the world and of what is valid. For example, here are some log lines from calico-node 3.22.1 after the calico-typha upgrade:

    2022-06-28 04:19:15.280 [ERROR][133] felix/sync_proto.go 302: BUG: cannot parse key. key="/calico/resources/v3/projectcalico.org/kubernetesendpointslices/kube-applier-8t5xd"

This looks like a potential bug where endpoint slices are no longer being treated as namespaced, potentially fixed with the following:

diff --git a/libcalico-go/lib/namespace/resource.go b/libcalico-go/lib/namespace/resource.go
index ebe909024..79e1fade8 100644
--- a/libcalico-go/lib/namespace/resource.go
+++ b/libcalico-go/lib/namespace/resource.go
@@ -39,6 +39,7 @@ func IsNamespaced(kind string) bool {
        case KindKubernetesEndpointSlice:
                // KindKubernetesEndpointSlice is a special-case resource. We don't expose it over the
                // v3 API, but it is used in the felix syncer.
+               return true
        case KindKubernetesService:
                return true
        }
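
For context on why a one-line change like this would matter: Go `switch` cases do not fall through implicitly, so a `case` whose body holds only comments matches and then leaves the switch without reaching the `return true` of the following case. Below is a minimal sketch of that behavior. The kind constants are made-up stand-ins rather than the real libcalico-go ones, and the `return false` after the switch is an assumption made for illustration.

```go
package main

import "fmt"

// Hypothetical stand-ins for the libcalico-go kind constants.
const (
	kindEndpointSlice = "KubernetesEndpointSlice"
	kindService       = "KubernetesService"
)

// isNamespacedBuggy mirrors the shape of the switch before the patch:
// the endpoint slice case has a comment-only body.
func isNamespacedBuggy(kind string) bool {
	switch kind {
	case kindEndpointSlice:
		// Comment-only body: Go does NOT fall through to the next case.
	case kindService:
		return true
	}
	return false // assumed default, for illustration only
}

// isNamespacedFixed mirrors the shape of the switch after the patch.
func isNamespacedFixed(kind string) bool {
	switch kind {
	case kindEndpointSlice:
		return true
	case kindService:
		return true
	}
	return false // assumed default, for illustration only
}

func main() {
	fmt.Println(isNamespacedBuggy(kindEndpointSlice)) // false
	fmt.Println(isNamespacedFixed(kindEndpointSlice)) // true
}
```

An explicit `fallthrough` statement, or listing both kinds in one `case`, would have the same effect as the added `return true`.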

However, it also looks like KubernetesService resources were changed to be namespaced, so they are also not parsed by calico-node 3.22.1 when it talks to typha 3.23.2: https://github.com/projectcalico/calico/pull/5813/files#diff-0c1fa0f118bec26553d1dbbeb19112f956b9139f8b005d57cfd62b3cd4945c35
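
To illustrate how a disagreement about namespacing could surface as a "cannot parse key" error, here is a rough sketch. `parseName` is a hypothetical helper, not the libcalico-go parser; it only assumes that a namespaced kind is encoded as `<namespace>/<name>` and a non-namespaced kind as `<name>`.

```go
package main

import (
	"fmt"
	"strings"
)

// parseName is a hypothetical parser, not the libcalico-go implementation.
// It takes the part of a key that follows the resource kind and splits it
// according to whether the reader believes the kind is namespaced.
func parseName(rest string, namespaced bool) (namespace, name string, err error) {
	parts := strings.Split(rest, "/")
	switch {
	case namespaced && len(parts) == 2:
		return parts[0], parts[1], nil
	case !namespaced && len(parts) == 1:
		return "", parts[0], nil
	}
	return "", "", fmt.Errorf("cannot parse key: %q (namespaced=%v)", rest, namespaced)
}

func main() {
	// Trailing segment of the key in the BUG log line: no namespace part.
	rest := "kube-applier-8t5xd"

	// A reader that treats the kind as non-namespaced accepts the key.
	_, name, err := parseName(rest, false)
	fmt.Println(name, err)

	// A reader that expects "<namespace>/<name>" rejects the same key.
	_, _, err = parseName(rest, true)
	fmt.Println(err)
}
```

Under these assumptions, the two components only interoperate when both sides agree on whether a kind is namespaced, which is why a change on one side alone can break a mixed-version cluster.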

There may be more at play here, as I had mainly gravitated to the BUG log lines…

Steps to Reproduce (for bugs)

See expected behavior above.

Context

  • Running with typha
  • Upgrade from 3.22.1 to 3.23.2

P.S. Happy to dig up extra details if necessary, but figured this was worth posting sooner rather than later.

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 16 (11 by maintainers)

Most upvoted comments

Removing the defaulting is actually all that should be needed, e.g., this PR: https://github.com/projectcalico/calico/pull/6415

Gotcha, OK. I’ll do some more investigation on this then.