operator-sdk: Resources are sometimes manipulated with the wrong API group

Bug Report

I have a Helm operator that installs releases in multiple namespaces in my K8s cluster. It works mostly fine; however, sometimes, seemingly at random, a release fails, and the operator logs the error shown below.

It seems that the operator tries to get the correct resource, but from the wrong API group. I don't know how this could happen, but the operator appears to occasionally confuse the API groups of different resources.

In the example below, the Helm chart being installed has only two resources:

  • A Deployment in API group apps
  • A ConfigMap in API group ""

Sometimes, at random, the operator tries to manipulate either a Deployment in API group "" or a ConfigMap in API group apps. This fails the release, as Helm tries to manipulate resources that do not exist. When the release is retried, it might fail again (possibly on a different resource) or it might succeed.

Eventually, all resources are properly reconciled. The impact is that reconciliation takes significantly longer.
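For reference, the expected mappings can be confirmed with client-go, which the Helm operator builds on. The minimal sketch below just prints the GroupVersionKind the default scheme reports for each of the two kinds; the failing requests pair one kind with the other kind's group:

// Minimal sketch: print the GroupVersionKind that client-go's default
// scheme reports for the two chart resource kinds. A Deployment should
// map to group "apps" and a ConfigMap to the core group "".
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/client-go/kubernetes/scheme"
)

func main() {
	for _, obj := range []runtime.Object{&appsv1.Deployment{}, &corev1.ConfigMap{}} {
		gvks, _, err := scheme.Scheme.ObjectKinds(obj)
		if err != nil {
			panic(err)
		}
		// Expected output:
		//   *v1.Deployment -> apps/v1, Kind=Deployment (group "apps")
		//   *v1.ConfigMap  -> /v1, Kind=ConfigMap (group "")
		fmt.Printf("%T -> %s (group %q)\n", obj, gvks[0], gvks[0].Group)
	}
}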

What did you do?

  • Define a Helm chart with 2 resources
  • Use the Operator SDK Helm operator to reconcile Helm releases in multiple namespaces
  • Check the operator pod logs

What did you expect to see?

The Helm releases are reconciled successfully with no errors.

What did you see instead? Under which circumstances?

The following errors appear:

could not get object: configmaps.apps "tenant-50139-xpodbridge" is forbidden: User "system:serviceaccount:xpod-op:manager" cannot get resource "configmaps" in API group "apps" in the namespace "tenant-50139"
could not get object: deployments "xpbridge" is forbidden: User "system:serviceaccount:xpod-op:manager" cannot get resource "deployments" in API group "" in the namespace "tenant-50262"
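
The "forbidden" wording appears to be a consequence of the wrong group rather than a genuinely missing permission: RBAC authorizes the (API group, resource) pair, so a request for configmaps in group "apps" is denied even though access to configmaps in the core group "" is presumably granted (the releases do eventually succeed). As a rough sketch, the two checks from the first log line can be reproduced with a SelfSubjectAccessReview while impersonating the manager service account (this assumes KUBECONFIG points at the cluster and the caller has impersonation rights):

// Sketch: RBAC authorizes the (API group, resource) pair, so "get
// configmaps" in group "apps" is denied even when configmaps in the
// core group "" are allowed. Namespace and service account are taken
// from the log lines above.
package main

import (
	"context"
	"fmt"
	"os"

	authorizationv1 "k8s.io/api/authorization/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	// Run the checks as the operator's service account.
	cfg.Impersonate = rest.ImpersonationConfig{UserName: "system:serviceaccount:xpod-op:manager"}
	client := kubernetes.NewForConfigOrDie(cfg)

	for _, group := range []string{"", "apps"} {
		review := &authorizationv1.SelfSubjectAccessReview{
			Spec: authorizationv1.SelfSubjectAccessReviewSpec{
				ResourceAttributes: &authorizationv1.ResourceAttributes{
					Namespace: "tenant-50139",
					Verb:      "get",
					Group:     group, // "" is correct for ConfigMap; "apps" is the mix-up
					Resource:  "configmaps",
				},
			},
		}
		resp, err := client.AuthorizationV1().SelfSubjectAccessReviews().
			Create(context.TODO(), review, metav1.CreateOptions{})
		if err != nil {
			panic(err)
		}
		fmt.Printf("get configmaps in group %q: allowed=%v\n", group, resp.Status.Allowed)
	}
}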

Environment

Operator type:

/language helm

Kubernetes cluster type:

Google Kubernetes Engine

$ operator-sdk version

"v1.26.0", commit: "cbeec475e4612e19f1047ff7014342afe93f60d2", kubernetes version: "1.25.0", go version: "go1.19.3", GOOS: "linux", GOARCH: "amd64"

Docker image: quay.io/operator-framework/helm-operator:v1.26.0

(Note that this also happens with operator-sdk 1.19.1.)

$ kubectl version

Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.5-gke.600", GitCommit:"fb4964ee848bc4d25d42d60386c731836059d1d8", GitTreeState:"clean", BuildDate:"2022-09-22T09:24:55Z", GoVersion:"go1.18.6b7", Compiler:"gc", Platform:"linux/amd64"}

Possible Solution

  • The randomness seems to point to a race condition (see the sketch below)
  • The issue could be related to Helm or to the Kubernetes Go client; I'm not sure.
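
To make the race-condition hypothesis concrete, here is a purely hypothetical Go sketch (none of these types exist in operator-sdk or Helm) of how a request builder shared across per-namespace reconcilers, with per-request state kept in struct fields, could produce exactly this kind of group mix-up:

// Hypothetical illustration only; this is not the actual operator-sdk or
// Helm code. One reconciler overwrites the shared builder's group while
// another is still building its request.
package main

import (
	"fmt"
	"sync"
)

type requestBuilder struct {
	group    string // per-request state on a shared object: the suspected bug pattern
	resource string
}

func (b *requestBuilder) set(group, resource string) *requestBuilder {
	b.group = group
	b.resource = resource
	return b
}

func (b *requestBuilder) url(namespace, name string) string {
	if b.group == "" {
		return fmt.Sprintf("/api/v1/namespaces/%s/%s/%s", namespace, b.resource, name)
	}
	return fmt.Sprintf("/apis/%s/v1/namespaces/%s/%s/%s", b.group, namespace, b.resource, name)
}

func main() {
	shared := &requestBuilder{} // one builder shared by all reconcilers
	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(2)
		go func() { // reconciler for namespace A: Deployment in group "apps"
			defer wg.Done()
			fmt.Println(shared.set("apps", "deployments").url("tenant-a", "xpbridge"))
		}()
		go func() { // reconciler for namespace B: ConfigMap in group ""
			defer wg.Done()
			fmt.Println(shared.set("", "configmaps").url("tenant-b", "bridge-config"))
		}()
	}
	wg.Wait()
	// With enough iterations, some lines show deployments under /api/v1
	// (group "") or configmaps under /apis/apps/..., the same mix-up seen
	// in the operator logs.
}

Running this with go run -race flags the data race immediately; in the real code path the shared state would presumably be a client, REST mapper, or resource builder reused across concurrent reconciliations.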

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 15 (5 by maintainers)

Most upvoted comments

@pjestin-sym thanks for your analysis! I apologize for my delay in getting around to investigating this further, I just haven’t had the time to take a deeper look. I am planning to carve out some time over the next couple days to take a deeper dive into this and some other open issues. I appreciate your patience with this!