external-secrets: Upgrading from v1alpha1 to v1beta1 via Flux Kustomization causes dry-run failure warnings

Upgrading ExternalSecret objects from apiVersion v1alpha1 to v1beta1 via a flux2 Kustomization object causes repeated warnings from the Flux kustomize-controller, which also cause Kustomization reconciliation to fail intermittently:

Warning ReconciliationFailed 5m16s (x14 over 85m) kustomize-controller ExternalSecret/some-namespace/some-name dry-run failed, error: failed to prune fields: failed add back owned items: failed to convert pruned object at version external-secrets.io/v1beta1: conversion webhook for external-secrets.io/v1alpha1, Kind=ExternalSecret returned invalid metadata: invalid metadata of type <nil> in input object

If you wait long enough, Kustomization reconciliation eventually succeeds and the ExternalSecret objects are upgraded to v1beta1, but the Kustomization object keeps receiving these warnings indefinitely and randomly becomes degraded as a result.

If a v1beta1 ExternalSecret is deployed from scratch, there are no problems.

Reproduction steps:

  1. Deploy external-secrets v0.3.10
  2. Deploy an ExternalSecret object with apiVersion v1alpha1 using a flux2 Kustomization (the Kustomization spec must contain wait: true to enable health checks; see the sketch after this list)
  3. Upgrade external-secrets to v0.5.3
  4. Change the ExternalSecret object manifest in the Flux source to apiVersion v1beta1
  5. Wait for some time (the exact time may vary) and check the events of the Kustomization object
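
A Kustomization for step 2 could look like the sketch below. This is purely illustrative; the name, namespace, path, and source are assumptions, not values taken from the issue:

  apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
  kind: Kustomization
  metadata:
    name: external-secrets-app
    namespace: flux-system
  spec:
    interval: 5m
    path: ./apps/external-secrets
    prune: true
    wait: true  # wait for applied objects to become ready (health checks)
    sourceRef:
      kind: GitRepository
      name: flux-system

For step 5, kubectl describe kustomization external-secrets-app -n flux-system shows the warning events quoted above.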

Observations (Constraints, Context, etc):

  • Kubernetes 1.22 (EKS)
  • external-secrets v0.5.3
  • flux2 v0.30.2

Note: Flux2 kustomize-controller performs server-side apply when handling objects.

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 3
  • Comments: 20 (6 by maintainers)

Most upvoted comments

Hi All,

For anyone else experiencing this issue, I thought I'd add some information on how I was able to solve it. What I found was that the managed fields on the objects (after upgrading) contained field ownership entries for both v1alpha1 and v1beta1. Removing the v1alpha1 field ownership fixed the problem.

To fix it, I had to pull the objects with their managed fields via kubectl, remove the offending block, and then reapply:

kubectl get externalsecret/xyz -n namespace --show-managed-fields -o yaml > doc.yaml
# edit doc.yaml and delete the v1alpha1 entry under .metadata.managedFields (shown below)
kubectl apply -f ./doc.yaml

The block to remove looks like this:

  - apiVersion: external-secrets.io/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        .: {}
        f:conditions: {}

and is under .metadata.managedFields
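
The same edit can be scripted; this is just a sketch that automates the get/edit/apply sequence above, assuming jq is available (the resource name and namespace are placeholders):

  kubectl get externalsecret/xyz -n namespace --show-managed-fields -o json \
    | jq '.metadata.managedFields |= map(select(.apiVersion != "external-secrets.io/v1alpha1"))' \
    | kubectl apply -f -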

The cluster is 1.23.6, but I am not sure this is directly related to the external-secrets conversion webhook - I have added some logging of the request/response to the webhook and so far I have found nothing out of the ordinary there. Running the kubectl command above, the error occurs roughly 1 out of 4 or 5 times. I am considering setting up a kind-based cluster to see if it is reproducible there, as a lot of calls go to the webhook in our test cluster, so it can be hard to find the correct logs.

The error originates here: https://github.com/kubernetes/apiextensions-apiserver/blob/master/pkg/apiserver/conversion/webhook_converter.go#L388 - through the logic in here: https://github.com/kubernetes-sigs/structured-merge-diff/blob/master/merge/update.go#L233

But since I cannot see anything suspicious in the logged responses from the external-secrets webhook, I am leaning towards this being something a bit more general - maybe dry-runs are not performed that often in combination with a conversion. Of course we will eventually convert the resources to v1beta1, but there will be some migration period, and since this seems “random”, Flux probably gets things reconciled eventually. But it would be nice to know the real cause of this 😃

FYI: https://gist.github.com/langecode/1d42bf97cac71c0c664b530c30902251 - changed webhook.go around lines 96-108 and again at 147-152 - not for production use, of course, but it got the request/response payload out - and also added logs in the actual conversion functions, similar to the image you (@moolen) have built.

Still looking into this, but just for the record: I found a way to mimic the dry-run from the Flux Kustomization controller, thus reproducing the problem. Doing an apply along the lines of the following will, at least in my environment, periodically give the same error as seen by Flux.

kubectl apply --server-side --validate=true --dry-run=server --field-manager=kustomize-controller -f external-secret.yaml

(specifying the field-manager is only to simulate the Flux kustomize-controller, of course)
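
For reference, a minimal external-secret.yaml along these lines could be used with the command above - the store, key, and names are illustrative placeholders only, not manifests from the issue:

  apiVersion: external-secrets.io/v1beta1
  kind: ExternalSecret
  metadata:
    name: some-name
    namespace: some-namespace
  spec:
    refreshInterval: 1h
    secretStoreRef:
      kind: SecretStore
      name: example-store
    target:
      name: some-name   # name of the Kubernetes Secret to create
    data:
      - secretKey: password
        remoteRef:
          key: path/to/secret
          property: password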

We also have this issue. All of our manifests are v1beta1, and we have deleted all ExternalSecret resources in our clusters (so Flux could recreate them from scratch). Deleting the resources helped for a while, but this error keeps coming back.

Using v0.5.6