kustomize: Kustomize resource ordering regression

Describe the bug

In Kubeflow, we are using Kustomize v3.2.0 and want to upgrade to v4.0.5 (see https://github.com/kubeflow/manifests/issues/1797). However, our deployment failed with v4.0.5, while it succeeded with v3.2.0.

This is what happened:

  1. Later kustomize versions order Admission Webhooks last. Before, they were ordered first.
  2. Pods are created before the istio injection webhook is registered, so they don’t get a sidecar.
  3. Our apps fail because they haven’t been mutated appropriately.
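
To make the failure mode concrete, here is a toy model of the three steps above. It assumes only that a mutating webhook affects resources applied after it is registered. The resource names are hypothetical, and this is not how the API server or kustomize is implemented.

```python
# Toy model: a mutating webhook only affects resources applied AFTER it is registered.
def apply_in_order(resources):
    """Apply resources sequentially; return the names of Pods that got a sidecar."""
    webhook_registered = False
    injected = []
    for res in resources:
        if res["kind"] == "MutatingWebhookConfiguration":
            webhook_registered = True
        elif res["kind"] == "Pod" and webhook_registered:
            injected.append(res["name"])
    return injected


# Hypothetical resources, for illustration only.
webhook = {"kind": "MutatingWebhookConfiguration", "name": "sidecar-injector"}
pod = {"kind": "Pod", "name": "example-app"}

webhook_first = apply_in_order([webhook, pod])  # ordering as described for v3.2.0
webhook_last = apply_in_order([pod, webhook])   # ordering as described for v4.0.5

print(webhook_first)  # ['example-app']
print(webhook_last)   # []
```

With the webhook ordered last, the Pod is admitted before any injector exists, so nothing mutates it.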

Regression Background

For reference, here are the original issue and the PRs that made these changes: https://github.com/kubernetes-sigs/kustomize/issues/821, https://github.com/kubernetes-sigs/kustomize/pull/1104 and https://github.com/kubernetes-sigs/kustomize/pull/2459

Issue #821 presents the following scenario, which led to PR #1104:

  1. User builds cert-manager kustomization, containing a webhook, a deployment and a CR (simplified).
  2. Apply webhook and deployment.
  3. Apply CR.

Step (3) fails because the Deployment backing the webhook has not become ready yet. The solution SHOULD be to retry the apply. PR #1104’s solution was to order the webhook last, so that it doesn’t mutate/validate the CR. This is wrong, as it circumvents logic that the application has explicitly declared should be applied to all relevant resources.
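
A minimal sketch of what "retry the apply" could look like, assuming the apply step is wrapped in a callable that raises while the webhook backend is not ready. The names `apply_with_retry` and `fake_apply_cr` are hypothetical, and the failing apply is simulated rather than run against a cluster.

```python
import time


def apply_with_retry(apply_fn, attempts=5, delay_seconds=1.0, sleep=time.sleep):
    """Call apply_fn until it succeeds or attempts run out; return the attempt count."""
    for attempt in range(1, attempts + 1):
        try:
            apply_fn()
            return attempt
        except RuntimeError:
            if attempt == attempts:
                raise  # give up and surface the last error
            sleep(delay_seconds)


# Stand-in for applying the CR: fails twice while the webhook backend starts up.
calls = {"n": 0}


def fake_apply_cr():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("webhook backend not ready")


# The sleep function is replaced with a no-op so the example runs instantly.
attempts_used = apply_with_retry(fake_apply_cr, delay_seconds=0, sleep=lambda s: None)
print(attempts_used)  # 3
```

The point of the sketch is that the webhook stays in front of the CR, and the transient failure is absorbed by the caller instead of by the ordering.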

Files that can reproduce the issue

Please see: https://github.com/kubeflow/manifests/blob/v1.3-branch/README.md which includes the example kustomization we use for Kubeflow components.

  • Build the example kustomization with kustomize v3.2.0, as per the README.
  • Build the example kustomization with kustomize v4.0.5. You will see the WebhookConfigurations ordered last, which causes the issues.

Proposed Solution

Restore the order of Mutating / Validating Webhooks as it was before PR #1104.

Kustomize version

v4.0.5

cc’ing authors of the referenced issues and PRs: @donbowman @mgoltzsche @asadali cc @monopole @Shell32-Natsu @pwittrock

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 21 (21 by maintainers)

Most upvoted comments

the original motivation for this has become somewhat moot since the community has moved away from declarative things and into imperative. e.g. istio no longer supports yaml, and the prometheus et al. operators now create their CRDs dynamically.

this leads to a race condition: you cannot apply a system w/ a yaml doc to install prometheus followed by one that sets up a servicemonitor on some other thing. No ordering can fix this (since the order is correct), and the system will not reconcile to the desired state (since the servicemonitor is rejected as an unknown type).

What is the solution? I dunno, does one call until kustomize; do sleep 1; done?

Did an issue for option 2 get filed?

ack, thanks.

Here’s a list of ordering changes over the last ~2 years

Mostly insertions (i.e. specifying order for a particular resource where there wasn’t an order before).

  • 18 Sep 2020 #3009 put endpoints before service
  • 05 Aug 2020 #2796 Order PersistentVolume before Deployment
  • 12 May 2020 #2459 Relative Order of Mutating/Validating WebhookConfigs
  • 20 Aug 2019 #1437 add PriorityClass to the order list
  • 18 July 2019 #1369 add ResourceQuota to the order list
  • 23 May 2019 #1104 Order ValidatingWebhookConfig last
  • 15 May 2019 #1074 Update order of resources to include psps
  • 20 May 2019 #1037 Apply LimitRange resources before workloads

MutatingWebhookConfiguration was moved from orderFirst to orderLast about a year ago in #2459. ValidatingWebhookConfiguration was moved to orderLast about two years ago in #1104.

#3803 would move them back to orderFirst.
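
As a rough illustration of what moving a kind between orderFirst and orderLast does, here is a simplified model of such a sorter. It is a sketch, not kustomize's actual code: the kind lists are abbreviated, their exact positions are assumptions, and unlisted kinds simply keep their input order here.

```python
def legacy_sort(resources, order_first, order_last):
    """Sort by kind: order_first kinds, then unlisted kinds, then order_last kinds.

    Python's sort is stable, so resources with the same rank keep their input order.
    """
    def rank(res):
        kind = res["kind"]
        if kind in order_first:
            return (0, order_first.index(kind))
        if kind in order_last:
            return (2, order_last.index(kind))
        return (1, 0)

    return sorted(resources, key=rank)


WEBHOOKS = ["MutatingWebhookConfiguration", "ValidatingWebhookConfiguration"]
# Abbreviated, assumed list with the webhooks placed among the "first" kinds.
FIRST_WITH_WEBHOOKS = ["Namespace", "CustomResourceDefinition"] + WEBHOOKS + [
    "ServiceAccount", "Service", "Deployment",
]
FIRST_WITHOUT_WEBHOOKS = [k for k in FIRST_WITH_WEBHOOKS if k not in WEBHOOKS]

resources = [
    {"kind": k}
    for k in ["Deployment", "MutatingWebhookConfiguration", "Certificate", "Namespace"]
]

webhooks_in_first = [
    r["kind"] for r in legacy_sort(resources, FIRST_WITH_WEBHOOKS, [])
]
webhooks_in_last = [
    r["kind"] for r in legacy_sort(resources, FIRST_WITHOUT_WEBHOOKS, WEBHOOKS)
]

print(webhooks_in_first)
# ['Namespace', 'MutatingWebhookConfiguration', 'Deployment', 'Certificate']
print(webhooks_in_last)
# ['Namespace', 'Deployment', 'Certificate', 'MutatingWebhookConfiguration']
```

Because the two lists are plain parameters in this sketch, the same function also shows the shape of a sorter whose orderFirst and orderLast come from user configuration.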

It’s been pretty stable for a year, so making a change now may have consequences for many users. As discussed above, it’s impossible for kustomize to own the answer to ‘correct’ ordering. So let’s close #3803, and leave legacy alone.

Some other options are:

  1. kustomize honors --reorder none to suppress this sort, and obey FIFO. This works right now.
  2. we make the existing legacy sorter accept configuration from the kustomization file. user could specify orderFirst and orderLast in that file (would need someone to write the PR for this).
  3. user can write a sorting transformer plugin, and add that to their kustomization file as the last transformer to run (this is possible now, via various mechanisms - exec plugin, containerized plugin - but is the most work and slowest to execute).

If I understand correctly, the current (default) promise is: don’t care about ordering, kustomize will do it for you.

Where is that promise made? It must be retracted. It’s literally impossible right now to make one ordering that both deploys a working stack to a cluster and works universally for everyone’s custom resources, now and in the future.

Generally one needs a grander mechanism which can apply some things, wait for ready, apply more things, etc. Various things are moving in this direction, and would encapsulate kustomize or some other editor.
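
A sketch of the "apply, wait for ready, apply more" idea, assuming the caller supplies an apply function and a readiness check. `apply_in_phases`, the fake cluster helpers and the resource names are all hypothetical and do not correspond to any existing tool.

```python
def apply_in_phases(phases, apply_fn, is_ready_fn, max_polls=10):
    """Apply each phase, then poll until all of its resources are ready before moving on."""
    log = []
    for phase in phases:
        for res in phase:
            apply_fn(res)
            log.append(("apply", res))
        for res in phase:
            # A real implementation would sleep between polls.
            for _ in range(max_polls):
                if is_ready_fn(res):
                    break
            else:
                raise TimeoutError(f"{res} did not become ready")
            log.append(("ready", res))
    return log


# Fake cluster: a resource reports ready on the second poll after being applied.
applied = set()
polls = {}


def fake_apply(res):
    applied.add(res)


def fake_is_ready(res):
    polls[res] = polls.get(res, 0) + 1
    return res in applied and polls[res] >= 2


log = apply_in_phases(
    [["webhook", "webhook-deployment"], ["custom-resource"]],
    fake_apply,
    fake_is_ready,
)
for entry in log:
    print(entry)
```

In this model the custom resource is only applied after the webhook's Deployment has reported ready, which is the guarantee that no static ordering of a single manifest can provide.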

Thanks for the input @monopole! About:

People can order their inputs to get the desired output order, which may be the only answer. There’s no one ordering that makes sense for everyone, since no ordering can take custom resources into account.

This may indeed be the case, but it’s a big change, since it effectively changes the current promise of kustomize to manifests developers. If I understand correctly, the current (default) promise is: don’t care about ordering, kustomize will do it for you.

@mgoltzsche, @yanniszark do you agree? Someone will come along later and change it again. I’d like to make none the default value of --reorder, but it would break a bunch of people, and they’d have to reorder their configs. We could deprecate it over the course of a year or so. WDYT?

May I suggest the following:

  • Keep this issue for fixing the regression in legacy ordering. I believe the original reason for ordering the webhooks last was not actually a valid reason, as described in the first comment. Webhooks should be first, to ensure they can validate/mutate subsequent resources. I have created PR https://github.com/kubernetes-sigs/kustomize/pull/3803 for this reason.
  • Continue the longer-term discussion on the best way forward for kustomize, in terms of reordering resources, in #821.

Thanks! Discussing this on the earliest issue you mentioned, #821.

People can order their inputs to get the desired output order, which may be the only answer.

There’s no one ordering that makes sense for everyone, since no ordering can take custom resources into account.