system-upgrade-controller: Plan not found

Version 0.9.1

Platform/Architecture linux-amd64

Describe the bug
I am running rke2 and installed the controller according to the docs: https://docs.rke2.io/upgrade/automated_upgrade/. I did try setting a channel but am falling back to setting a version now. I labeled the node rke2-upgrade: "true". I am running rke2 v1.23.5+rke2r1 and am trying to upgrade to v1.23.6+rke2r1. I have 1 master node and 2 agent nodes.
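For reference, a Plan following those docs looks roughly like the sketch below. The label and target version are taken from this report; the remaining fields are assumptions based on the documented shape, not copied from the reporter's actual manifests:

```yaml
# Sketch of a server Plan per the rke2 automated-upgrade docs (assumed shape).
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: server-plan
  namespace: system-upgrade
spec:
  concurrency: 1
  cordon: true
  nodeSelector:
    matchExpressions:
    - key: rke2-upgrade       # label from this report
      operator: In
      values:
      - "true"
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/rke2-upgrade
  version: v1.23.6+rke2r1     # target version from this report
```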

The log output:

time="2022-04-29T11:00:16Z" level=debug msg="PLAN STATUS HANDLER: plan=system-upgrade/server-plan@4670979, status={Conditions:[] LatestVersion: LatestHash: Applying:[]}" func="github.com/sirupsen/logrus.(*Entry).Logf" file="/go/pkg/mod/github.com/sirupsen/logrus@v1.4.2/entry.go:314"
time="2022-04-29T11:00:16Z" level=debug msg="PLAN GENERATING HANDLER: plan=system-upgrade/server-plan@4670979, status={Conditions:[{Type:LatestResolved Status:True LastUpdateTime:2022-04-29T11:00:16Z LastTransitionTime: Reason:Version Message:}] LatestVersion:v1.23.6-rke2r1 LatestHash:d5c507264ef83171925233cdeebe1f8a21614263918690682ef945fe Applying:[]}" func="github.com/sirupsen/logrus.(*Entry).Logf" file="/go/pkg/mod/github.com/sirupsen/logrus@v1.4.2/entry.go:314"
time="2022-04-29T11:00:16Z" level=debug msg="DesiredSet - Created batch/v1, Kind=Job system-upgrade/apply-server-plan-on-decgn-pr-vmcq-k8s-with-d5c507264ef83-d9397 for system-upgrade-controller system-upgrade/server-plan" func="github.com/sirupsen/logrus.(*Entry).Logf" file="/go/pkg/mod/github.com/sirupsen/logrus@v1.4.2/entry.go:314"
time="2022-04-29T11:00:16Z" level=error msg="error syncing 'system-upgrade/server-plan': handler system-upgrade-controller: plans.upgrade.cattle.io \"server-plan\" not found, handler system-upgrade-controller: plans.upgrade.cattle.io \"server-plan\" not found, requeuing" func="github.com/sirupsen/logrus.(*Entry).Logf" file="/go/pkg/mod/github.com/sirupsen/logrus@v1.4.2/entry.go:314"

Watching jobs in the namespace, one briefly appears and immediately disappears:

system-upgrade   apply-server-plan-on-decgn-pr-vmcq-k8s-with-d5c507264ef83-d9397   0/1                      0s

One additional note: I am using fluxcd with the cluster, though I am not sure if it is interfering here, since it doesn't touch the status fields, no status is ever set on the plans, and no events appear…

To Reproduce

  1. Set up an rke2 cluster
  2. Apply the controller
  3. Apply the plans

Expected behavior
Successful node upgrades.

Actual behavior
Even initializing the upgrade fails.

Additional context

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 19 (5 by maintainers)

Most upvoted comments

@dweomer I mean my PR should fix the issue now, although when there are multiple plan CRD versions that might change. It's just that it's kind of redundant to have both ways apply the CRDs; as long as both use the same source it shouldn't be an issue, I guess.

@texasbobs did you see how I am deploying the SUC and CRD? https://github.com/rancher/system-upgrade-controller/issues/203#issuecomment-1187460978

I am fairly sure I didn't come up with that myself, but that's how it's working perfectly fine using the CRD provided with the release 😃

To clarify my steps…

  1. Clean K3S deployment that has never seen system-upgrade-controller.
  2. ArgoCD deploys; everything is in sync and looks good until you look at the logs, which look like:
W0721 03:03:18.522839       1 client_config.go:552] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2022-07-21T03:03:18Z" level=info msg="Applying CRD plans.upgrade.cattle.io"
time="2022-07-21T03:03:19Z" level=info msg="Starting /v1, Kind=Node controller"
time="2022-07-21T03:03:19Z" level=info msg="Starting /v1, Kind=Secret controller"
time="2022-07-21T03:03:19Z" level=info msg="Starting batch/v1, Kind=Job controller"
time="2022-07-21T03:03:19Z" level=info msg="Starting upgrade.cattle.io/v1, Kind=Plan controller"
time="2022-07-21T03:03:28Z" level=error msg="error syncing 'system-upgrade/k3s-control-node-plan': handler system-upgrade-controller: plans.upgrade.cattle.io \"k3s-control-node-plan\" not found, handler system-upgrade-controller: plans.upgrade.cattle.io \"k3s-control-node-plan\" not found, requeuing"
time="2022-07-21T03:03:28Z" level=error msg="error syncing 'system-upgrade/k3s-control-node-plan': handler system-upgrade-controller: plans.upgrade.cattle.io \"k3s-control-node-plan\" not found, handler system-upgrade-controller: plans.upgrade.cattle.io \"k3s-control-node-plan\" not found, requeuing"
time="2022-07-21T03:03:28Z" level=error msg="error syncing 'system-upgrade/k3s-control-node-plan': handler system-upgrade-controller: plans.upgrade.cattle.io \"k3s-control-node-plan\" not found, handler system-upgrade-controller: plans.upgrade.cattle.io \"k3s-control-node-plan\" not found, requeuing"
  3. I disable self-heal in ArgoCD (stop it from enforcing the GitHub source of truth for this application).
  4. I delete the system-upgrade-controller pod since it's not working.
  5. Upon it being restarted, the CRD it applies is different from the previous one used, as shown above.
  6. The pod logs are now clean:
W0721 03:10:21.075999       1 client_config.go:552] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2022-07-21T03:10:21Z" level=info msg="Applying CRD plans.upgrade.cattle.io"
time="2022-07-21T03:10:21Z" level=info msg="Starting /v1, Kind=Node controller"
time="2022-07-21T03:10:21Z" level=info msg="Starting /v1, Kind=Secret controller"
time="2022-07-21T03:10:21Z" level=info msg="Starting batch/v1, Kind=Job controller"
time="2022-07-21T03:10:21Z" level=info msg="Starting upgrade.cattle.io/v1, Kind=Plan controller"

If I look at the CRD it applied, it is missing the lines shown in the diff from my previous post:

spec:
  conversion:
    strategy: None
  group: upgrade.cattle.io
  names:
    categories:
    - upgrade
    kind: Plan
    listKind: PlanList
    plural: plans
    singular: plan
  scope: Namespaced
  versions:
  - name: v1
    schema:
      openAPIV3Schema:
        properties:
          spec:
            properties:
              channel:
                nullable: true
                type: string
              concurrency:
                type: integer
              cordon:
                type: boolean

Clearly a difference between the two CRDs.
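One way to confirm such a mismatch is to dump both CRD variants to files and diff them. A self-contained sketch with placeholder files standing in for the two variants (against a live cluster you would instead capture the applied CRD with `kubectl get crd plans.upgrade.cattle.io -o yaml`, and compare it to the CRD shipped in the release manifest):

```shell
# Placeholder for the CRD as shipped with the release manifest.
cat > release-crd.yaml <<'EOF'
spec:
  versions:
  - name: v1
    served: true
    storage: true
EOF

# Placeholder for the CRD the controller applied on startup.
cat > applied-crd.yaml <<'EOF'
spec:
  versions:
  - name: v1
EOF

# diff exits non-zero when the files differ; '|| true' keeps the script going.
diff -u release-crd.yaml applied-crd.yaml || true
```

Lines prefixed with `-` in the diff output are present in the release CRD but missing from the applied one.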

It is not related to fluxcd. I applied plans manually and the issue still existed.

After using system-upgrade-controller for about a year, I find it far from stable. Very disappointed.