system-upgrade-controller: Spawned jobs use broken (truncated) image names

Version system-upgrade-controller version v0.9.1 (79da9f0)

Platform/Architecture linux-amd64

Describe the bug The jobs spawned by SUC use an image name truncated at 64 characters. This leads to Kubernetes not being able to run the jobs since these images simply do not exist.

To Reproduce We experienced the bug with the following Plan. Please note that I cannot post company internal image and plan names so I had to replace them. Either way, the images are not publicly available. Please also note that we have configured a registry mirror to avoid having to specify an explicit registry. The key point for reproduction is that the image name must be longer than 64 characters. In our case, the image names are that long because we are using a deeply nested structure in GitLab to organize our code base.

apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: xxxxxx
spec:
  nodeSelector:
    matchExpressions:
      - key: node-role.kubernetes.io/master
        operator: In
        values:
          - "true"
  upgrade:
    image: aa/bbbb/ccccccccccccccccc/dddddddddddddddddd/eeeeeeeeee/xxxxxx/master
    command: ["/usr/local/bin/bash"]
    args: ["/root/install.sh"]
  concurrency: 1
  version: "981a1f90cfcad63ea864957f996cde9b2e99e587"

SUC will pick up this plan and spawn a job to execute it. Kubernetes will then spawn pods accordingly which will fail due to the “truncated” image not existing. I was not able to describe the job or its pods because SUC seems to constantly replace the job. Using the configmap, I was able to enable debug mode though (SYSTEM_UPGRADE_CONTROLLER_DEBUG: true) and it seems to then log the job spec as JSON. This reveals the truncated image name in spec.template.spec.containers[0].image: aa/bbbb/ccccccccccccccccc/dddddddddddddddddd/eeeeeeeeee/xxxxxx/m

Expected behavior I expect the spawned jobs to use the correct image name. In this example, this would be aa/bbbb/ccccccccccccccccc/dddddddddddddddddd/eeeeeeeeee/xxxxxx/master:981a1f90cfcad63ea864957f996cde9b2e99e587.

Actual behavior The jobs use an image named truncated at 64 characters: aa/bbbb/ccccccccccccccccc/dddddddddddddddddd/eeeeeeeeee/xxxxxx/m.

Additional context None - but I’m happy to provide more information if required.

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Reactions: 2
  • Comments: 15 (6 by maintainers)

Most upvoted comments

brandond@dev01:~/suc-test$ skopeo sync --src docker --dest docker docker.io/library/busybox:1.35.0-uclibc local.registry.k3s.khaus/$(openssl rand -hex 16)/$(openssl rand -hex 16)/$(openssl rand -hex 16)
INFO[0000] Tag presence check                            imagename="docker.io/library/busybox:1.35.0-uclibc" tagged=true
INFO[0000] Copying image ref 1/1                         from="docker://busybox:1.35.0-uclibc" to="docker://local.registry.k3s.khaus/7f19c02113f03500a5c8d72ce1a44965/d45508d8fdaa912cb352a97d14b7af1d/483e6eb76483325ea686ace1485b4b31/busybox:1.35.0-uclibc"

Seems to be stuck in a loop and the jobs never get created.

time="2022-07-13T22:00:33Z" level=debug msg="PLAN STATUS HANDLER: plan=system-upgrade/busybox-test@658, status={Conditions:[] LatestVersion: LatestHash: Applying:[]}" func="github.com/sirupsen/logrus.(*Entry).Logf" file="/root/go/pkg/mod/github.com/sirupsen/logrus@v1.4.2/entry.go:314"
time="2022-07-13T22:00:33Z" level=debug msg="PLAN GENERATING HANDLER: plan=system-upgrade/busybox-test@658, status={Conditions:[{Type:LatestResolved Status:True LastUpdateTime:2022-07-13T22:00:33Z LastTransitionTime: Reason:Version Message:}] LatestVersion:1.35.0-uclibc LatestHash:7dfd8044aa7c456209bd9a936377d75caf7b42dde345cfd6535981d1 Applying:[]}" func="github.com/sirupsen/logrus.(*Entry).Logf" file="/root/go/pkg/mod/github.com/sirupsen/logrus@v1.4.2/entry.go:314"
time="2022-07-13T22:00:33Z" level=debug msg="DesiredSet - Created batch/v1, Kind=Job system-upgrade/apply-busybox-test-on-k3s-server-1-with-7dfd8044aa7c45620-3655e for system-upgrade-controller system-upgrade/busybox-test" func="github.com/sirupsen/logrus.(*Entry).Logf" file="/root/go/pkg/mod/github.com/sirupsen/logrus@v1.4.2/entry.go:314"
time="2022-07-13T22:00:33Z" level=error msg="error syncing 'system-upgrade/busybox-test': handler system-upgrade-controller: plans.upgrade.cattle.io \"busybox-test\" not found, handler system-upgrade-controller: plans.upgrade.cattle.io \"busybox-test\" not found, requeuing" func="github.com/sirupsen/logrus.(*Entry).Logf" file="/root/go/pkg/mod/github.com/sirupsen/logrus@v1.4.2/entry.go:314"
time="2022-07-13T22:00:33Z" level=error msg="error syncing 'system-upgrade/apply-busybox-test-on-k3s-server-1-with-7dfd8044aa7c45620-3655e': handler system-upgrade-controller: jobs.batch \"apply-busybox-test-on-k3s-server-1-with-7dfd8044aa7c45620-3655e\" not found, requeuing" func="github.com/sirupsen/logrus.(*Entry).Logf" file="/root/go/pkg/mod/github.com/sirupsen/logrus@v1.4.2/entry.go:314"
time="2022-07-13T22:01:03Z" level=debug msg="PLAN STATUS HANDLER: plan=system-upgrade/busybox-test@658, status={Conditions:[] LatestVersion: LatestHash: Applying:[]}" func="github.com/sirupsen/logrus.(*Entry).Logf" file="/root/go/pkg/mod/github.com/sirupsen/logrus@v1.4.2/entry.go:314"
time="2022-07-13T22:01:03Z" level=debug msg="PLAN GENERATING HANDLER: plan=system-upgrade/busybox-test@658, status={Conditions:[{Type:LatestResolved Status:True LastUpdateTime:2022-07-13T22:01:03Z LastTransitionTime: Reason:Version Message:}] LatestVersion:1.35.0-uclibc LatestHash:7dfd8044aa7c456209bd9a936377d75caf7b42dde345cfd6535981d1 Applying:[]}" func="github.com/sirupsen/logrus.(*Entry).Logf" file="/root/go/pkg/mod/github.com/sirupsen/logrus@v1.4.2/entry.go:314"
time="2022-07-13T22:01:03Z" level=debug msg="DesiredSet - Created batch/v1, Kind=Job system-upgrade/apply-busybox-test-on-k3s-server-1-with-7dfd8044aa7c45620-3655e for system-upgrade-controller system-upgrade/busybox-test" func="github.com/sirupsen/logrus.(*Entry).Logf" file="/root/go/pkg/mod/github.com/sirupsen/logrus@v1.4.2/entry.go:314"
time="2022-07-13T22:01:03Z" level=error msg="error syncing 'system-upgrade/busybox-test': handler system-upgrade-controller: plans.upgrade.cattle.io \"busybox-test\" not found, handler system-upgrade-controller: plans.upgrade.cattle.io \"busybox-test\" not found, requeuing" func="github.com/sirupsen/logrus.(*Entry).Logf" file="/root/go/pkg/mod/github.com/sirupsen/logrus@v1.4.2/entry.go:314"
time="2022-07-13T22:01:33Z" level=debug msg="PLAN STATUS HANDLER: plan=system-upgrade/busybox-test@658, status={Conditions:[] LatestVersion: LatestHash: Applying:[]}" func="github.com/sirupsen/logrus.(*Entry).Logf" file="/root/go/pkg/mod/github.com/sirupsen/logrus@v1.4.2/entry.go:314"
time="2022-07-13T22:01:33Z" level=debug msg="PLAN GENERATING HANDLER: plan=system-upgrade/busybox-test@658, status={Conditions:[{Type:LatestResolved Status:True LastUpdateTime:2022-07-13T22:01:33Z LastTransitionTime: Reason:Version Message:}] LatestVersion:1.35.0-uclibc LatestHash:7dfd8044aa7c456209bd9a936377d75caf7b42dde345cfd6535981d1 Applying:[]}" func="github.com/sirupsen/logrus.(*Entry).Logf" file="/root/go/pkg/mod/github.com/sirupsen/logrus@v1.4.2/entry.go:314"
time="2022-07-13T22:01:34Z" level=debug msg="DesiredSet - Created batch/v1, Kind=Job system-upgrade/apply-busybox-test-on-k3s-server-1-with-7dfd8044aa7c45620-3655e for system-upgrade-controller system-upgrade/busybox-test" func="github.com/sirupsen/logrus.(*Entry).Logf" file="/root/go/pkg/mod/github.com/sirupsen/logrus@v1.4.2/entry.go:314"
time="2022-07-13T22:01:34Z" level=error msg="error syncing 'system-upgrade/busybox-test': handler system-upgrade-controller: plans.upgrade.cattle.io \"busybox-test\" not found, handler system-upgrade-controller: plans.upgrade.cattle.io \"busybox-test\" not found, requeuing" func="github.com/sirupsen/logrus.(*Entry).Logf" file="/root/go/pkg/mod/github.com/sirupsen/logrus@v1.4.2/entry.go:314"

The not found errors for both the plan and job resources seems to indicate that some of the informer caches aren’t getting started properly? If I delete the SUC pod, the next controller pod that comes up creates the job successfully:

brandond@dev01:~/suc-test$ kubectl describe job -n system-upgrade -l upgrade.cattle.io/plan=busybox-test
Name:                     apply-busybox-test-on-k3s-server-1-with-7dfd8044aa7c45620-3655e
Namespace:                system-upgrade
Selector:                 controller-uid=267c240d-ccdb-4901-9c99-927fc4bec3cb
Labels:                   objectset.rio.cattle.io/hash=6fb02a1a07227c1f395e337028a47ca6942caaa2
                          plan.upgrade.cattle.io/busybox-test=7dfd8044aa7c456209bd9a936377d75caf7b42dde345cfd6535981d1
                          upgrade.cattle.io/controller=system-upgrade-controller
                          upgrade.cattle.io/node=k3s-server-1
                          upgrade.cattle.io/plan=busybox-test
                          upgrade.cattle.io/version=1.35.0-uclibc
Annotations:              batch.kubernetes.io/job-tracking:
                          objectset.rio.cattle.io/applied:
                            H4sIAAAAAAAA/+xWW2/bOBP9KwWfJUe+yI4NfA/+YndrtHGMOu2iKIJgRI5srilSS46cGIb/+4KS4sjNpWn3dWHAEC8zPDxzZoZ7liGBAAI22jPQ2hCQNNr5oUn+Qk4OqWWlaXEgUt...
                          objectset.rio.cattle.io/id: system-upgrade-controller
                          objectset.rio.cattle.io/owner-gvk: upgrade.cattle.io/v1, Kind=Plan
                          objectset.rio.cattle.io/owner-name: busybox-test
                          objectset.rio.cattle.io/owner-namespace: system-upgrade
                          upgrade.cattle.io/ttl-seconds-after-finished: 900
Parallelism:              1
Completions:              1
Completion Mode:          NonIndexed
Start Time:               Wed, 13 Jul 2022 15:04:21 -0700
Completed At:             Wed, 13 Jul 2022 15:05:25 -0700
Duration:                 64s
Active Deadline Seconds:  900s
Pods Statuses:            0 Running / 1 Succeeded / 0 Failed
Pod Template:
  Labels:  controller-uid=267c240d-ccdb-4901-9c99-927fc4bec3cb
           job-name=apply-busybox-test-on-k3s-server-1-with-7dfd8044aa7c45620-3655e
           plan.upgrade.cattle.io/busybox-test=7dfd8044aa7c456209bd9a936377d75caf7b42dde345cfd6535981d1
           upgrade.cattle.io/controller=system-upgrade-controller
           upgrade.cattle.io/node=k3s-server-1
           upgrade.cattle.io/plan=busybox-test
           upgrade.cattle.io/version=1.35.0-uclibc
  Containers:
   upgrade:
    Image:      local.registry.k3s.khaus/7f19c02113f03500a5c8d72ce1a44965/d45508d8fdaa912cb352a97d14b7af1d/483e6eb76483325ea686ace1485b4b31/busybox:1.35.0-uclibc
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/sh
    Args:
      -c
      sleep 60
    Environment:
      SYSTEM_UPGRADE_NODE_NAME:             (v1:spec.nodeName)
      SYSTEM_UPGRADE_POD_NAME:              (v1:metadata.name)
      SYSTEM_UPGRADE_POD_UID:               (v1:metadata.uid)
      SYSTEM_UPGRADE_PLAN_NAME:            busybox-test
      SYSTEM_UPGRADE_PLAN_LATEST_HASH:     7dfd8044aa7c456209bd9a936377d75caf7b42dde345cfd6535981d1
      SYSTEM_UPGRADE_PLAN_LATEST_VERSION:  1.35.0-uclibc
    Mounts:
      /host from host-root (rw)
      /run/system-upgrade/pod from pod-info (ro)
  Volumes:
   host-root:
    Type:          HostPath (bare host directory volume)
    Path:          /
    HostPathType:  Directory
   pod-info:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.labels -> labels
      metadata.annotations -> annotations
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  107s  job-controller  Created pod: apply-busybox-test-on-k3s-server-1-with-7dfd8044aa7c45620-829jw
  Normal  Completed         43s   job-controller  Job completed