argo-rollouts: Failed job results in successful analysis run

I’m using the Job metric provider for pre-promotion validation in a blue-green scenario. The Job fails (as expected), but the AnalysisRun still reports Successful. I expect the AnalysisRun to fail as well and make the revision ineligible for promotion (automated or manual) unless the failure is explicitly ignored. If I set autoPromotionEnabled to true on my Rollout, the revision with the failed Job is promoted automatically.

Rollout Status

$ kubectl argo rollouts -n example get rollout myapp
Name:            myapp
Namespace:       example
Status:          ॥ Paused
Strategy:        BlueGreen
Images:          registry.company.io/myapp:1.0.0 (active, preview)
Replicas:
  Desired:       1
  Current:       2
  Updated:       1
  Ready:         2
  Available:     1

NAME                                                       KIND         STATUS        AGE    INFO
⟳ myapp                                                    Rollout      ॥ Paused      41h
├──# revision:32
│  ├──⧉ myapp-64fc844b69                                   ReplicaSet   ✔ Healthy     113s   preview
│  │  └──□ myapp-64fc844b69-ptx8v                          Pod          ✔ Running     113s   ready:1/1
│  └──α myapp-64fc844b69-32                                AnalysisRun  ✔ Successful  48s    ✖ 1
│     └──⊞ e2f2718a-e8d4-4a7c-8867-7d8e17e6b01d.smoketest.1  Job          ✖ Failed      48s
├──# revision:31
│  ├──⧉ myapp-56c5c9749d                                   ReplicaSet   ✔ Healthy     4m35s  active
│  │  └──□ myapp-56c5c9749d-5qnjr                          Pod          ✔ Running     4m35s  ready:1/1
│  ├──α myapp-56c5c9749d-31.1                              AnalysisRun  ✔ Successful  3m36s  ✔ 1
│  │  └──⊞ 739b3158-dc61-4131-a09a-2b0f09a074a2.smoketest.1  Job          ✔ Successful  3m36s
$ kubectl -n example get pods
NAME                                                   READY   STATUS    RESTARTS   AGE
e2f2718a-e8d4-4a7c-8867-7d8e17e6b01d.smoketest.1-f4n9k   0/1     Error     0          2m20s
myapp-56c5c9749d-5qnjr                                 1/1     Running   0          6m7s
myapp-64fc844b69-ptx8v                                 1/1     Running   0          3m25s
$ kubectl -n example get jobs
NAME                                             COMPLETIONS   DURATION   AGE
e2f2718a-e8d4-4a7c-8867-7d8e17e6b01d.smoketest.1   0/1           2m29s      2m29s
$ kubectl -n example describe job e2f2718a-e8d4-4a7c-8867-7d8e17e6b01d.smoketest.1
Name:           e2f2718a-e8d4-4a7c-8867-7d8e17e6b01d.smoketest.1
Namespace:      example
Selector:       controller-uid=fc389554-01a1-4fee-84e8-76777e857e14
Labels:         analysisrun.argoproj.io/uid=e2f2718a-e8d4-4a7c-8867-7d8e17e6b01d
Annotations:    analysisrun.argoproj.io/metric-name: smoketest
                analysisrun.argoproj.io/name: myapp-64fc844b69-32
Controlled By:  AnalysisRun/myapp-64fc844b69-32
Parallelism:    1
Completions:    1
Start Time:     Thu, 28 May 2020 12:05:22 -0700
Pods Statuses:  0 Running / 0 Succeeded / 1 Failed
Pod Template:
  Labels:  controller-uid=fc389554-01a1-4fee-84e8-76777e857e14
           job-name=e2f2718a-e8d4-4a7c-8867-7d8e17e6b01d.smoketest.1

Analysis Template

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: smoketest
spec:
  args:
  - name: service-url
  metrics:
  - name: smoketest
    failureLimit: 1
    provider:
      job:
        spec:
          backoffLimit: 0
          template:
            spec:
              containers:
              - name: smoketest
                image: smoketest:image
                args:
                  - "{{ args.service-url }}"

Rollout

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  annotations:
    rollout.argoproj.io/revision: "32"
  name: myapp
  namespace: example
  resourceVersion: "3948767"
spec:
  progressDeadlineSeconds: 300
  replicas: 1
  revisionHistoryLimit: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: myapp
  strategy:
    blueGreen:
      activeService: myapp
      autoPromotionEnabled: false
      prePromotionAnalysis:
        args:
        - name: service-url
          value: http://myapp-preview.example.svc.cluster.local:8080
        templates:
        - templateName: smoketest
      previewService: myapp-preview
  template:
    metadata:
      labels:
        app.kubernetes.io/name: myapp
        app.kubernetes.io/version: 1.0.0
    spec:
      containers: [...]
      restartPolicy: Always
      terminationGracePeriodSeconds: 160

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 19 (6 by maintainers)

Most upvoted comments

OK, I figured it out. It seems I was using the wrong image to run my Job. I started with curlimages/curl:latest, and replacing it with a different image from my personal library worked. Anyway, thanks a lot for your help, dude!

EDIT: Actually I was wrong… The problem was not the image but the options I put into my job. Adding these options:

          ttlSecondsAfterFinished: 1000
          activeDeadlineSeconds: 120

caused the strange behaviour of a failed Job with a successful AnalysisRun. So maybe there is a bug here.

RE EDIT: OK, sorry for the bullshit. I finally understood the true reason: my count was equal to 1 and my failureLimit was also equal to 1. You need count > failureLimit for a failed measurement to fail the analysis. Anyway… It’s late, I’m tired, and I should have gone to bed instead of talking nonsense. Maybe it will help someone 😃 Good night
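
For anyone landing here with the same symptom, the explanation in the last comment also seems to fit the AnalysisTemplate from the original report. That template sets failureLimit: 1 and no interval or count, so the metric is measured once. A metric is only marked Failed when the number of failed measurements exceeds failureLimit, and one failed Job does not exceed a limit of 1. Below is a minimal sketch of the same template with the limit lowered to 0 (which is also the default if the field is omitted). The restartPolicy line is an addition that is not in the original report; Kubernetes Jobs only accept Never or OnFailure there.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: smoketest
spec:
  args:
  - name: service-url
  metrics:
  - name: smoketest
    # One measurement is taken (no interval/count set), so a single failed
    # Job must already exceed the limit for the metric to fail.
    failureLimit: 0
    provider:
      job:
        spec:
          backoffLimit: 0
          template:
            spec:
              # Not in the original report: Jobs require Never or OnFailure.
              restartPolicy: Never
              containers:
              - name: smoketest
                image: smoketest:image
                args:
                  - "{{ args.service-url }}"
```

If the intent is to tolerate some failures, the alternative is to keep a non-zero failureLimit and set count (with an interval) to a value greater than the limit, so that enough failed measurements can actually accumulate.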