pipeline: task run time exceeds `timeouts.tasks` when the task is retried

Expected Behavior

Once timeouts.tasks is exceeded, the task should not be retried.

The finally tasks should always be executed.

Actual Behavior

If the task is retried, the task's total run time exceeds timeouts.tasks.

In addition, if this causes the pipeline execution time to exceed timeouts.pipeline, the finally tasks are forcibly timed out.

Steps to Reproduce the Problem

apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: please-say-bye
spec:
  tasks:
    - name: hi
      retries: 2
      taskSpec:
        steps:
          - name: hi
            image: alpine:3.12
            script: |
              echo 'hi'
              sleep 10
  finally:
    - name: bye
      taskSpec:
        steps:
          - name: bye
            image: alpine:3.12
            script: |
              echo 'bye'
---
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: please-say-bye-
spec:
  timeouts:
    pipeline: 10s
    tasks: 5s
    finally: 5s
  pipelineRef:
    name: please-say-bye
❯ tkn -n pipelines tr ls --label tekton.dev/pipeline=please-say-bye

NAME                             STARTED          DURATION    STATUS
please-say-bye-tzjdt-bye-pft4x   25 seconds ago   1 second    Failed(TaskRunTimeout)
please-say-bye-tzjdt-hi-qqlgs    30 seconds ago   5 seconds   Failed(TaskRunTimeout)

❯ tkn -n pipelines pr desc please-say-bye-tzjdt
Name:              please-say-bye-tzjdt
Namespace:         pipelines
Pipeline Ref:      please-say-bye
Service Account:   default
Labels:
 tekton.dev/pipeline=please-say-bye

🌡️  Status

STARTED        DURATION     STATUS
1 minute ago   14 seconds   Failed

💌 Message

Tasks Completed: 2 (Failed: 2, Cancelled 0), Skipped: 0 (TaskRun "please-say-bye-tzjdt-bye-pft4x" failed to finish within "1s")

📦 Resources

 No resources

⚓ Params

 No params

📝 Results

 No results

📂 Workspaces

 No workspaces

🗂  Taskruns

 NAME                               TASK NAME   STARTED        DURATION    STATUS
 ∙ please-say-bye-tzjdt-bye-pft4x   bye         1 minute ago   1 second    Failed(TaskRunTimeout)
 ∙ please-say-bye-tzjdt-hi-qqlgs    hi          1 minute ago   5 seconds   Failed(TaskRunTimeout)

⏭️  Skipped Tasks

 No Skipped Tasks
Screenshot 2021-07-02 17 59 45

Additional Info

  • Kubernetes version:
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:18:45Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.4-eks-6b7464", GitCommit:"6b746440c04cb81db4426842b4ae65c3f7035e53", GitTreeState:"clean", BuildDate:"2021-03-19T19:33:03Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
  • Tekton Pipeline version:
Client version: 0.19.0
Pipeline version: v0.25.0
Triggers version: v0.14.0
Dashboard version: v0.17.0

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 17 (11 by maintainers)

Most upvoted comments

I’m returning to this issue to see if it has been resolved by #5134 (FYI @abayer).

@ornew, I’m curious why you say in your original comment that the finally task should not be timed out if timeouts.pipeline is exceeded. I think not running the finally task is the intended behavior, since timeouts.pipeline covers the entire time the pipeline is running. I tried the example from your original comment: the pipelinerun is timed out after 10 seconds and the finally tasks are not run, which I believe is correct, as the pipelinerun should stop running after 10s.

If you’d like to allow the finally tasks to run indefinitely but have the tasks section time out after some time, I think you need to specify timeouts.tasks = 5s and timeouts.pipeline = 0 (no timeout). (Unfortunately this doesn’t work; I filed https://github.com/tektoncd/pipeline/issues/5459, but it should be easily fixable.)
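
For reference, the configuration suggested above might look roughly like the following PipelineRun. This is only a sketch: it reuses the please-say-bye pipeline from the original report, writing the zero timeout as the quoted string "0" is an assumption, and as noted above this combination was reported not to work at the time.

```yaml
# Sketch only: reported as not working at the time of this comment
# (see tektoncd/pipeline issue 5459 linked above).
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: please-say-bye-
spec:
  timeouts:
    pipeline: "0"   # assumed spelling of "no timeout" for the whole pipelinerun
    tasks: 5s       # the tasks section should still time out after 5s
  pipelineRef:
    name: please-say-bye
```

With no timeouts.finally set and no overall pipeline timeout, the intent is that only the tasks section is bounded and the finally tasks can run to completion.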

There’s still a bug, though, with the retried taskrun not being timed out when timeouts.tasks is reached. Here’s a reproducer:

apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: please-say-bye-again
spec:
  tasks:
    - name: hi
      retries: 2
      taskSpec:
        steps:
          - name: hi
            image: alpine:3.12
            script: |
              echo 'hi'
              sleep 10
              exit 1
  finally:
    - name: bye
      taskSpec:
        steps:
          - name: bye
            image: alpine:3.12
            script: |
              echo 'bye'
---
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: please-say-bye-again-
spec:
  timeouts:
    pipeline: 1m
    tasks: 19s
  pipelineRef:
    name: please-say-bye-again

In this example, each of the 3 attempts sleeps for 10s and fails (roughly 30s in total, well past the 19s timeouts.tasks limit), then the finally task runs and the pipelinerun fails. I would expect instead that the first attempt fails and is retried, the taskrun is canceled before the second attempt completes (when timeouts.tasks is reached at 19s), the finally task runs, and the pipelinerun fails.