spinnaker: Waiting executions don't follow FIFO

Issue Summary:

Waiting executions don’t follow FIFO

Cloud Provider(s):

Environment:

Spinnaker 1.21.4, Spinnaker 1.25

Feature Area:

Orca queue

Description:

The order of queued executions, which should be FIFO, changes when two conditions are met:

  1. The pipeline has parallel branches. When one branch completes, the running execution still waits for the other branches to complete.
  2. The pipeline has concurrent executions disabled, so pending executions wait in a queue while one execution is running.

Checking the Redis queue shows that 01F0WQV9NR65J9F9N2FWN8FRVK moves from the oldest position to the newest, and its attempts attribute is incremented by 1:

127.0.0.1:6379> LRANGE orca.pipeline.queue.8b4f6fbf-df5b-4ac9-9b0e-a63b3dd38c97 0 10
1) "{\"kind\":\"startExecution\",\"executionType\":\"PIPELINE\",\"executionId\":\"01F0WQVEY9XK52Y47W2DD1F91W\",\"application\":\"issuedebug\",\"attributes\":[{\"kind\":\"attempts\",\"attempts\":1}]}"
2) "{\"kind\":\"startExecution\",\"executionType\":\"PIPELINE\",\"executionId\":\"01F0WQVCB8EPRJAJNQ55VF3QTN\",\"application\":\"issuedebug\",\"attributes\":[{\"kind\":\"attempts\",\"attempts\":1}]}"
3) "{\"kind\":\"startExecution\",\"executionType\":\"PIPELINE\",\"executionId\":\"01F0WQV9NR65J9F9N2FWN8FRVK\",\"application\":\"issuedebug\",\"attributes\":[{\"kind\":\"attempts\",\"attempts\":1}]}"
127.0.0.1:6379> 
127.0.0.1:6379> 
127.0.0.1:6379> LRANGE orca.pipeline.queue.8b4f6fbf-df5b-4ac9-9b0e-a63b3dd38c97 0 10
1) "{\"kind\":\"startExecution\",\"executionType\":\"PIPELINE\",\"executionId\":\"01F0WQV9NR65J9F9N2FWN8FRVK\",\"application\":\"issuedebug\",\"attributes\":[{\"kind\":\"attempts\",\"attempts\":2}]}"
2) "{\"kind\":\"startExecution\",\"executionType\":\"PIPELINE\",\"executionId\":\"01F0WQVEY9XK52Y47W2DD1F91W\",\"application\":\"issuedebug\",\"attributes\":[{\"kind\":\"attempts\",\"attempts\":1}]}"
3) "{\"kind\":\"startExecution\",\"executionType\":\"PIPELINE\",\"executionId\":\"01F0WQVCB8EPRJAJNQ55VF3QTN\",\"application\":\"issuedebug\",\"attributes\":[{\"kind\":\"attempts\",\"attempts\":1}]}"
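
The two listings above can be compared mechanically. The following is a minimal sketch, not part of Orca: it assumes the execution IDs are ULIDs (so plain string order follows creation time) and that index 0 of the list holds the newest entry, as the first listing suggests. The helper names `entry`, `parse_entry` and `is_oldest_last` are made up for illustration.

```python
import json


def entry(execution_id, attempts):
    """Build a queue entry shaped like the ones in the LRANGE output (illustrative helper)."""
    return json.dumps({
        "kind": "startExecution",
        "executionType": "PIPELINE",
        "executionId": execution_id,
        "application": "issuedebug",
        "attributes": [{"kind": "attempts", "attempts": attempts}],
    })


def parse_entry(raw):
    """Return (executionId, attempts) for one JSON-encoded queue entry."""
    msg = json.loads(raw)
    attempts = next(
        (a["attempts"] for a in msg.get("attributes", []) if a.get("kind") == "attempts"),
        None,
    )
    return msg["executionId"], attempts


def is_oldest_last(entries):
    """True if IDs are in descending order, i.e. newest at index 0 and oldest at the tail.

    Assumption: IDs are ULIDs, so comparing them as strings compares creation time.
    """
    ids = [parse_entry(e)[0] for e in entries]
    return ids == sorted(ids, reverse=True)


# First listing: order is intact.
before = [
    entry("01F0WQVEY9XK52Y47W2DD1F91W", 1),
    entry("01F0WQVCB8EPRJAJNQ55VF3QTN", 1),
    entry("01F0WQV9NR65J9F9N2FWN8FRVK", 1),
]
# Second listing: the oldest execution now sits at index 0 with attempts=2.
after = [
    entry("01F0WQV9NR65J9F9N2FWN8FRVK", 2),
    entry("01F0WQVEY9XK52Y47W2DD1F91W", 1),
    entry("01F0WQVCB8EPRJAJNQ55VF3QTN", 1),
]

print(is_oldest_last(before))  # True
print(is_oldest_last(after))   # False
```

Feeding the raw strings returned by LRANGE into `is_oldest_last` while the pipeline is running should show the moment the order breaks.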

Looking at the Redis messages, the “completeExecution” message of the currently running execution appears to be related to the re-ordering, but we do not know why it pops the oldest waiting execution and pushes it back.

{"kind":"completeExecution","executionType":"PIPELINE","executionId":"01F0WQV6WFRXCS2JQ5C8X043WP","application":"issuedebug","attributes":[{"kind":"attempts","attempts":3}]}
{"kind":"runTask","executionType":"PIPELINE","executionId":"01F0WQV6WFRXCS2JQ5C8X043WP","application":"issuedebug","stageId":"01F0WQV6WTS33MK59K55FBEQ3S","taskId":"1","taskType":"com.netflix.spinnaker.orca.pipeline.tasks.WaitTask","attributes":[{"kind":"attempts","attempts":1}],"ackTimeoutMs":600000}
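
As a sanity check on the "pop and push back" reading, here is a small model of the queue. It is only a sketch of the observed behaviour, not Orca's actual code: it assumes the oldest entry is taken from the tail, its attempts counter is incremented, and it is re-inserted at the head. The function name `requeue_oldest` is hypothetical.

```python
from collections import deque


def requeue_oldest(queue):
    """Pop the oldest entry (tail), bump its attempts, and push it back at the head."""
    execution_id, attempts = queue.pop()
    queue.appendleft((execution_id, attempts + 1))
    return queue


# State from the first LRANGE listing: index 0 is the newest entry.
queue = deque([
    ("01F0WQVEY9XK52Y47W2DD1F91W", 1),
    ("01F0WQVCB8EPRJAJNQ55VF3QTN", 1),
    ("01F0WQV9NR65J9F9N2FWN8FRVK", 1),
])

requeue_oldest(queue)

for execution_id, attempts in queue:
    print(execution_id, attempts)
# 01F0WQV9NR65J9F9N2FWN8FRVK 2
# 01F0WQVEY9XK52Y47W2DD1F91W 1
# 01F0WQVCB8EPRJAJNQ55VF3QTN 1
```

One such step turns the first listing into the second, which is consistent with a single pop of the oldest entry followed by a push onto the newest end.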

We have been seeing this issue since 2019 (previous issue: https://github.com/spinnaker/spinnaker/issues/4587). It still exists in 1.21 and in the latest release, 1.25. We also tried MySQL as the queue backend, which did not help.

Steps to Reproduce:

  1. Create a pipeline with two wait stages as two parallel branches. Set wait stage1 to about 10 seconds and wait stage2 to 300 seconds.
  2. Disable concurrent executions for this pipeline.
  3. Run the pipeline 5 times so that one execution is running and 4 are waiting. At this point the 4 waiting executions are in FIFO order in the Redis queue (with attempts=1).
  4. When wait stage1 completes while wait stage2 is still running, a message with “kind”:“completeExecution” appears among the Orca messages.
  5. Monitor the 4 waiting executions in both the UI and the Redis queue; the re-ordering is visible.
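
For steps 1 and 2, a pipeline definition along the following lines should be enough. This is a sketch built as a Python dict and printed as JSON; the field names (`limitConcurrent`, `refId`, `requisiteStageRefIds`, `waitTime`) are written from memory of the pipeline JSON editor and should be checked against your Spinnaker version. The pipeline name is made up.

```python
import json

# Two wait stages with no upstream dependencies, so they run as parallel branches.
pipeline = {
    "application": "issuedebug",
    "name": "fifo-repro",      # hypothetical pipeline name
    "limitConcurrent": True,   # assumed field for "disable concurrent executions"
    "stages": [
        {
            "refId": "1",
            "name": "wait stage1",
            "type": "wait",
            "waitTime": 10,    # seconds
            "requisiteStageRefIds": [],
        },
        {
            "refId": "2",
            "name": "wait stage2",
            "type": "wait",
            "waitTime": 300,   # seconds
            "requisiteStageRefIds": [],
        },
    ],
}

print(json.dumps(pipeline, indent=2))
```

The short branch finishes after about 10 seconds while the long one keeps the execution running, which is the window in which the re-ordering shows up.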

Additional Details:


About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 31

Most upvoted comments

I have created a PR to fix this issue. Can someone help review and approve it? https://github.com/spinnaker/orca/pull/4356

Hi Arjun, there are two workarounds to bypass the issue.

  1. Flatten the pipeline into a single branch => the running pipeline no longer checks whether other branches have completed
  2. Enable concurrent executions => no pending executions sit in the queue

Both require redesigning the pipeline. We are still waiting for the fix.