kyma: Failing Pipeline: kyma-upgrade-gardener-kyma2-minor-versions - In-Cluster Event not sent

Description

The pipeline kyma-upgrade-gardener-kyma2-minor-versions is failing due to the following issue:

  1) Upgrade test tests
       in-cluster event should be delivered (legacy events, structured and binary cloud events):
     Error: Request failed with status code 503
      at createError (node_modules/axios/lib/core/createError.js:16:15)
      at settle (node_modules/axios/lib/core/settle.js:17:12)
      at IncomingMessage.handleStreamEnd (node_modules/axios/lib/adapters/http.js:260:11)
      at endReadableNT (internal/streams/readable.js:1333:12)
      at processTicksAndRejections (internal/process/task_queues.js:82:21)

Expected result

In-Cluster event should be sent.
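
For readers unfamiliar with the two cloud event modes named in the failing test, here is a minimal sketch of how a structured and a binary CloudEvent differ on the wire, following the CloudEvents HTTP binding. The helper names, event type, source, and payload are illustrative assumptions, not code from the Kyma test suite. The legacy event format is left out because the issue does not show it.

```javascript
// Illustrative event; type, source, id and data are assumed values.
const event = {
  specversion: "1.0",
  type: "sap.kyma.custom.example.order.created.v1",
  source: "kyma-upgrade-test",
  id: "A234-1234-1234",
  data: { orderCode: "123" },
};

// Structured mode: attributes and data travel together in the JSON body.
function structuredRequest(ev) {
  return {
    headers: { "content-type": "application/cloudevents+json" },
    body: JSON.stringify(ev),
  };
}

// Binary mode: attributes become ce-* headers, only the data goes in the body.
function binaryRequest(ev) {
  const { data, ...attributes } = ev;
  const headers = { "content-type": "application/json" };
  for (const [name, value] of Object.entries(attributes)) {
    headers[`ce-${name}`] = String(value);
  }
  return { headers, body: JSON.stringify(data) };
}
```

A test like the failing one would presumably POST each of these to the in-cluster publisher endpoint and expect a 2xx response; the 503 above means that request was rejected before delivery.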

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 23 (22 by maintainers)

Most upvoted comments

Moving to blocked because the PR is not merged to the release 2.9 branch, but since it is merged to main, it will be available with the upcoming Kyma release 2.10.X.

@VladislavPaskar There is no point in having a pipeline that will run red; it will just burn up the infra money. I would just ask you to enable/reintroduce the pipeline when it has the potential to turn green, which (if I am correct) would be once the next release is out.

@Sawthis But that is exactly what this pipeline has been testing: deploy Kyma 2.9.1 and then upgrade to 2.9.2 (second latest to latest).
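
The "second latest to latest" selection can be sketched as follows. This is only an illustration under assumptions: the function names are hypothetical, and the release list (including 2.8.4 and 2.9.0) is made up for the example; the actual pipeline may determine its versions differently.

```javascript
// Compare two "major.minor.patch" strings numerically, not lexically.
function compareVersions(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

// Return the upgrade path to exercise: second latest release -> latest release.
function upgradePair(releases) {
  if (releases.length < 2) {
    throw new Error("need at least two releases to test an upgrade");
  }
  const sorted = [...releases].sort(compareVersions);
  return { from: sorted[sorted.length - 2], to: sorted[sorted.length - 1] };
}

// Assumed release list for illustration.
console.log(upgradePair(["2.9.0", "2.9.2", "2.8.4", "2.9.1"]));
// prints { from: '2.9.1', to: '2.9.2' }
```

Because the pair is derived from released versions only, a fix that exists solely on main cannot turn such a run green until the next release is cut.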

This pipeline was then failing because of a bug in eventing; an issue was created and assigned to Tunas. Meanwhile, neighbours decided to delete the pipeline because it seemed to them that nobody was taking care of it. Neither Tunas nor Jellyfish approved this removal. Thus, the pipeline got lost, since we no longer got any alerts.

The bug got fixed by this PR (https://github.com/kyma-project/kyma/pull/16278). Thus, the fix is now on main and will be released with 2.9.3.

What do I want? @VladislavPaskar, the Tunas should test the following scenarios to make sure it is indeed working, as @Sawthis mentioned.

Please test:

  • deploy 2.9.1, then upgrade to 2.9.2
  • deploy 2.9.2, then upgrade to main

If those two scenarios work, we are good. But at least what the pipeline tells us is that the upgrade test for eventing was not working:

Upgrade test tests
       in-cluster event should be delivered (legacy events, structured and binary cloud events):
     Error: Request failed with status code 503

And the pipeline needs to be reintroduced asap.

Closing this issue in favour of this one

@VladislavPaskar There is no point in having a pipeline that will run red; it will just burn up the infra money. I would just ask you to enable/reintroduce the pipeline when it has the potential to turn green, which (if I am correct) would be once the next release is out.

I see, that makes sense. We will re-enable it after the release.

@ruanxin @VladislavPaskar Yes, this pipeline tests the upgrade from the second latest released Kyma to the latest release. If we bring it back now, it will fail until the next release, which is planned for the 28th of December. It would therefore fail for over two weeks and be removed again by neighbours.

@Sawthis What do you think? Either we reintroduce it now, knowing it will fail until the end of December, or Tunas need to make sure to reintroduce it at the same time as the release happens.

@VladislavPaskar One more question: we currently have a breaking bug in eventing when a customer tries to upgrade their Kyma cluster. Don’t you think we should release earlier, since every customer will encounter this situation when 2.9.2 goes live? @Sawthis You are managing the 2.9.x releases (afaik), so it would also be good to get your opinion here.

(cc @zhoujing2022)

I am not sure which bug you mean. If you mean the one you encountered in the pipeline, then it is more about the wrong sequence of test execution. I tested the upgrade manually and there was no such problem.

Hi @Sawthis, thanks for the feedback. Tbh, I don’t share that opinion. We reported this issue on Nov 10, and it takes time for us to analyse it and delegate it to the responsible team for a fix. During that time the pipeline will unfortunately keep failing, but that does not mean the pipeline has no value: it tells us the problem is not fixed yet and reminds us to keep tracking it. That is its value. If possible, a better solution would be to stop this periodic job temporarily until the bug is fixed, rather than remove it.