jx: The jenkins-x-gc-activities pods start to fail and bring down Jenkins

Summary

Jenkins stops responding to web requests (e.g. the jx console page fails to load). Inspecting the pods shows hundreds of failed jenkins-x-gc-activities pods:

❯ kubectl get pods --all-namespaces
NAMESPACE      NAME                                                   READY     STATUS              RESTARTS   AGE
jxmt           jenkins-x-gc-activities-1537207200-229pb               0/1       Error               0          3m
jxmt           jenkins-x-gc-activities-1537207200-2484n               0/1       Error               0          39m
jxmt           jenkins-x-gc-activities-1537207200-24fs6               0/1       Error               0          9m
jxmt           jenkins-x-gc-activities-1537207200-262j9               0/1       Error               0          31m
jxmt           jenkins-x-gc-activities-1537207200-268d6               0/1       Error               0          12m

The logs of one of those pods show:

❯ kubectl logs --tail=100 jenkins-x-gc-activities-1537207200-zwb8m -n jxmt
error: deployments.apps "prow-build" is forbidden: User "system:serviceaccount:jxmt:jenkins-x-gc-activities" cannot get deployments.apps in the namespace "jxmt"

I did not enable Prow when I installed JX, yet the error refers to a prow-build deployment. The install was basically jx create cluster eks, hooked up to Bitbucket Cloud. I built the cluster this morning and Jenkins is already unresponsive. This has happened to 2-3 of my clusters.
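
The forbidden error can be confirmed by asking the API server whether that service account may get Deployments. The sketch below wraps kubectl auth can-i with impersonation (--as); the function name check_gc_sa_can_get_deployments is hypothetical, the defaults are the namespace and service account from the log above, and your own kubeconfig user needs permission to impersonate service accounts for the check to work.

```shell
# Hypothetical helper: ask the API server whether the gc-activities service
# account may "get" Deployments, i.e. the check the forbidden error fails.
# Impersonating with --as requires impersonation rights for your own user.
check_gc_sa_can_get_deployments() {
  local ns="${1:-jxmt}"                     # namespace from the report
  local sa="${2:-jenkins-x-gc-activities}"  # service account from the error
  kubectl auth can-i get deployments.apps \
    --as="system:serviceaccount:${ns}:${sa}" \
    --namespace="${ns}"
}
```

On an affected cluster, check_gc_sa_can_get_deployments jxmt should print no.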

Steps to reproduce the behavior

  1. Create a cluster in EKS with default environments.
  2. Create a quickstart (golang-http, for example). It doesn't matter whether the pipeline passes or fails.
  3. Let it run for a few hours and the failing pods start to build up.

All of my repositories are private (--git-private flag), if that matters at all.
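
To watch step 3 happen, the failed pods can be counted every so often. This is a sketch that assumes the pods carry the app=gc-activities label used in the cleanup command from the comments, and that their Error status corresponds to the Failed pod phase (typical for Job pods that are not restarted in place); count_failed_gc_pods is a hypothetical name.

```shell
# Hypothetical helper: count gc-activities pods in the Failed phase.
# Assumes the app=gc-activities label (taken from the cleanup comment).
# awk counts only lines naming a gc-activities pod, so a "No resources
# found" message is never counted, and it prints 0 when nothing matches.
count_failed_gc_pods() {
  local ns="${1:-jxmt}"
  kubectl get pods \
    --namespace="${ns}" \
    --selector=app=gc-activities \
    --field-selector=status.phase=Failed \
    --no-headers \
    | awk '/gc-activities/ { n++ } END { print n + 0 }'
}
```

Running count_failed_gc_pods jxmt a few times over a couple of hours should show the number climbing on an affected cluster.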

Jx version

The output of jx version is:

NAME               VERSION
jx                 1.3.275
jenkins x platform 0.0.2447
kubernetes cluster v1.10.3
kubectl            v1.11.3
helm client        v2.10.0+g9ad53aa
helm server        v2.10.0+g9ad53aa
git                git version 2.19.0

Kubernetes cluster

EKS, created with jx create cluster eks but pointed at Bitbucket Cloud.

Operating system / Environment

N/A

Expected behavior

The jenkins-x-gc-activities pods complete successfully.

Actual behavior

The jenkins-x-gc-activities pods fail, build up, and end up bringing down the Jenkins site.

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 2
  • Comments: 21 (5 by maintainers)

Most upvoted comments

FWIW, in the meantime kubectl delete pod -l app=gc-activities will clear up all the pods.
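
A narrower variant of that cleanup, sketched below, scopes the delete to one namespace and to pods in the Failed phase, so a gc-activities pod that is still running is left alone. It assumes a kubectl release whose delete command supports --field-selector; the function name delete_failed_gc_pods and the jxmt default are illustrative.

```shell
# Hypothetical helper: delete only the failed gc-activities pods in one
# namespace, instead of every pod carrying the label.
delete_failed_gc_pods() {
  local ns="${1:-jxmt}"
  kubectl delete pod \
    --namespace="${ns}" \
    --selector=app=gc-activities \
    --field-selector=status.phase=Failed
}
```

This only clears the symptom; new failed pods keep appearing until the service account can read Deployments.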

We started with the chart in jenkins-x-platform but are moving to the charts in the jx repo, so at the moment there's some annoying duplication. I'll try to sort that out soon, but for now I think this PR should do it: https://github.com/jenkins-x/jenkins-x-platform/pull/3675. I've just merged it, so I'll run the end-to-end tests and release cloud-environments provided all is green.
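
Until the fixed chart reaches a cluster, one possible stopgap (not necessarily what the PR changes, since that fix lives in the chart) is to grant the service account the permission named in the error by hand. In this sketch the Role and RoleBinding name gc-activities-deployment-reader is hypothetical, only the get verb on deployments.apps is granted because that is all the error mentions, and the job may turn out to need further permissions.

```shell
# Hypothetical stopgap: let the gc-activities service account read Deployments
# in its namespace. Names are made up; only the "get" verb from the error is granted.
grant_gc_deployment_read() {
  local ns="${1:-jxmt}"
  local sa="jenkins-x-gc-activities"
  local name="gc-activities-deployment-reader"  # hypothetical Role/RoleBinding name
  kubectl create role "${name}" \
    --namespace="${ns}" \
    --verb=get \
    --resource=deployments.apps || return 1
  kubectl create rolebinding "${name}" \
    --namespace="${ns}" \
    --role="${name}" \
    --serviceaccount="${ns}:${sa}"
}
```

Once the chart fix is in place, both objects can be removed with kubectl delete role,rolebinding gc-activities-deployment-reader -n jxmt.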

No, you have to add those values as I did. 😃 My PR will do the job; I've just tested it against my cluster.

Kudos to @polothy for fixing this!

https://github.com/jenkins-x/jx/pull/1723 is now merged and is included in jx 1.3.291 onwards. To pick it up, bump CHART_VERSION in your ~/.jx/cloud-environments/Makefile; 0.0.2510 is the latest. When you create the cluster, make sure you don't select the option to recreate the cloud environment, otherwise you'll lose this change.
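
The Makefile bump can be scripted. This sketch assumes CHART_VERSION is assigned on a single line that starts with CHART_VERSION and uses =, := or ?=; if your copy of the Makefile looks different, edit it by hand instead. The function name bump_chart_version is hypothetical, and a .bak copy of the original file is kept.

```shell
# Hypothetical helper: rewrite the CHART_VERSION value in the
# cloud-environments Makefile, keeping a .bak copy of the original.
# Assumes a one-line assignment using =, := or ?= (an assumption, not verified).
bump_chart_version() {
  local version="$1"
  local makefile="${2:-$HOME/.jx/cloud-environments/Makefile}"
  # Keep the name and assignment operator (group 1), replace only the value.
  sed -E -i.bak \
    "s/^(CHART_VERSION[[:space:]]*[:?]?=[[:space:]]*).*/\1${version}/" \
    "${makefile}"
}
```

For example, bump_chart_version 0.0.2510, then grep CHART_VERSION ~/.jx/cloud-environments/Makefile to confirm the new value before creating the cluster.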