kubernetes: Auto-closing issues is harmful and causes friction

Previously opened issue: https://github.com/kubernetes/community/issues/5473

Twitter discussion: https://twitter.com/BenTheElder/status/1407774856033181696

There are pros and cons to having a bot do this. Let me try to enumerate the current situation here:

  • maintainers don’t have to play bad cop
  • keep set of open issues at a maintainable level
  • bot commands such as /reopen are available to org members, which reduces friction
  • lifecycle labels help folks specify which issues not to touch
  • we currently do not have dedicated team(s) that do issue triage well (except for some SIGs, like api-machinery, that do this really well with twice-weekly triage sessions)
  • initial issue labeling is still a problem, routing issues to the correct sig
  • we have a new triage label which is not widely used yet across sigs

However there are numerous problems with the current situation:

  • impression that project does not care about bugs
  • folks open new issues for things that may be already present but closed
  • too much traffic/noise from the bot activity
  • issues where some follow-up was requested/needed get closed

Options to do better?

  • triage party
  • more hands on deck, train incoming folks to do issue triage and manage that as a subproject of some SIG
  • stop the bot cold and let the SIGs adjust

What did I miss, @BenTheElder?

/sig contributor-experience

PS: It will be ironic if this issue gets auto-closed!

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 39
  • Comments: 30 (26 by maintainers)

Most upvoted comments

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
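For readers who want the timeline in the bot message above spelled out, here is a minimal sketch in Python. It is not the bot's actual implementation. The function name `next_action` and the `idle_since` parameter are made up for illustration, and it assumes that applying a lifecycle label restarts the inactivity clock, which is how the "since ... was applied" wording reads.

```python
from datetime import datetime, timedelta

# Thresholds taken from the bot message quoted above.
STALE_AFTER = timedelta(days=90)
ROTTEN_AFTER = timedelta(days=30)
CLOSE_AFTER = timedelta(days=30)

# Total inactivity before an untouched issue is closed: 90 + 30 + 30 = 150 days.
DAYS_TO_CLOSE = (STALE_AFTER + ROTTEN_AFTER + CLOSE_AFTER).days


def next_action(lifecycle, idle_since, now):
    """Return what the bot would do next, or None if nothing is due.

    lifecycle  -- None, "stale" or "rotten" (the current lifecycle label)
    idle_since -- the later of the last human activity and the moment the
                  current lifecycle label was applied (an assumption)
    """
    idle = now - idle_since
    if lifecycle is None and idle >= STALE_AFTER:
        return "label lifecycle/stale"
    if lifecycle == "stale" and idle >= ROTTEN_AFTER:
        return "label lifecycle/rotten"
    if lifecycle == "rotten" and idle >= CLOSE_AFTER:
        return "close"
    return None
```

Under these assumptions, an issue that nobody touches is closed after 150 days, and a single /remove-lifecycle stale comment starts the whole cycle again.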

If folks are reluctant to disable the bot entirely (Ben’s suggestion), I have an alternate proposal:

  1. Triaged issues are never marked stale or closed.
  2. (optional) After 1 year (or some long timeline) of no activity, require triaged issues to be re-triaged (this is sort of like marking them as stale)
  3. Regular stale/rotten/closed rules are applied to untriaged issues
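The three rules above could be sketched roughly like this. This is only an illustration of the proposal, not existing bot behavior. The label name `triage/accepted` as the marker for "triaged" is an assumption, as are the function name and the return strings.

```python
from datetime import timedelta

# Assumption: "triaged" means the issue carries this label.
TRIAGED_LABEL = "triage/accepted"
# Rule 2 says "1 year (or some long timeline)"; 365 days is a placeholder.
RETRIAGE_AFTER = timedelta(days=365)


def bot_decision(labels, idle, retriage=True):
    """Decide how the bot treats an issue under the alternate proposal.

    labels   -- set of label names on the issue
    idle     -- timedelta since the last activity
    retriage -- whether the optional rule 2 is enabled
    """
    if TRIAGED_LABEL in labels:
        # Rule 2 (optional): very old triaged issues go back to triage.
        if retriage and idle >= RETRIAGE_AFTER:
            return "needs-retriage"
        # Rule 1: triaged issues are never marked stale or closed.
        return "ignore"
    # Rule 3: untriaged issues keep the regular stale/rotten/close rules.
    return "apply-stale-rules"
```

Note that the re-triage outcome never closes anything. It only asks a human to look again.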

It seems like a good idea to revisit the workflows here. What follows is very much my opinion and not representing any SIG. It also covers more than just autoclose.

Auto close does avoid issues piling up.

  • I’m OK with the idea of never closing issues that are triaged as accepted
  • I’m also OK with the idea of auto closing issues that have been marked as needs more information or as support, and haven’t had a response since they were so labelled.
  • I’d be interested in splitting the “backlog” priority level into “stuff we’d clearly like to do as a SIG, and would if there were resources” vs. “it’s a valid suggestion but it’ll only happen if someone takes it forward”. For an example of the latter see issue 25061 in k/website: Allow switching website between “light mode” and “dark mode”.
  • That split would help us let things fall off the second backlog, but perhaps let them stay on the first.
  • Priorities above backlog should not autoclose, but I do want to see them rot and then be labelled something if we don’t get to them. Maybe become “overdue” or “unaddressed”?

For k/website, issues that haven’t been triaged in n days could go straight to unaddressed, skipping stale and rotten.

So people don’t have to wade through Twitter, let me summarize my 2c…

Volume:

Contributor and issue filer expectations:

  • Allowing a backlog of issues that anyone can file to grow forever fails to set healthy boundaries and realistic expectations around support in a FOSS project. Closed, stale issues illustrate the staffing gap, and auto-closing at least shifts some of the emotional labor of closing stale issues (which no one enjoys doing, particularly if the reply that comes back is rude!) to a bot.
  • I think there’s a difference between closing issues that have been confirmed at some point but we’re not staffed to handle vs. various user support requests, perhaps where the reporter is unresponsive.
  • I frequently struggle with confirmed, legitimate issues that the project doesn’t have resources to address, because there are only so many things we can prioritize working on. Even if an outside contributor does the work to fix the problem, reviewer/approver time is very limited.
  • Addressing the limited approver time gap is not easy. It is extremely challenging to develop the experience to apply for approver on some of the high-volume areas of the codebase (API Machinery, Node, etc.). You need to work on the project full-time, and that requires funding/employment. Many areas of the project have a lottery factor of 1.

At the moment, I think it’s clear that SIGs are not resourced to manage the current volume of incoming issues in k/k. I think it’s also fair to argue that the stale bot has discouraged issue filing, so I suspect that if we staffed the project for current issue load, we could easily see it increase in a short time and would find ourselves understaffed again.

Could the project hire someone to do full-time issue triage and management? And, if we did this, would we have sufficient staffing in SIGs to manage the workload of legitimate bugs and feature requests after moving the bottleneck?

Being mostly on the user side, I do feel a lot of frustration with these. I think auto-closing specifically in the case of a ticket/PR being in a “waiting for OP to respond” state is okay, but when the issue is accepted as real and there’s nothing more to say, auto-closing it is pretty grumble-inducing. Auto-closing untriaged tickets/PRs feels like a gray area: I can see both sides, and it probably should be a special case and a sign that our triage process needs improvement.

To throw out another suggestion specific to k/k, I run the weekly SIG Auth Issue triage meeting with @ritazh @aramase @nilekhc @ibihim @natalisucks and others. We are slowly getting through our issues, slowly improving the automation around the triage process, etc.

I find this bot to be harmful, abrasive and an incredible waste of the community’s time.

My proposal is very simple: if a k/k issue or PR has the sig/auth label and the total number of SIG labels is 3 or less, the bot should ignore it completely. I and the other SIG members will take complete ownership of our issues and their triage. I have no problem closing issues and PRs directly. The passage of time does not solve bugs.
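The exemption rule in that proposal is small enough to write down. The following is a hedged sketch. The function and parameter names are hypothetical, and it assumes SIG labels are exactly the ones starting with `sig/`.

```python
def bot_should_ignore(labels, owning_sig="sig/auth", max_sig_labels=3):
    """Return True if the stale bot should leave this issue or PR alone.

    The bot ignores an item when it carries the owning SIG's label and
    has at most `max_sig_labels` SIG labels in total.
    """
    # Assumption: a SIG label is any label with the "sig/" prefix.
    sig_labels = [name for name in labels if name.startswith("sig/")]
    return owning_sig in sig_labels and len(sig_labels) <= max_sig_labels
```

For example, an issue labeled `sig/auth`, `sig/node` and `kind/bug` would be skipped, while one carrying four SIG labels would still go through the normal lifecycle.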

This is about a 2-minute read, and it provides some really useful insight into the problem: https://blog.benwinding.com/github-stale-bots/

The way I see it, issues should never be closed, and it doesn’t really matter if you have 100 open issues or 10,000.

As a maintainer, personally, I’d want to concentrate on

  1. confirmed bugs first (which could be annotated with labels)
  2. unlabeled issues that come in, sorted by 👍 reactions, since a reaction shows that someone else has encountered the same problem or wants the same feature
  3. unlabeled issues that come in, sorted by recent changes (new comments), since relevant activity in a thread could mean high impact
  4. investigate the use of something like this in order to find duplicate issues, label, and close them as duplicates: https://github.com/probot/duplicate-issues
  5. utilize github actions to export information to a system that can store and query the data. E.g. on: issue_comment, etc.
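As a rough sketch of the first three items in that list, assuming the issue data has already been exported somewhere queryable (item 5): the `Issue` class, the `confirmed-bug` label name and the tie-breaking choice are all hypothetical. Items 2 and 3 describe two separate sort orders; here they are folded into one, with 👍 count first and most recent update as the tie-breaker.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical label marking a bug as confirmed (item 1).
CONFIRMED_LABEL = "confirmed-bug"


@dataclass
class Issue:
    number: int
    thumbs_up: int          # count of thumbs-up reactions
    updated_at: datetime    # time of the latest comment or change
    labels: set = field(default_factory=set)


def triage_order(issues):
    """Order issues for a maintainer: confirmed bugs, then unlabeled ones.

    Issues that are labeled but not confirmed are left out, because they
    have already been looked at by someone.
    """
    confirmed = [i for i in issues if CONFIRMED_LABEL in i.labels]
    unlabeled = [i for i in issues if not i.labels]
    # Most reactions first; among equals, most recently updated first.
    unlabeled.sort(key=lambda i: (i.thumbs_up, i.updated_at), reverse=True)
    return confirmed + unlabeled
```

Nothing in this ordering depends on whether an issue is old, which is the point being made: age alone does not decide what gets attention.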

Can confirm it was the Helm project (@spiffxp and I chatted about this at the first Helm Summit).

(takes sip from Helm Summit water bottle) That was a fun summit.

If there was an outright denial I sure hope it didn’t come from me, though maybe I was the messenger. At the time I think the project was trying to tamp down the entropy of every repo having different rules of engagement. Onboard a contributor to the project, and they should be capable of contributing to any of our 200+ repos. The bot was a way of maintaining some kind of expectation that issues that haven’t been touched in 150 days are implicitly deemed as not urgent enough to work on.

I personally am totally fine with repos that wish to exclude themselves from this workflow, with the exception of kubernetes/kubernetes. That said I ultimately think this is a SIG Contributor Experience call.

Previously we refused at least one project’s request (helm, as I recall),

Can confirm it was the Helm project (@spiffxp and I chatted about this at the first Helm Summit).

What strategy should we use for the backlog of issues that no one has engaged with in half a decade?

We close bugs far quicker than half a decade. IMO the amount of time a bug spends open is actually immaterial. If the bug is real and not fixed it deserves to be tracked, it is wasteful to file N bugs for the same problem, and it prevents users from tracking resolution when we spread it out in this way.

We’ve yet to have any solid arguments as to why open bugs are harmful. At best there is some complaint about issue triage, but that is accomplishable with e.g. just labeling the issue and filtering by label. Closing is orthogonal.

Ideally we can figure out how to redirect the energy spent getting angry at automated descriptive processes towards fixing more issues more consistently.

Many people filing bugs are ill-equipped to fix them. A user of the project may not have the right position or experience to resolve a bug. Their frustration is still relevant to the success of the project. There is also a lot of energy wasted in /remove-lifecycle comments.

The impression that the project does not care about bugs is an accurate conclusion implied by the large number of auto-closed issues. At the very least, the project’s ability to generate bugs is higher than the community’s current ability to fix them.

Dismissively closing a bug, so that it is not even tracked, is a bit different from simply not getting to it.

In the same way smoke detectors rarely cause fires, I would not assign the cause of the reputational harm from ignoring bugs to automated processes that flag things as ignored.

There is explicit evidence that there is reputational harm from this. Offhand: https://twitter.com/jordansissel/status/1407857648083472388 “K8s auto closing bot is why I don’t interact with k8s projects anymore. 😭👍🏻”

Further, I’d say this is not comparable to a smoke detector. Tracking the number of inactive bugs is a smoke detector. Closing them actually obscures this: I’d have a different impression encountering a project with ~9k bugs open than one with < 2k. How many people are aware of searching GitHub for the closed issues with a special label?

Closing issues is more like having a robot dump the flaming objects in your backyard so the smoke detector stops going off.

Rather, I would note the lack of the right processes that would allow the maintainers on this project to respond to incoming issues faster than they are supplied and also to identify issues that have fallen through the cracks.

This is almost certainly the most important thing to address. That doesn’t mean that other aspects of the current approach aren’t problematic though.