kubernetes: Job controller keeps logging panics

What happened?

The Job controller, and possibly other controllers, keep logging panics from this line in FilterActivePods: https://github.com/kubernetes/kubernetes/blob/349b85650530da2b4091dd1977f9344ff4f83201/pkg/controller/controller_utils.go#L955.

This happens during e2e tests, and I think it happens in production as well.

Here is an example from a successful build:
https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/121302/pull-kubernetes-e2e-kind/1715287183352401920/artifacts/kind-control-plane/containers/kube-controller-manager-kind-control-plane_kube-system_kube-controller-manager-ec8b5cc2095bc6b1bdbfe61f132b3d493dea09ab0808935b59e10dcc5ffe1082.log

2023-10-20T09:10:28.016158573Z stderr F I1020 09:10:28.016037 1 controller_utils.go:955] "Ignoring inactive pod" pod="ttlafterfinished-3394/rand-non-local-ghcvx" phase="Failed" deletionTime="<panic: runtime error: invalid memory address or nil pointer dereference>"

What did you expect to happen?

No panics during e2e tests from this line in FilterActivePods.

How can we reproduce it (as minimally and precisely as possible)?

Run e2e or integration tests for the job controller.

Anything else we need to know?

No response

Kubernetes version

Cloud provider

Reproducible on kind during e2e tests

OS version

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, …) and versions (if applicable)

About this issue

  • State: open
  • Created 8 months ago
  • Comments: 22 (22 by maintainers)

Most upvoted comments

When logging an object’s DeletionTimestamp, “is nil” is the right information to log when the pointer is nil. There is no default that can or should be used instead.

I can work on this 😊 /assign