kubernetes: Controller manager NPE because some controllers aren't ready to receive shared informer events immediately after adding handlers

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

While testing Kubernetes master HA, when the master node failed, a nil pointer dereference was thrown on a candidate host and crashed the controller manager.

I0821 10:23:53.900036       5 controllermanager.go:460] Started "endpoint"
I0821 10:23:53.900047       5 controllermanager.go:450] Starting "serviceaccount"
I0821 10:23:53.900167       5 endpoints_controller.go:154] Starting endpoint controller
I0821 10:23:53.900184       5 controller_utils.go:1021] Waiting for caches to sync for endpoint controller
W0821 10:23:53.916153       5 client_builder.go:226] Token for service-account-controller/kube-system did not authenticate correctly
W0821 10:23:53.916172       5 client_builder.go:169] secret service-account-controller-token-c6rqp contained an invalid API token for service-account-controller/kube-system
I0821 10:23:53.929196       5 client_builder.go:233] Verified credential for service-account-controller/kube-system
I0821 10:23:53.929559       5 controllermanager.go:460] Started "serviceaccount"
I0821 10:23:53.929571       5 controllermanager.go:450] Starting "daemonset"
I0821 10:23:53.929683       5 serviceaccounts_controller.go:113] Starting service account controller
I0821 10:23:53.929706       5 controller_utils.go:1021] Waiting for caches to sync for service account controller
I0821 10:23:53.948499       5 client_builder.go:233] Verified credential for daemon-set-controller/kube-system
I0821 10:23:53.953902       5 daemon_controller.go:160] Adding daemon set calico-node-s390x
I0821 10:23:53.953996       5 controllermanager.go:460] Started "daemonset"
E0821 10:23:53.954001       5 runtime.go:66] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
/root/hchen/GOPATH/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/root/hchen/GOPATH/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/root/hchen/GOPATH/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/asm_amd64.s:514
/usr/local/go/src/runtime/panic.go:489
/usr/local/go/src/runtime/panic.go:63
/usr/local/go/src/runtime/signal_unix.go:290
/root/hchen/GOPATH/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/controller/daemon/daemon_controller.go:161
/root/hchen/GOPATH/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/controller.go:195
<autogenerated>:57
/root/hchen/GOPATH/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/shared_informer.go:545
/root/hchen/GOPATH/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/shared_informer.go:381
/root/hchen/GOPATH/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71
/usr/local/go/src/runtime/asm_amd64.s:2197
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x31c705a]

goroutine 1244 [running]:
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/root/hchen/GOPATH/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x126
panic(0x4050880, 0x934c510)
	/usr/local/go/src/runtime/panic.go:489 +0x2cf
k8s.io/kubernetes/pkg/controller/daemon.NewDaemonSetsController.func1(0x4809980, 0xc422100000)
	/root/hchen/GOPATH/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/controller/daemon/daemon_controller.go:161 +0xea
k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd(0xc422c963a0, 0xc422c963b0, 0xc422c963c0, 0x4809980, 0xc422100000)
	/root/hchen/GOPATH/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/controller.go:195 +0x49
k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache.(*ResourceEventHandlerFuncs).OnAdd(0xc422c98a40, 0x4809980, 0xc422100000)
	<autogenerated>:57 +0x73
k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache.(*processorListener).run(0xc422cde230)
	/root/hchen/GOPATH/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/shared_informer.go:545 +0x287
k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache.(*processorListener).(k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache.run)-fm()
	/root/hchen/GOPATH/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/shared_informer.go:381 +0x2a
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1(0xc420415788, 0xc422c963f0)
	/root/hchen/GOPATH/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71 +0x4f
created by k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start
	/root/hchen/GOPATH/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:72 +0x62
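As the title suggests, the panic appears to happen because the daemon set controller's event handlers can fire before the controller is fully constructed. The following is a minimal, hypothetical sketch (not the actual daemon controller or client-go code; the types and names are illustrative) of how an already-running shared informer can replay an Add event against a handler whose backing function field has not been assigned yet, producing exactly this kind of nil pointer dereference:

```go
// Minimal sketch of the suspected race: the informer is already running when
// AddEventHandler is called, so existing objects are delivered immediately,
// before the constructor has finished wiring up the controller's fields.
package main

import (
	"fmt"
	"time"
)

type controller struct {
	enqueue func(obj interface{}) // assigned near the end of the constructor
}

// fakeInformer stands in for a shared informer that has already been started.
type fakeInformer struct{ existing []interface{} }

// AddEventHandler replays existing objects on a goroutine right away,
// mimicking a started shared informer delivering its initial Add events.
func (f *fakeInformer) AddEventHandler(onAdd func(obj interface{})) {
	go func() {
		for _, obj := range f.existing {
			onAdd(obj) // may run before the caller finishes initialization
		}
	}()
}

func newController(inf *fakeInformer) *controller {
	c := &controller{}
	// Handlers are registered first...
	inf.AddEventHandler(func(obj interface{}) {
		c.enqueue(obj) // panics if this fires while c.enqueue is still nil
	})
	time.Sleep(10 * time.Millisecond) // widen the race window for the demo
	// ...and the function field is only assigned afterwards.
	c.enqueue = func(obj interface{}) { fmt.Println("enqueued", obj) }
	return c
}

func main() {
	inf := &fakeInformer{existing: []interface{}{"calico-node-s390x"}}
	_ = newController(inf)
	time.Sleep(100 * time.Millisecond)
}
```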

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 28 (26 by maintainers)

Most upvoted comments

Spoke with @ncdc about this on slack, and my current proposal is to make the SharedInformerFactory auto-start newly requested informers if Start has already been called. This gives the factory owner more control over the lifecycle of the informers shared by controllers, removing the need for the GC (or other) controllers to try and manage the factory lifecycle.
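To make the proposal concrete, here is a rough sketch of the idea. This is illustrative only, not the actual client-go SharedInformerFactory (the real InformerFor signature and internals differ); the point is that the factory remembers that Start has been called and starts any informer requested afterwards, so a controller constructed late still receives a running informer without anyone calling Start again:

```go
package informers

import (
	"reflect"
	"sync"

	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/client-go/tools/cache"
)

// sharedInformerFactory sketch: tracks whether Start has been called and
// auto-starts informers that are requested after that point.
type sharedInformerFactory struct {
	mu               sync.Mutex
	started          bool            // set once Start has been called
	stopCh           <-chan struct{} // saved so late informers can be run too
	informers        map[reflect.Type]cache.SharedIndexInformer
	startedInformers map[reflect.Type]bool
}

func (f *sharedInformerFactory) Start(stopCh <-chan struct{}) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.started = true
	f.stopCh = stopCh
	for t, informer := range f.informers {
		if !f.startedInformers[t] {
			go informer.Run(stopCh)
			f.startedInformers[t] = true
		}
	}
}

// InformerFor returns (and lazily creates) the shared informer for a type.
// If the factory has already been started, the new informer is started
// immediately instead of waiting for another Start call.
func (f *sharedInformerFactory) InformerFor(obj runtime.Object, newFunc func() cache.SharedIndexInformer) cache.SharedIndexInformer {
	f.mu.Lock()
	defer f.mu.Unlock()
	t := reflect.TypeOf(obj)
	if inf, ok := f.informers[t]; ok {
		return inf
	}
	inf := newFunc()
	f.informers[t] = inf
	if f.started {
		go inf.Run(f.stopCh)
		f.startedInformers[t] = true
	}
	return inf
}
```

The design point is that lifecycle ownership stays with the factory owner: controllers just ask for informers, and the factory decides when they run.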

Also if it wasn’t clear in my comment above, I don’t think this has anything to do with failover/leader election. It may be possible that failover makes this more likely to happen, but I believe that’s correlation, not causation.