kubernetes: AzureDisk Mount causing Kubelet Crash

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.0", GitCommit:"a16c0a7f71a6f93c7e0f222d961f4675cd97a46b", GitTreeState:"clean", BuildDate:"2016-09-26T18:16:57Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.5", GitCommit:"5a0a696437ad35c133c0c8493f7e9d22b0f9b81b", GitTreeState:"clean", BuildDate:"2016-10-29T01:32:42Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: azure (acs-engine)
  • OS (e.g. from /etc/os-release): Ubuntu 16.04 LTS
  • Kernel (e.g. uname -a): Linux k8s-master-5A10CE6E-0 4.4.0-47-generic #68-Ubuntu SMP Wed Oct 26 19:39:52 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

What happened:

Deploying Pachyderm onto a Kubernetes cluster on Azure. Pachyderm requires a persistent volume for its RethinkDB instance, and we're seeing the kubelet crash and the VM eventually lock up.

  • An empty data disk is created, formatted as ext4, and uploaded to Azure storage
  • The data disk mount succeeds:
I1110 18:46:17.084038   17216 operation_executor.go:766] MountVolume.MountDevice succeeded for volume "kubernetes.io/azure-disk/pach-disk.vhd" (spec.Name: "rethink-volume") pod "03ad3308-a775-11e6-aa08-000d3a34f678" (UID: "03ad3308-a775-11e6-aa08-000d3a34f678") device mount path "/var/lib/kubelet/plugins/kubernetes.io/azure-disk/mounts/pach-disk.vhd"
I1110 18:46:17.118751   17216 operation_executor.go:803] MountVolume.SetUp succeeded for volume "kubernetes.io/azure-disk/pach-disk.vhd" (spec.Name: "rethink-volume") pod "03ad3308-a775-11e6-aa08-000d3a34f678" (UID: "03ad3308-a775-11e6-aa08-000d3a34f678")
  • After about 4-5 minutes, the kubelet crashes on the node (my reading of the trace is sketched after this list):
Nov 10 18:45:28 k8s-agent-5A10CE6E-1 docker[12793]: panic: runtime error: invalid memory address or nil pointer dereference
Nov 10 18:45:28 k8s-agent-5A10CE6E-1 docker[12793]: [signal 0xb code=0x1 addr=0x0 pc=0xe3a62c]
Nov 10 18:45:28 k8s-agent-5A10CE6E-1 docker[12793]: goroutine 283 [running]:
Nov 10 18:45:28 k8s-agent-5A10CE6E-1 docker[12793]: panic(0x44ce1a0, 0xc820012070)
Nov 10 18:45:28 k8s-agent-5A10CE6E-1 docker[12793]:         /usr/local/go/src/runtime/panic.go:481 +0x3e6
Nov 10 18:45:28 k8s-agent-5A10CE6E-1 docker[12793]: k8s.io/kubernetes/pkg/volume/azure_dd.(*azureDataDiskPlugin).newMounterInternal(0xc82066c060, 0xc821c86120, 0xc8218dca56, 0x24, 0x7ff0f97fc650, 0x762e900, 0x0, 0x0, 0x0, 0x0)
Nov 10 18:45:28 k8s-agent-5A10CE6E-1 docker[12793]:         /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/volume/azure_dd/azure_dd.go:117 +0x61c
Nov 10 18:45:28 k8s-agent-5A10CE6E-1 docker[12793]: k8s.io/kubernetes/pkg/volume/azure_dd.(*azureDataDiskPlugin).NewMounter(0xc82066c060, 0xc821c86120, 0xc821840500, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
Nov 10 18:45:28 k8s-agent-5A10CE6E-1 docker[12793]:         /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/volume/azure_dd/azure_dd.go:106 +0xc3
Nov 10 18:45:28 k8s-agent-5A10CE6E-1 docker[12793]: k8s.io/kubernetes/pkg/kubelet/volumemanager/reconciler.(*reconciler).reconstructVolume(0xc82088d8c0, 0xc8218dca56, 0x24, 0xc82187405c, 0xe, 0xc821874150, 0x6a, 0xc821b2b1e0, 0x18, 0x0, ...)
Nov 10 18:45:28 k8s-agent-5A10CE6E-1 docker[12793]:         /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/kubelet/volumemanager/reconciler/reconciler.go:517 +0x399
Nov 10 18:45:28 k8s-agent-5A10CE6E-1 docker[12793]: k8s.io/kubernetes/pkg/kubelet/volumemanager/reconciler.(*reconciler).syncStates(0xc82088d8c0, 0xc820072e20, 0x15)
Nov 10 18:45:28 k8s-agent-5A10CE6E-1 docker[12793]:         /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/kubelet/volumemanager/reconciler/reconciler.go:430 +0x354
Nov 10 18:45:28 k8s-agent-5A10CE6E-1 docker[12793]: k8s.io/kubernetes/pkg/kubelet/volumemanager/reconciler.(*reconciler).sync(0xc82088d8c0)
Nov 10 18:45:28 k8s-agent-5A10CE6E-1 docker[12793]:         /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/kubelet/volumemanager/reconciler/reconciler.go:384 +0x68
Nov 10 18:45:28 k8s-agent-5A10CE6E-1 docker[12793]: k8s.io/kubernetes/pkg/kubelet/volumemanager/reconciler.(*reconciler).reconciliationLoopFunc.func1()
Nov 10 18:45:28 k8s-agent-5A10CE6E-1 docker[12793]:         /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/kubelet/volumemanager/reconciler/reconciler.go:147 +0xf4
Nov 10 18:45:28 k8s-agent-5A10CE6E-1 docker[12793]: k8s.io/kubernetes/pkg/util/wait.JitterUntil.func1(0xc821172120)
Nov 10 18:45:28 k8s-agent-5A10CE6E-1 docker[12793]:         /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/wait/wait.go:84 +0x19
Nov 10 18:45:28 k8s-agent-5A10CE6E-1 docker[12793]: k8s.io/kubernetes/pkg/util/wait.JitterUntil(0xc821172120, 0x5f5e100, 0x0, 0xc821172101, 0xc820050fc0)
Nov 10 18:45:28 k8s-agent-5A10CE6E-1 docker[12793]:         /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/wait/wait.go:85 +0xb4
Nov 10 18:45:28 k8s-agent-5A10CE6E-1 docker[12793]: k8s.io/kubernetes/pkg/util/wait.Until(0xc821172120, 0x5f5e100, 0xc820050fc0)
Nov 10 18:45:28 k8s-agent-5A10CE6E-1 docker[12793]:         /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/wait/wait.go:47 +0x43
Nov 10 18:45:28 k8s-agent-5A10CE6E-1 docker[12793]: k8s.io/kubernetes/pkg/kubelet/volumemanager/reconciler.(*reconciler).Run(0xc82088d8c0, 0x7ff0f97ff160, 0xc8201b32c0, 0xc820050fc0)
Nov 10 18:45:28 k8s-agent-5A10CE6E-1 docker[12793]:         /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/kubelet/volumemanager/reconciler/reconciler.go:133 +0x5b
Nov 10 18:45:28 k8s-agent-5A10CE6E-1 docker[12793]: created by k8s.io/kubernetes/pkg/kubelet/volumemanager.(*volumeManager).Run
Nov 10 18:45:28 k8s-agent-5A10CE6E-1 docker[12793]:         /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/kubelet/volumemanager/volume_manager.go:240 +0x160
Nov 10 18:45:28 k8s-agent-5A10CE6E-1 systemd[1]: kubelet.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
  • This repeats for about an hour until the VM locks up (SSH becomes unresponsive). I had to forcibly stop and start the VM from the portal; a plain restart did not work. Over that hour, the node's memory usage grows steadily and the CPU spikes to nearly 100%:
[Screenshot taken 2016-11-10 at 12:12: node memory and CPU usage]

(Pachyderm was deployed at about 10:40 am; the VM locked up around 11:30 am.)
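My reading of the trace: the panic is raised inside azureDataDiskPlugin.newMounterInternal (azure_dd.go:117), and the caller is the volume manager's reconciler (reconstructVolume -> syncStates), i.e. the periodic pass that rebuilds volume state from what it finds under /var/lib/kubelet, not the original pod-driven mount. My guess is that the spec rebuilt from the mount path carries only the disk name, and the plugin then dereferences optional fields that were never populated. The toy Go program below sketches that failure mode; diskSource, unguardedMount, and guardedMount are stand-ins I made up for illustration, not the actual kubelet code, and I have not verified this against the v1.4.5 source.

// Toy sketch of the suspected failure mode only; not the real kubernetes types.
package main

import "fmt"

// diskSource stands in for an AzureDisk volume source: the optional fields are
// pointers, so a spec rebuilt from just a mount path ("pach-disk.vhd") leaves them nil.
type diskSource struct {
	DiskName string
	FSType   *string // nil when reconstructed from on-disk state
	ReadOnly *bool   // nil when reconstructed from on-disk state
}

// unguardedMount dereferences the optional fields directly. With a
// reconstructed source this panics with "invalid memory address or nil
// pointer dereference", the same class of panic as in the kubelet log.
func unguardedMount(src *diskSource) (string, bool) {
	return *src.FSType, *src.ReadOnly
}

// guardedMount is the defensive version: fall back to defaults when the
// pointers are nil instead of taking down the whole process.
func guardedMount(src *diskSource) (string, bool) {
	fsType, readOnly := "ext4", false
	if src.FSType != nil {
		fsType = *src.FSType
	}
	if src.ReadOnly != nil {
		readOnly = *src.ReadOnly
	}
	return fsType, readOnly
}

func main() {
	// Roughly what a reconciler pass would hand the plugin after finding
	// /var/lib/kubelet/plugins/kubernetes.io/azure-disk/mounts/pach-disk.vhd.
	reconstructed := &diskSource{DiskName: "pach-disk.vhd"}

	fmt.Println(guardedMount(reconstructed)) // ext4 false

	defer func() {
		if r := recover(); r != nil {
			fmt.Println("recovered:", r) // nil pointer dereference
		}
	}()
	unguardedMount(reconstructed)
}

If that reading is right, every reconciler pass on a node that has ever mounted an Azure disk hits the same panic, which would explain why systemd's restart of the kubelet only buys a few minutes before the next crash.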

What you expected to happen:

Azure Disk mounted, no crash. Rainbows and unicorns.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know:

cc @colemickens

About this issue

  • State: closed
  • Created: November 2016
  • Comments: 21 (10 by maintainers)

Most upvoted comments

I already have an issue open tracking the mounts appearing numerous times: https://github.com/kubernetes/kubernetes/issues/30258