test-infra: Boskos server fails to claim PV after restart/reschedule from the node

/area boskos
/assign

Events:
  FirstSeen	LastSeen	Count	From						SubObjectPath	Type		Reason		Message
  ---------	--------	-----	----						-------------	--------	------		-------
  26m		26m		1	default-scheduler						Normal		Scheduled	Successfully assigned boskos-3859550540-jlgcx to gke-prow-default-pool-42819f20-28z7
  24m		23s		12	kubelet, gke-prow-default-pool-42819f20-28z7			Warning		FailedMount	Unable to mount volumes for pod "boskos-3859550540-jlgcx_test-pods(a9392393-9983-11e7-ad2e-42010a8000c4)": timeout expired waiting for volumes to attach/mount for pod "test-pods"/"boskos-3859550540-jlgcx". list of unattached/unmounted volumes=[boskos-volume]
  24m		23s		12	kubelet, gke-prow-default-pool-42819f20-28z7			Warning		FailedSync	Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "test-pods"/"boskos-3859550540-jlgcx". list of unattached/unmounted volumes=[boskos-volume]

Redeploying fixes it, but I still want to fix the actual issue.

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 22 (22 by maintainers)

Most upvoted comments

yeah I’ll close the issue, but we can keep the discussion going 😃

The attach/detach controller does react to pod deletion events: if the pod’s deletionTimestamp is set and its containers are terminated (see https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/util/volumehelper/volumehelper.go#L103), the attach/detach controller will trigger the detach operation.

However, it does not trigger a detach if the pod state is “Unknown”, because we don’t want to detach and potentially corrupt user data when the pod state is unknown. We depend on some outside entity (a node repair tool?) stepping in and deleting these pods. The question is: if @krzyzacy has node-auto-repair on, why didn’t it delete the pods?
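
To make the rule above concrete, here is a rough sketch of the decision as described in this comment. It is not the actual controller code: the `Pod` struct, the `PodPhase` constants, and `ShouldDetach` are hypothetical, simplified stand-ins for illustration, and the real logic lives in the volumehelper file linked above.

```go
package main

import "fmt"

// PodPhase is a simplified, hypothetical stand-in for the pod state.
type PodPhase string

const (
	PhaseRunning PodPhase = "Running"
	PhaseUnknown PodPhase = "Unknown"
)

// Pod is a trimmed-down, hypothetical model, not the real Kubernetes pod type.
type Pod struct {
	Name                 string
	Phase                PodPhase
	DeletionTimestampSet bool // true once the pod has been marked for deletion
	ContainersTerminated bool // true once all containers report terminated
}

// ShouldDetach encodes the rule from the comment above (an assumption about
// the shape of the check, not the controller's real implementation):
// never detach while the pod state is Unknown, otherwise detach only when
// the pod is marked for deletion and its containers have terminated.
func ShouldDetach(p Pod) bool {
	if p.Phase == PhaseUnknown {
		// Detaching here could corrupt data if the pod is still writing.
		return false
	}
	return p.DeletionTimestampSet && p.ContainersTerminated
}

func main() {
	pods := []Pod{
		{Name: "deleted-and-stopped", Phase: PhaseRunning, DeletionTimestampSet: true, ContainersTerminated: true},
		{Name: "node-unreachable", Phase: PhaseUnknown, DeletionTimestampSet: true, ContainersTerminated: true},
		{Name: "still-running", Phase: PhaseRunning},
	}
	for _, p := range pods {
		fmt.Printf("%s: detach=%v\n", p.Name, ShouldDetach(p))
	}
}
```

Under this model, a pod stuck in “Unknown” on an unreachable node keeps its volume attached until something outside the controller deletes the pod, which would match the FailedMount timeouts seen on the rescheduled boskos pod.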