test-infra: Boskos server fail to claim pv after restart/reschedule from the node
/area boskos
/assign
```
Events:
  FirstSeen  LastSeen  Count  From                                          SubObjectPath  Type     Reason        Message
  ---------  --------  -----  ----                                          -------------  ----     ------        -------
  26m        26m       1      default-scheduler                                            Normal   Scheduled     Successfully assigned boskos-3859550540-jlgcx to gke-prow-default-pool-42819f20-28z7
  24m        23s       12     kubelet, gke-prow-default-pool-42819f20-28z7                 Warning  FailedMount   Unable to mount volumes for pod "boskos-3859550540-jlgcx_test-pods(a9392393-9983-11e7-ad2e-42010a8000c4)": timeout expired waiting for volumes to attach/mount for pod "test-pods"/"boskos-3859550540-jlgcx". list of unattached/unmounted volumes=[boskos-volume]
  24m        23s       12     kubelet, gke-prow-default-pool-42819f20-28z7                 Warning  FailedSync    Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "test-pods"/"boskos-3859550540-jlgcx". list of unattached/unmounted volumes=[boskos-volume]
```
Redeploying fixes it, but we still want to fix the actual issue.
About this issue
- State: closed
- Created 7 years ago
- Comments: 22 (22 by maintainers)
yeah I’ll close the issue, but we can keep the discussion going 😃
The attach/detach controller does react to pod deletion events: if the pod’s `deletionTimestamp` is set and its containers are terminated (see https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/util/volumehelper/volumehelper.go#L103), the controller triggers the detach operation. However, it does not trigger a detach if the pod’s state is “Unknown”, because we don’t want to detach and potentially corrupt user data while the pod’s state is unknown. We depend on some outside entity (a node repair tool?) stepping in and deleting these pods. The question is: if @krzyzacy has node auto-repair on, why didn’t it delete the pods?