kubernetes: Flaking Test: subpath failures in new-master-upgrade-cluster-new-parallel, other jobs
Which jobs are failing: gce-new-master-upgrade-cluster-new-parallel
Which test(s) are failing:
Varies, but all subpath failures, including:
- [sig-storage] CSI Volumes [Driver: csi-hostpath-v0] [Testpattern: Dynamic PV (default fs)] subPath should fail if subpath with backstepping is outside the volume [Slow]
- [sig-storage] CSI Volumes [Driver: csi-hostpath] [Testpattern: Dynamic PV (default fs)] subPath should fail if subpath directory is outside the volume [Slow]
- [sig-storage] CSI Volumes [Driver: csi-hostpath-v0] [Testpattern: Dynamic PV (default fs)] subPath should fail if subpath file is outside the volume [Slow]
- [sig-storage] CSI Volumes [Driver: csi-hostpath] [Testpattern: Dynamic PV (default fs)] subPath should fail if subpath file is outside the volume [Slow]
… and pretty much every other subpath test, but never all of them at once.
There are also a few other storage tests failing, such as:
- [sig-storage] Volume expand [Slow] Verify if editing PVC allows resize
- [sig-storage] Detaching volumes should not work when mount is in progress
Since when has it been failing: 11/22
Reason for failure:
These flakes started around the time that #71314 merged, but they don’t match up with the exact merge timestamp, so it’s probably coincidental.
The subpath test failures seem to be mostly timeouts:
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/storage/testsuites/subpath.go:254
while waiting for failed event to occur
Expected error:
<*errors.errorString | 0xc0000d1860>: {
s: "timed out waiting for the condition",
}
timed out waiting for the condition
not to have occurred
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/storage/testsuites/subpath.go:601
… so possibly this is just a GCE failure.
Anything else we need to know:
This test job has always been flaky, with around a 40% failure rate.
/kind flake /sig storage /priority important-soon
About this issue
- State: closed
- Created 6 years ago
- Comments: 31 (27 by maintainers)
#71570 and #71569 have been merged and will address the biggest issues. #71570 has already been backported to 1.13; the #71569 backport is still pending.
I saw the same on https://gubernator.k8s.io/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-new-master-upgrade-cluster-new-parallel/422
CSI volume mount succeeded at:
Kubelet correctly failed the pod because of subpath
But test still failed:
Because the test failed to find the pod event.
This looks like a test issue. Not a 1.13 blocker.
@saad-ali @msau42
All four of these subpath tests call testPodFailSubpathError, which fails while looking for a specific failed event using WaitTimeoutForPodEvent. Because WaitTimeoutForPodEvent uses eventOccured to check whether the specific error event has happened, and eventOccured only checks the first event, I guess the tests may flake if another failed event happens before the expected one.
Also, since WaitTimeoutForPodEvent already waits for the event, we might be able to delete WaitForPodRunningInNamespace in testPodFailSubpathError.
I will create a PR to fix the above.
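To illustrate the suspected flake, here is a minimal, self-contained sketch of the difference between checking only the first event and checking every event. It is not the actual e2e framework code: the podEvent type, both helper functions, and the event messages are hypothetical stand-ins, and the real eventOccured works against the Kubernetes API rather than an in-memory slice.

```go
package main

import (
	"fmt"
	"strings"
)

// podEvent is a hypothetical stand-in for a Kubernetes event.
// Only the message is modeled here.
type podEvent struct {
	Message string
}

// firstEventMatches mimics the behavior described in the comment above:
// only the first event in the list is inspected.
func firstEventMatches(events []podEvent, msg string) bool {
	if len(events) == 0 {
		return false
	}
	return strings.Contains(events[0].Message, msg)
}

// anyEventMatches is one possible fix: scan every event for the message.
func anyEventMatches(events []podEvent, msg string) bool {
	for _, e := range events {
		if strings.Contains(e.Message, msg) {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical event messages: an unrelated failure arrives first.
	events := []podEvent{
		{Message: "hypothetical unrelated failure"},
		{Message: "hypothetical subpath failure"},
	}

	fmt.Println(firstEventMatches(events, "subpath")) // false
	fmt.Println(anyEventMatches(events, "subpath"))   // true
}
```

With the first-event-only check, a poll loop would keep returning false until it times out, even though the expected event is present. That would match the "timed out waiting for the condition" failures reported above, assuming the diagnosis in this comment is correct.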