kubernetes: "kubectl exec" sometimes incorrectly returns empty string causing tests to flake
@jingxu97 noticed (https://github.com/kubernetes/kubernetes/issues/28081#issuecomment-251457653) that a lot of the Pod disks should... PD E2E tests are very flaky on GCI because kubectl exec is incorrectly returning an empty string.
Jing’s report:
Since last week we have noticed that all 6 PD tests have become flaky, and ONLY on the GCI image. The failures are all similar: create a PD, write a string to a file (echo $string > $file), and read it back (cat $file) with the “kubectl exec” command. Sometimes the cat command returns an empty string. Through investigation, we found that the data is written to the file successfully and the mounts are all fine. The only problem is that “kubectl exec” with the cat command sometimes returns an empty string on stdout. The following three tests prove this point.
1. Use the existing PD test, but instead of reading once, read the file in a loop 100 times. Sometimes, in the middle of these 100 reads, one read returns an empty string and all the rest are fine.
2. Manually create a pod and use the “kubectl” command with "echo > " to write a string to a file. Then run a bash script that loops, issuing the “kubectl” command with “cat <file>” many times. The cat returns the string, but after about 100 runs it returns an empty string. Run the loop again and it returns normally for about 300 iterations, then fails to return the string again. In the meantime, the data and mounts are untouched.
3. As Saad suggested, to make sure this is not caused by the “cat” command, we created a pod. In the pod spec’s container section, we added a command inside the container that writes data to a file and then repeatedly reads the file with ‘cat’ 1000 times. We used “kubectl log <pod>” to check the output from the container, and all the reads were fine.
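The repro loop from the second test can be sketched roughly as follows. This is only an illustration, not the script used in the report: the pod name `pd-test-pod` and the path `/testpd/tracker` are placeholders, and the helper names `kubectl_exec_cat` and `count_empty_reads` are made up for this example. It assumes `kubectl` is on the PATH and already pointed at the cluster.

```python
import subprocess
from typing import Callable, List, Optional


def kubectl_exec_cat(pod: str, path: str, container: Optional[str] = None) -> str:
    """Run `kubectl exec <pod> [-c <container>] -- cat <path>` and return stdout."""
    cmd = ["kubectl", "exec", pod]
    if container:
        cmd += ["-c", container]
    cmd += ["--", "cat", path]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def count_empty_reads(read: Callable[[], str], attempts: int) -> List[int]:
    """Call read() `attempts` times and return the 0-based indices of empty results."""
    empty_indices = []
    for i in range(attempts):
        if read().strip() == "":
            empty_indices.append(i)
    return empty_indices


if __name__ == "__main__":
    # Placeholder pod name and file path; substitute the real ones.
    failures = count_empty_reads(
        lambda: kubectl_exec_cat("pd-test-pod", "/testpd/tracker"), 500
    )
    print(f"{len(failures)} empty reads at iterations {failures}")
```

Keeping the reader as a plain callable makes it easy to swap in a different transport (for example, reading the container log instead) and compare the failure rate.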
The conclusion for now is that the issue is not related to the storage side. The “kubectl” command with “cat” sometimes fails to return the output correctly. Since this command execution involves several steps (passing the input command to the container, executing the command, and returning the execution output from the container to the terminal), we need to investigate more to find the root cause.
Please let me know if you have any questions.
CC @pwittrock
Step two is a good way to repro this.
About this issue
- State: closed
- Created 8 years ago
- Comments: 28 (25 by maintainers)
Commits related to this issue
- Add retry loop around check for /etc/hosts contents to work around issue #34256 See https://github.com/kubernetes/kubernetes/issues/34256 — committed to bowei/kubernetes by bowei 8 years ago
- Merge pull request #34357 from bowei/flake-fix-27023 Automatic merge from submit-queue Add retry loop around check for /etc/hosts contents to work around issue #34256 See https://github.com/kuberne... — committed to kubernetes/kubernetes by deleted user 8 years ago
- Merge pull request #42846 from msau42/pd-flake Automatic merge from submit-queue Retry calls to ReadFileViaContainer in PD tests **What this PR does / why we need it**: kubectl exec occasionally f... — committed to kubernetes/kubernetes by deleted user 7 years ago
Docker 1.13 isn’t out yet. Unfortunately the best fix for this particular issue at this time is probably to ensure that all the PD tests that use kubectl exec retry it a few times, such as in #34357.
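As a rough illustration of the retry workaround, here is a minimal sketch that treats an empty result as a transient failure and tries again. It is not the code from #34357 or #42846 (those changes live in the Go e2e tests); the function name `read_with_retry` and its parameters are invented for this example, and `read` stands in for whatever call wraps kubectl exec.

```python
import time
from typing import Callable


def read_with_retry(
    read: Callable[[], str],
    attempts: int = 3,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return the first non-empty result of read(), trying up to `attempts` times.

    Returns "" if every attempt came back empty.
    """
    for attempt in range(attempts):
        value = read()
        if value.strip():
            return value
        # Pause between attempts, but not after the final one.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return ""
```

One caveat with this approach: a file that is genuinely empty cannot be told apart from a dropped exec output, so it only suits tests that expect non-empty content.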