k3s: Increased failure rate on exec/attach discovered on csi e2e tests
Environmental Info:
K3s Version:
- v1.23.7-rc1+k3s1 (in docker mode)
- v1.24.1-rc2+k3s1 (in default/containerd mode)
Node(s) CPU architecture, OS, and Version:
Linux k3s-master 5.4.0-113-generic #127-Ubuntu SMP Wed May 18 14:30:56 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Cluster Configuration:
Single server
Describe the bug:
We experience flaky tests in the CI of cinder-csi-plugin (cloud-provider-openstack), all related to the following error:
Jun 2 13:10:56.391: INFO: ExecWithOptions {Command:[/bin/sh -c echo +mhlxaKrCV35dwsvcJbvbp3CFlaSAVZGRbNHMSHOYhvigtOOoprNIwi7vQbcbq58smAeLqT9MVTdwIAnzyOh3Q== | base64 -d | sha256sum] Namespace:multivolume-8810 PodName:pod-71cf7d75-fdd8-48f7-b700-3eb09465429e ContainerName:write-pod Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false Quiet:false}
Jun 2 13:10:56.391: INFO: >>> kubeConfig: /root/.kube/config
Jun 2 13:10:56.391: INFO: ExecWithOptions: Clientset creation
Jun 2 13:10:56.392: INFO: ExecWithOptions: execute(POST https://172.24.5.182:6443/api/v1/namespaces/multivolume-8810/pods/pod-71cf7d75-fdd8-48f7-b700-3eb09465429e/exec?command=%2Fbin%2Fsh&command=-c&command=echo+%2BmhlxaKrCV35dwsvcJbvbp3CFlaSAVZGRbNHMSHOYhvigtOOoprNIwi7vQbcbq58smAeLqT9MVTdwIAnzyOh3Q%3D%3D+%7C+base64+-d+%7C+sha256sum&container=write-pod&container=write-pod&stderr=true&stdout=true)
Jun 2 13:10:56.405: FAIL: "echo +mhlxaKrCV35dwsvcJbvbp3CFlaSAVZGRbNHMSHOYhvigtOOoprNIwi7vQbcbq58smAeLqT9MVTdwIAnzyOh3Q== | base64 -d | sha256sum" should succeed, but failed with error message "error dialing backend: EOF"
stdout:
stderr:
Unexpected error:
<*errors.StatusError | 0xc00257cd20>: {
    ErrStatus: {
        TypeMeta: {Kind: "", APIVersion: ""},
        ListMeta: {
            SelfLink: "",
            ResourceVersion: "",
            Continue: "",
            RemainingItemCount: nil,
        },
        Status: "Failure",
        Message: "error dialing backend: EOF",
        Reason: "",
        Details: nil,
        Code: 500,
    },
}
error dialing backend: EOF
occurred
Steps To Reproduce:
So far I have not found a way to reproduce this outside of the e2e tests, but I wanted to open this ticket to make you aware of the problem. Maybe you are already aware 😃 I do have a relatively reliable way to trigger it, so if I can help with testing, please let me know.
If I find a proper standalone reproducer I will update this issue; a rough sketch of one possible approach is included after the e2e command below.
- Must run on OpenStack
- Install K3s:
mkdir -p /var/lib/rancher/k3s/agent/images/
curl -sSL https://github.com/k3s-io/k3s/releases/download/v1.24.1-rc2+k3s1/k3s-airgap-images-amd64.tar -o /var/lib/rancher/k3s/agent/images/k3s-airgap-images.tar
curl -sSL https://github.com/k3s-io/k3s/releases/download/v1.24.1-rc2+k3s1/k3s -o /usr/local/bin/k3s
curl -sSL https://get.k3s.io -o /var/lib/rancher/k3s/install.sh
chmod u+x /var/lib/rancher/k3s/install.sh /usr/local/bin/k3s
INSTALL_K3S_SKIP_DOWNLOAD=true /var/lib/rancher/k3s/install.sh --disable traefik --disable metrics-server --disable servicelb --disable-cloud-controller --kubelet-arg="cloud-provider=external" --tls-san 172.24.5.182 --token 9b08jz.c0izixklcxymnze7
Deploy cloud-provider-openstack and cinder-csi-plugin.
Run the cinder-csi e2e tests:
mkdir -p /var/log/csi-pod
/tmp/kubernetes/test/bin/e2e.test \
-storage.testdriver=/root/src/k8s.io/cloud-provider-openstack/tests/e2e/csi/cinder/test-driver.yaml \
-ginkgo.focus='External\s+Storage\s+\[Driver:\s+cinder.csi.openstack.org\]\s+\[Testpattern:\s+Dynamic\s+PV\s+\(ext4\)\]\s+multiVolume\s+\[Slow\]\s+should\s+access\s+to\s+two\s+volumes\s+with\s+the\s+same\s+volume\s+mode\s+and\s+retain\s+data\s+across\s+pod\s+recreation\s+on\s+the\s+same\s+node' \
-ginkgo.skip='\[Disruptive\]|\[Testpattern:\s+Dynamic\s+PV\s+\(default\s+fs\)\]\s+provisioning\s+should\s+mount\s+multiple\s+PV\s+pointing\s+to\s+the\s+same\s+storage\s+on\s+the\s+same\s+node|\[Testpattern:\s+Dynamic\s+PV\s+\(default\s+fs\)\]\s+provisioning\s+should\s+provision\s+storage\s+with\s+any\s+volume\s+data\s+source\s+\[Serial\]' \
-ginkgo.noColor \
-ginkgo.progress \
-ginkgo.v \
-test.timeout=0 \
-report-dir="/var/log/csi-pod" | tee "/var/log/csi-pod/cinder-csi-e2e.log"
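As a possible lighter-weight reproducer (only a sketch; I have not confirmed it triggers the failure as reliably as the full e2e suite), repeatedly hitting the exec subresource of any running pod should eventually surface the same "error dialing backend: EOF". The namespace and pod name below are placeholders:
# Hypothetical repro loop: exec into an existing pod repeatedly and stop on the
# first "error dialing backend" failure. Any running pod with /bin/sh should do.
for i in $(seq 1 500); do
  kubectl -n default exec test-pod -- /bin/sh -c 'echo ok' 2>&1 \
    | grep -q 'error dialing backend' && { echo "hit on iteration $i"; break; }
done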
Expected behavior:
No errors. On v1.23.6 we do not experience this problem.
Actual behavior:
This happens frequently; see https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/directory/openstack-cloud-csi-cinder-e2e-test/1532228188275478528 for an example. All or most of the failed tests are related to the error above.
Additional context / logs:
journalctl -u k3s
Jun 02 13:10:56 k3s-master k3s[131992]: E0602 13:10:56.401545 131992 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"error dialing backend: EOF"}: error dialing backend: EOF
Jun 02 13:10:56 k3s-master k3s[131992]: I0602 13:10:56.400513 131992 log.go:195] http: TLS handshake error from 127.0.0.1:35008: tls: first record does not look like a TLS handshake
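To correlate the test failures with these messages while a run is in progress, something like the following can be used (a minimal sketch; it simply follows the k3s unit and filters for the two signatures seen above):
# Follow the k3s service logs and surface the two error signatures from above
# while the e2e suite (or the exec loop sketched earlier) is running.
journalctl -u k3s -f --no-pager | grep -E 'error dialing backend|TLS handshake error'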
Backporting
- Needs backporting to older releases
Commits related to this issue
- Workaround https://github.com/k3s-io/k3s/issues/5633. — committed to adelton/freeipa-container by adelton 2 years ago
- Add `--egress-selector-mode` flag while starting k3d Kanister CI tests were failing very frequently with below error while execing a command into a pod ``` error dialing backend: EOF ``` We are not... — committed to kanisterio/kanister by viveksinghggits 2 years ago
- kat: configure k3d to disable egress-selector I observed in the kat-client logs that it was getting `Error from server :error dialing backend: EOF` so after some googling I stumbled on to the followi... — committed to emissary-ingress/emissary by LanceEa a year ago
- ci: bump supported k8s versions to v1.22 - v1.27 We use a somewhat older version of v1.22 to avoid k3s-io/k3s#5633 Signed-off-by: Mike Beaumont <mjboamail@gmail.com> — committed to michaelbeaumont/kuma by michaelbeaumont a year ago
- ci(k8s): bump supported versions to v1.22 - v1.27 (#6365) We use a somewhat older version of v1.22 to avoid k3s-io/k3s#5633 Signed-off-by: Mike Beaumont <mjboamail@gmail.com> — committed to kumahq/kuma by michaelbeaumont a year ago
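Several of the commits above apply the workaround to k3d-based CI; a sketch of how that flag can be passed through, assuming k3d v5's --k3s-arg node-filter syntax (the cluster name is a placeholder):
# Hypothetical k3d equivalent of the workaround: pass the server flag through
# to k3s on all server nodes (k3d v5 --k3s-arg syntax with a node filter).
k3d cluster create test-cluster --k3s-arg "--egress-selector-mode=disabled@server:*"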
@consideRatio
curl -sfL https://get.k3s.io | sh -s - --disable traefik --disable servicelb --write-kubeconfig-mode 644 --egress-selector-mode=disabled
More specifically, the logs from the test runs provided above are from GitHub Actions runs on this PR branch: https://github.com/jupyterhub/zero-to-jupyterhub-k8s/pull/2798
The k3s setup is done via the action https://github.com/jupyterhub/action-k3s-helm that has a few steps described in the action.yaml file.
I’m writing from mobile atm, got to go!
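For an already-installed server, the same workaround can presumably be applied without re-running the installer; a sketch, assuming the standard /etc/rancher/k3s/config.yaml mechanism (keys mirror the CLI flags) and a systemd-managed k3s service:
# Hypothetical workaround for an existing install: persist the flag in the
# k3s config file and restart the service so the change takes effect.
mkdir -p /etc/rancher/k3s
cat >> /etc/rancher/k3s/config.yaml <<'EOF'
egress-selector-mode: disabled
EOF
systemctl restart k3s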