openyurt: [BUG] kubectl exec failed with unable to upgrade connection after OpenYurt install

What happened:

kubectl exec (or kubectl port-forward / istioctl ps) fails with the following error. The issue is only reproducible for pods running on the master (control-plane) node.

root@control-plane:~# kubectl exec --stdin --tty ubuntu22-deamonset-5q6rg -- date
error: unable to upgrade connection: fail to setup the tunnel: fail to setup TLS handshake through the Tunnel: write unix @->/tmp/interceptor-proxier.sock: write: broken pipe

What you expected to happen:

kubectl exec (or kubectl port-forward / istioctl ps) succeeds without any error.

How to reproduce it (as minimally and precisely as possible):

  1. Set up a Kubernetes cluster with flannel; only the control-plane node is necessary.
  2. Install OpenYurt v1.0 following the manual setup.
  3. Execute kubectl exec against any container running on the control-plane node (see the sketch below).
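
For reference, a rough repro sketch of the steps above (the flannel manifest URL and pod CIDR are assumptions for a default flannel install; the OpenYurt v1.0 manual setup itself follows the openyurt.io docs and is omitted here):

# 1. single control-plane cluster with flannel (pod CIDR matches flannel's default)
kubeadm init --pod-network-cidr=10.244.0.0/16
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml

# 2. install OpenYurt v1.0.0 manually per the openyurt.io manual-setup guide

# 3. exec into any container running on the control-plane node
kubectl exec --stdin --tty ubuntu22-deamonset-5q6rg -- date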

Anything else we need to know?:

Environment:

  • OpenYurt version: v1.0.0 (installed from a git clone at tag v1.0.0)

  • Kubernetes version (use kubectl version): v1.22.13

  • OS (e.g: cat /etc/os-release):

root@ceci-control-plane:~# cat /etc/os-release 
NAME="Ubuntu"
VERSION="20.04.5 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.5 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
  • Kernel (e.g. uname -a):
root@ceci-control-plane:~# uname -a
Linux ceci-control-plane 5.4.0-126-generic #142-Ubuntu SMP Fri Aug 26 12:12:57 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: N/A

Others

  • This is 100% reproducible with a Vagrant/VirtualBox virtualized instance.
  • We are unable to reproduce this issue on a physical machine.
  • Could it be related to the underlying network interfaces, or to options for kubeadm or kubelet? (Some commands to check this are sketched below.)
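
If it helps narrow this down, a few commands to compare the node's advertised InternalIP with local name resolution on the Vagrant box (the --node-ip kubelet flag is only a guess at a knob that matters on multi-NIC Vagrant/VirtualBox machines):

# what the cluster thinks the node IP is
kubectl get nodes -o wide

# what the hostname resolves to locally on the control-plane node
getent hosts $(hostname)
cat /etc/hosts

# on Vagrant/VirtualBox boxes with a NAT eth0 plus a host-only NIC, kubelet may
# need an explicit node IP, e.g. via KUBELET_EXTRA_ARGS in /etc/default/kubelet:
#   KUBELET_EXTRA_ARGS=--node-ip=192.168.56.20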

/kind bug

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 38 (30 by maintainers)

Most upvoted comments

@rambohe-ch can you take a look at openyurtio/openyurt.io#238?

@fujitatomoya Thank you for updating the FAQ tutorial; I have merged this pull request. By the way, I will update the Chinese FAQ tutorial soon.

@rambohe-ch

I found a problematic configuration in /etc/hosts on the cloud master node (its InternalIP is 192.168.56.20).

root@ceci-control-plane:~# cat /etc/hosts
127.0.0.1	localhost

# The following lines are desirable for IPv6 capable hosts
::1	ip6-localhost	ip6-loopback
fe00::0	ip6-localnet
ff00::0	ip6-mcastprefix
ff02::1	ip6-allnodes
ff02::2	ip6-allrouters
ff02::3	ip6-allhosts

127.0.1.1	ceci-control-plane	ceci-control-plane   ### This is the problem
192.168.56.20 ceci-control-plane

With the above configuration, OpenYurt sees that the hostname ceci-control-plane resolves to 127.0.1.1, and there is no such IP registered in the cluster. It therefore treats the node as an edge node and routes the request through yurt-tunnel, and there is no response since there is no yurt-tunnel-agent on this cloud master. Eventually the pipeline cannot work.

If we comment out the 127.0.1.1 ceci-control-plane ceci-control-plane line, it works.
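
For anyone hitting the same thing, a sketch of the workaround and a quick check (the sed pattern is just one way to comment the entry out):

# comment out the loopback alias for the node hostname in /etc/hosts
sed -i 's/^127\.0\.1\.1[[:space:]]\+ceci-control-plane/# &/' /etc/hosts

# the hostname should now resolve to the node InternalIP instead of 127.0.1.1
getent hosts ceci-control-plane
# expected: 192.168.56.20   ceci-control-plane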

@fujitatomoya Thanks for your feedback. I am glad that you have solved the DNS problem and that yurt-tunnel works now.

By the way: it looks like some unexpected configuration in kube-apiserver.yaml leads to the 127.0.1.1 ceci-control-plane ceci-control-plane entry in the /etc/hosts file.

You can dive into the code that creates the /etc/hosts file here: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet_pods.go#L330-L348 and find the root cause.
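
A quick way to check whether the static pod manifest itself could shape that /etc/hosts content, assuming the standard kubeadm manifest path (hostAliases entries, if any, end up in the kubelet-managed hosts file):

# look for host networking / host aliases in the apiserver static pod manifest
grep -nE 'hostNetwork|hostAliases' /etc/kubernetes/manifests/kube-apiserver.yaml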

@rambohe-ch yeah, thanks for the comments.

As far as I can see, there are some cases where traffic unexpectedly goes (or does not go) through the tunnel. I believe those problems are related to the host system configuration, especially networking. We are not trying to get to the root cause.

@fujitatomoya OK, I can give some hints on how to debug why traffic from kube-apiserver does not go directly to the cloud node (a rough command sketch follows after this list).

  1. nsenter into the network and mount namespaces of the kube-apiserver pod.
  2. Check the contents of the /etc/resolv.conf file for kube-apiserver; make sure the name server is the ClusterIP of the yurt-tunnel-dns service.
  3. ping {yurt-tunnel-dns podIP} to make sure kube-apiserver can reach the DNS pod.
  4. dig {hostname} to check whether DNS resolution works or not.
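
A rough command sketch of those steps; the container ID, pod IP, and ClusterIP below are placeholders, the docker runtime shown in this cluster is assumed, and dig comes from the dnsutils package on Ubuntu. Since kube-apiserver runs with hostNetwork in a kubeadm cluster, steps 3 and 4 can also be run from the control-plane host directly.

# 1 + 2: find the kube-apiserver container and check the resolv.conf it uses
CID=$(docker ps -q --filter name=k8s_kube-apiserver)
PID=$(docker inspect --format '{{.State.Pid}}' "$CID")
nsenter -t "$PID" -n -m cat /etc/resolv.conf                   # may fail if the apiserver image ships no cat
cat $(docker inspect --format '{{.ResolvConfPath}}' "$CID")    # fallback: read the same file from the host side
kubectl -n kube-system get svc yurt-tunnel-dns                 # the name server above should be this ClusterIP

# 3: make sure the dns pod is reachable
kubectl -n kube-system get pod -o wide | grep yurt-tunnel-dns
ping <yurt-tunnel-dns-pod-ip>

# 4: check whether node hostnames resolve through yurt-tunnel-dns
dig @<yurt-tunnel-dns-cluster-ip> ceci-control-plane
dig @<yurt-tunnel-dns-cluster-ip> ceci-worker1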


@rambohe-ch

And another question: I checked the log of yurt-tunnel-server according to the FAQ at https://openyurt.io/docs/faq#yurt-tunnel.

I1018 02:16:42.451452       1 tracereq.go:134] start handling request POST https://192.168.56.20:10250/exec/default/ubuntu22-deamonset-vkksr/ubuntu22?command=date&input=1&output=1&tty=1, from 127.0.0.1:46062 to 192.168.56.20:10250
E1018 02:16:42.451807       1 tunnel.go:74] "currently no tunnels available" err="No backend available"
E1018 02:16:42.451943       1 interceptor.go:136] fail to setup the tunnel: fail to setup TLS handshake through the Tunnel: write unix @->/tmp/interceptor-proxier.sock: write: broken pipe
I1018 02:16:42.451963       1 tracereq.go:138] stop handling request POST https://192.168.56.20:10250/exec/default/ubuntu22-deamonset-vkksr/ubuntu22?command=date&input=1&output=1&tty=1, request handling lasts 495.59µs

root@ceci-control-plane:/home/vagrant# kubectl get pod -A -o wide
NAMESPACE      NAME                                             READY   STATUS    RESTARTS      AGE     IP              NODE                 NOMINATED NODE   READINESS GATES
default        ubuntu22-deamonset-9rm4d                         1/2     Running   4 (35m ago)   4h37m   10.244.1.3      ceci-worker1         <none>           <none>
default        ubuntu22-deamonset-vkksr                         2/2     Running   4 (37m ago)   4h37m   10.244.0.14     ceci-control-plane   <none>           <none>

root@ceci-control-plane:/home/vagrant# kubectl get node -o wide
NAME                 STATUS   ROLES                  AGE     VERSION    INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
ceci-control-plane   Ready    control-plane,master   20h     v1.22.13   192.168.56.20   <none>        Ubuntu 20.04.5 LTS   5.4.0-128-generic   docker://19.3.11
ceci-worker1         Ready    <none>                 5h14m   v1.22.13   192.168.56.21   <none>        Ubuntu 20.04.5 LTS   5.4.0-128-generic   docker://19.3.11

We can see that the pod ubuntu22-deamonset-vkksr is on the cloud side, but the exec request was forwarded to yurt-tunnel-server (from 127.0.0.1:46062 to 192.168.56.20:10250, the kubelet port on the control-plane node). According to the FAQ at https://openyurt.io/docs/faq#yurt-tunnel, that does not look correct, does it?

Could you provide some guidance on how to debug this issue?

@fujitatomoya Thanks for your reply. I can give a short description of kubectl exec troubleshooting.

  1. For pods on cloud nodes, the kubectl exec command does not need to go through yurt-tunnel; yurt-tunnel is only involved for pods on edge nodes.
  2. kube-apiserver resolves hostnames via the yurt-tunnel-dns pod, whose DNS records come from the kube-system/yurt-tunnel-nodes configmap managed by yurt-tunnel-server. If hostname resolution fails (for example: hostnames of cloud nodes are resolved to the ClusterIP of the yurt-tunnel service), check the following:
    • whether kube-apiserver actually uses the yurt-tunnel-dns pod
    • whether the DNS records in the kube-system/yurt-tunnel-nodes configmap are correct
  3. If kubectl exec for edge nodes fails to go through yurt-tunnel, you can check the following items (some example commands are given below):
    • check the logs of yurt-tunnel-server to see whether it received the kubectl exec request
    • check the logs of yurt-tunnel-agent to see whether it is connected to yurt-tunnel-server
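
A few concrete commands for those checks (the pod names below are placeholders to fill in from kubectl get pod; yurt-tunnel-server and yurt-tunnel-agent are assumed to run in kube-system as in a default install):

# dns records managed by yurt-tunnel-server: cloud node hostnames should map to their
# node IPs, edge node hostnames to the ClusterIP of the yurt-tunnel-server service
kubectl -n kube-system get configmap yurt-tunnel-nodes -o yaml

# did yurt-tunnel-server receive the kubectl exec request, and is an agent connected?
kubectl -n kube-system get pod -o wide | grep yurt-tunnel
kubectl -n kube-system logs <yurt-tunnel-server-pod>
kubectl -n kube-system logs <yurt-tunnel-agent-pod>    # runs on the edge node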