istio: istio-validation initContainer reports errors when istio-cni is not installed correctly

Bug description When enabling istio-cni-repair on Istio 1.4.5, the istio-validation init container fails. Relevant logs:

in new validator: <pod_ip>
Listening on 127.0.0.1:15001
Listening on 127.0.0.1:15006
Error connecting to 127.0.0.6:15002: dial tcp 127.0.0.1:0->127.0.0.6:15002: connect: connection refused
Error connecting to 127.0.0.6:15002: dial tcp 127.0.0.1:0->127.0.0.6:15002: connect: connection refused
Error connecting to 127.0.0.6:15002: dial tcp 127.0.0.1:0->127.0.0.6:15002: connect: connection refused
Error connecting to 127.0.0.6:15002: dial tcp 127.0.0.1:0->127.0.0.6:15002: connect: connection refused
Error connecting to 127.0.0.6:15002: dial tcp 127.0.0.1:0->127.0.0.6:15002: connect: connection refused
Error connecting to 127.0.0.6:15002: dial tcp 127.0.0.1:0->127.0.0.6:15002: connect: connection refused
Error connecting to 127.0.0.6:15002: dial tcp 127.0.0.1:0->127.0.0.6:15002: connect: connection refused
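
These logs can be pulled from the failing pod’s init container, e.g. (pod and namespace names are placeholders):

kubectl logs <failing-pod> -n <namespace> -c istio-validation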

A brief examination of the code would indicate to me that the validator never starts a listener on the IptablesProbePort, unless I am mistaken (https://github.com/istio/istio/blob/1.4.5/tools/istio-iptables/pkg/validation/validator.go#L117 and https://github.com/istio/istio/blob/1.4.5/tools/istio-iptables/pkg/validation/validator.go#L166-L178).

Expected behavior istio-validation validates the pod’s CNI setup and exits successfully.

Steps to reproduce the bug Install Istio 1.4.5 with CNI and repair enabled. Attempt to launch a pod.
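
For reference, a minimal reproduction sketch using helm template (chart paths and value keys follow my recollection of the 1.4 release layout and are assumptions; they may differ in your environment):

# render and apply the main chart with CNI enabled
helm template install/kubernetes/helm/istio --name istio --namespace istio-system \
  --set istio_cni.enabled=true | kubectl apply -f -
# render and apply the istio-cni chart with the repair controller enabled
helm template install/kubernetes/helm/istio-cni --name istio-cni --namespace istio-system \
  --set repair.enabled=true | kubectl apply -f -
# launch any pod in a namespace with sidecar injection enabled and watch the istio-validation init container
kubectl run curl --image=curlimages/curl --restart=Never -- sleep 3600
kubectl get pod curl -w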

Version (include the output of istioctl version --remote and kubectl version and helm version if you used Helm) istio: 1.4.5 kube: 1.15.7

How was Istio installed? helm template

Environment where bug was observed (cloud vendor, OS, etc) AWS - kops

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 38 (23 by maintainers)

Most upvoted comments

@tmshort Thank you for the comment. The repair and validation are heuristic: they don’t understand why istio-cni did not inject the iptables rules. The next step is to confirm the iptables state:

  1. If iptables is set up correctly, we should blame istio-validation.
  2. If iptables is not set up correctly, check the logs of the install-cni container of the ds/istio-cni-node pod on the node where the failing istio-proxy is running (see the sketch below). It’s likely Istio was not installed with the correct settings for your k8s setup: cniBinDir, cniConfDir…
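
A sketch for finding the right istio-cni-node pod (the namespace and pod naming depend on how the chart was installed; istio-system is assumed here):

# node where the failing pod is scheduled
NODE=$(kubectl get pod <failing-pod> -n <namespace> -o jsonpath='{.spec.nodeName}')
# istio-cni-node pod running on that node
kubectl get pods -n istio-system -o wide --field-selector spec.nodeName=$NODE | grep istio-cni-node
# install-cni logs of that pod
kubectl logs -n istio-system <istio-cni-node-pod> -c install-cni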

Anyhow, you can also access the k8s node and run iptables-save in the network namespace of the failing pod.

On the k8s node, run the following as root:

1. find the container id: run `docker ps`
2. figure out the pod’s network namespace via the container’s PID: `docker inspect --format '{{ .State.Pid }}' {container-id-or-name}`
3. run `nsenter -t {pid} -n iptables-save`

Let’s start with whether the iptables-save output contains any rule with “ISTIO” in its name.
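
Put together, a minimal sketch (assuming Docker is the container runtime; <container-id> is any container of the failing pod from `docker ps`):

# as root on the k8s node
PID=$(docker inspect --format '{{ .State.Pid }}' <container-id>)
nsenter -t "$PID" -n iptables-save | grep ISTIO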

@zs-ddl interesting. Then it loops back to #14977

@towens Are there any helpful hints in the describe output for the DaemonSet?

kubectl describe -n istio-system ds

For us, we’re seeing the following:

$ kubectl describe -n istio-system ds
...
Events:
  Type     Reason        Age                  From                  Message
  ----     ------        ----                 ----                  -------
  Warning  FailedCreate  26s (x15 over 108s)  daemonset-controller  Error creating: pods "istio-cni-node-" is forbidden: pods with system-cluster-critical priorityClass is not permitted in istio-system namespace
$
$ kubectl version --short
Client Version: v1.17.0
Server Version: v1.15.7-gke.23

Per the K8s discussion on system-cluster-critical, it looks like we’ll have to wait until GKE supports K8s 1.17, which allows this priority class to be used in namespaces other than kube-system.
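
A possible workaround until then (a sketch, assuming the standalone istio-cni chart from the 1.4 release): render the chart into kube-system, where system-cluster-critical pods are allowed on pre-1.17 clusters.

helm template install/kubernetes/helm/istio-cni --name istio-cni \
  --namespace kube-system | kubectl apply -f -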