istio: K8S Dashboard loads slowly, tiller unresponsive once istio 1.0.0 installed
Describe the bug
On a new Azure Container Service (AKS) cluster running 1.10.6, as soon as I install istio 1.0.0 (I’ve tried the official release and the daily istio-release-1.0-20180803-09-15), requests in the K8S dashboard take 5-10 seconds or time out completely. Additionally, commands to Tiller time out retrieving ConfigMaps.
All kubectl commands I can think to run succeed and run quickly. Installing istio 0.8 does not have this issue.
Expected behavior
No negative impact to other services when installing istio.
Steps to reproduce the bug
- Create new AKS cluster.
- Install istio… I used the following helm command (and the corresponding kubectl apply):

  ```shell
  helm template install/kubernetes/helm/istio --name istio \
    --set servicegraph.enabled=true \
    --set grafana.enabled=true \
    --set tracing.enabled=true \
    --set galley.enabled=false \
    --set telemetry-gateway.grafanaEnabled=true \
    --set telemetry-gateway.prometheusEnabled=true \
    --namespace istio-system
  ```
- Wait a few minutes for the various pods to start up.
- Run kubectl proxy (or az aks browse) and try to navigate in the dashboard. Or run `helm ls`.
Version
Istio: release-1.0-20180803-09-15
K8S: 1.10.6
Is Istio Auth enabled or not? No
Environment: Azure AKS
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Reactions: 2
- Comments: 57 (23 by maintainers)
Thanks @BernhardRode - that helps possibly eliminate OOM problems. I’m on PTO until the 10th; I just thought I’d offer some quick help here, but I don’t have time at this immediate moment to spin up AKS. Once PTO finishes up, I’ll have time.
Sounds like a common problem people are suffering with.
Related - https://github.com/Azure/AKS/issues/620
@rsnj It appears, at the moment, that on AKS you have to choose between policy or telemetry. If you aren’t enforcing any policies in the Mixer layer (rate limits, whitelists, etc.), then I would recommend prioritizing telemetry (but that’s the part of the system I spend the most time on, so I may be slightly biased). Istio RBAC currently does not require Mixer, so you’ll still have some functionality policy-wise.
To be successful without `istio-policy` running, you’ll need to turn off check calls (otherwise you’ll get connectivity issues as requests are denied because the proxy cannot reach the policy service). To do that, you need to install Istio with `global.disablePolicyChecks` set to `true`. I haven’t spent much time trying this out, but I know that others have done this, so if this is of interest, I’m sure we can get this working. Istio is working on documentation for piecemeal installs. This would be a good test case.
In the slightly longer term, Mixer should reduce the number of CRDs down to 3, which should help reduce the burden on the API Server. Sometime after that, Mixer will receive config directly from Galley, reducing the burden even further.
Does that help?
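For concreteness, `global.disablePolicyChecks` is a Helm value, so a policy-check-free render of the 1.0 chart would look roughly like this (a sketch; other `--set` flags from the original install command omitted):

```shell
# Render the chart with Mixer check calls disabled so sidecars don't
# block requests waiting on an unreachable istio-policy service.
helm template install/kubernetes/helm/istio --name istio \
  --namespace istio-system \
  --set global.disablePolicyChecks=true | kubectl apply -f -
```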
@fhoy @douglas-reid I just updated my existing PR for the helm chart to include the switch for `useAdapterCRDs`: https://github.com/istio/istio/pull/9435/files

@douglas-reid @rsnj I appreciate the heads up. I’m investigating and will report back.
@rsnj this seems like an issue with the resources given to the API Server. I’d suggest trying the experiment in reverse (delete both, then add back `istio-telemetry` and then, after testing, add back `istio-policy`).
Mixer (which backs both `istio-policy` and `istio-telemetry`) opens a fair number of watches (~40) on CRDs and otherwise. I suspect that the API Server in these clusters is just not set up to handle this.
If Azure Support has any information on how to increase resources for the API Server, that’d be the best way to resolve the issue. Maybe @lachie83 has some ideas (or contacts that do) here?
@douglas-reid I’m using a brand new AKS cluster running Kubernetes 1.11.2 that has no load on it. I installed Istio via helm using the default settings. I then deployed a simple service and connected it to a gateway. After the services deployed, the entire system went into deadlock: istio-policy and istio-telemetry started using more and more CPU until they autoscaled, and the second replica just went into CrashLoopBackOff. My service was never accessible.
Looking at the logs, I can see my services deployed, and then it’s just a steady stream of the same error coming from istio-mixer and istio-pilot.
There are thousands of errors just like these:
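That error stream can be captured directly from the Mixer and Pilot pods; a sketch, assuming the label selectors and container names used by the stock 1.0 chart:

```shell
# Recent logs from Mixer (backs istio-policy and istio-telemetry)
kubectl -n istio-system logs -l istio=mixer -c mixer --tail=100
# Recent logs from Pilot's discovery container
kubectl -n istio-system logs -l istio=pilot -c discovery --tail=100
```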
To report back… I deployed a new cluster with `useAdapterCRDs=false` and it’s been running for ~10 days or so now without a recurrence of the watch issue slowing helm/dashboard.

Good work! It’d be great if we could get the helm option @douglas-reid mentioned so my scripts can stop hacking the subchart in 1.1 releases. Can’t find the PR mentioned, though.
Just an update: I left the cluster running for about 8 hours without `istio-telemetry` running. The `istio-policy` pod autoscaled to 5 instances that were all in a CrashLoopBackOff state, and my entire cluster went down again. The cluster has zero load on it and only has a simple web service running without any external dependencies.

@douglas-reid I will try your suggestion next: enable telemetry and disable policy.
@rsnj Wow: `Failed to list *v1.Pod: the server was unable to return a response in the time allotted, but may still be processing the request (get pods)`. That is not good. I wonder why everything with the API Server is so slow.
Ran the same daily (`release-1.0-20180822-09-15`) overnight on AKS (Istio installed via Helm with no options) and I also put in a couple of test services. There is no load on the cluster; no one is using it. As @rsnj reported, telemetry and policy are having a bad time.

I was using `istio-release-1.0-20180820-09-15` and Galley was crashing, so the problem seemed to move around (see #7586).

I also installed the latest daily `istio-release-1.0-20180822-09-15` build on my AKS cluster. Everything was running smoothly for a bit, so I deployed a simple application with a gateway configuration, and then I noticed the `istio-telemetry` and `istio-policy` pods using a lot of CPU. When they autoscaled to 2 replicas, the new replicas went into a CrashLoopBackOff state with the error:
`Liveness probe failed: Get http://10.200.0.90:9093/version: dial tcp 10.200.0.90:9093: connect: connection refused`
Looking at my logs there are a lot of these errors:
`Failed to list *v1beta2.ReplicaSet: the server was unable to return a response in the time allotted, but may still be processing the request (get replicasets.apps)`
and these:
`gc 233 @3240.155s 0%: 0.044+5.9+7.6 ms clock, 0.089+0.24/2.8/8.2+15 ms cpu, 15->15->7 MB, 16 MB goal, 2 P`
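When the Mixer pods thrash like this, a few standard commands narrow down whether it is HPA flapping, CPU saturation, or liveness failures; a sketch, assuming the default `istio-system` install and stock pod labels:

```shell
# Pod status and restart counts for the Mixer deployments
kubectl -n istio-system get pods -l istio=mixer
# HPA state: are istio-policy/istio-telemetry being scaled up?
kubectl -n istio-system get hpa
# CPU/memory per pod (requires metrics-server or heapster on 1.10/1.11)
kubectl -n istio-system top pods
# Last termination state of the crashing policy replicas
kubectl -n istio-system describe pods -l istio-mixer-type=policy | grep -A3 "Last State"
```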
My Kubernetes Dashboard is still unresponsive, but Helm is working. Microsoft has been responsive to me through the Azure support channel, but they are out of ways to troubleshoot the issue.
@CapTaek radio silence from AKS-Help so far.
Installed the latest daily on a new cluster (with galley enabled this time). Nothing crashing/restarting, but still the same problems with K8S Dashboard performance with Istio installed; no issues with it removed.
Just got this from Azure Support:
I just tried to reconnect to the cluster and the issue is still there 😦
istio-galley is crashing all the time.
Pods
I ran your commands on bare metal. Note I don’t immediately have access to AKS. I suspect you are in an OOM situation where the kernel continually kills processes and Kubernetes continually restarts them (hence the `helm version`/`helm ls` lag, and dashboard lag). This is hard to detect, but it can be seen with `kubectl describe` on a restarted pod (grep for OOM).
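The OOM check described above can be done with `kubectl describe`, or by querying each container’s last termination reason directly; a sketch:

```shell
# Grep restarted pods for OOMKilled terminations
kubectl -n istio-system describe pods | grep -i -B3 "oomkilled"
# Or list each container's last termination reason via jsonpath
kubectl -n istio-system get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}'
```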
Also, a namespace was not created for istio-system above. Are you executing an upgrade, or a fresh install? I suspect an upgrade will require more memory. Please reference the documentation for installation instructions here:
https://istio.io/docs/setup/kubernetes/helm-install/#option-1-install-with-helm-via-helm-template
and for Azure platform setup here:
https://istio.io/docs/setup/kubernetes/platform-setup/azure/
Note I have not personally validated the Azure platform setup instructions.
You can see from my AIO workflow below that a very bare-bones Ubuntu 16.04.4 bare metal system requires 13 GB of RAM for Kubernetes + Istio. Reading the Azure documentation on istio.io, you might try increasing the node count beyond 3 nodes to give the cluster more memory to work with. It also took around 6 minutes to deploy Kubernetes and Istio on my bare metal system (which is a beast of a server). You mentioned you waited a few minutes; this may not be sufficient for Istio to initialize.