istio: Istio-proxy fails to start with Istio 1.1

I have an AKS test-cluster, with a few sample apps deployed. With Istio 1.0.2 both the app container and the istio-proxy container start as expected. When removing Istio and the sample app, installing Istio daily build istio-release-1.1-20181021-09-15 from scratch, and then redeploying the sample apps, the sidecar proxy fails to start, while logging the following:

<*snip* start>

2018-10-24T12:29:06.122324Z	info	Envoy command: [-c /etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster nginx-1-pod.default --service-node sidecar~10.244.0.17~nginx-1-pod-6d86955d8d-xhj79.default~default.svc.cluster.local --max-obj-name-len 189 --allow-unknown-fields -l warning --v2-config-only]
[2018-10-24 12:29:06.144][19][warning][upstream] external/envoy/source/common/config/grpc_mux_impl.cc:243] gRPC config stream closed: 14, no healthy upstream
[2018-10-24 12:29:06.144][19][warning][upstream] external/envoy/source/common/config/grpc_mux_impl.cc:43] Unable to establish new stream
[2018-10-24 12:29:07.431][19][warning][config] src/envoy/utils/mixer_control.cc:171] ExtractInfo  metadata missing:
[2018-10-24 12:29:07.432][19][warning][config] src/envoy/utils/mixer_control.cc:171] ExtractInfo  metadata missing:

<*snip* multiple identical log lines>

[2018-10-24 12:29:07.498][48][warning][config] src/envoy/utils/mixer_control.cc:171] ExtractInfo  metadata missing:
[2018-10-24 12:29:07.498][48][warning][config] src/envoy/utils/mixer_control.cc:171] ExtractInfo  metadata missing:
2018-10-24T12:29:07.997196Z	info	Envoy proxy is NOT ready: 3 errors occurred:

* failed checking application ports. listeners="0.0.0.0:15090","10.244.0.17:8081","10.0.253.20:443","10.0.253.20:31400","10.0.184.140:15011","10.0.69.228:42422","10.0.178.7:443","10.0.0.1:443","10.0.132.238:80","10.0.165.166:443","10.0.225.39:443","0.0.0.0:15004","0.0.0.0:80","0.0.0.0:15030","0.0.0.0:8080","0.0.0.0:9093","0.0.0.0:8060","0.0.0.0:8081","0.0.0.0:9091","0.0.0.0:15029","0.0.0.0:15031","0.0.0.0:15010","0.0.0.0:9901","0.0.0.0:15032","0.0.0.0:9090","10.244.0.17:15020","0.0.0.0:15001"
* envoy missing listener for inbound application port: 0
* envoy missing listener for inbound application port: 80
2018-10-24T12:29:09.997207Z	info	Envoy proxy is NOT ready: 3 errors occurred:

* failed checking application ports. listeners="0.0.0.0:15090","10.244.0.17:8081","10.0.253.20:443","10.0.253.20:31400","10.0.184.140:15011","10.0.69.228:42422","10.0.178.7:443","10.0.0.1:443","10.0.132.238:80","10.0.165.166:443","10.0.225.39:443","0.0.0.0:15004","0.0.0.0:80","0.0.0.0:15030","0.0.0.0:8080","0.0.0.0:9093","0.0.0.0:8060","0.0.0.0:8081","0.0.0.0:9091","0.0.0.0:15029","0.0.0.0:15031","0.0.0.0:15010","0.0.0.0:9901","0.0.0.0:15032","0.0.0.0:9090","10.244.0.17:15020","0.0.0.0:15001"
* envoy missing listener for inbound application port: 0
* envoy missing listener for inbound application port: 80

<etc., repeating every two seconds>

In the listener list, 10.244.0.17 is the pod IP. According to https://istio.io/help/ops/traffic-management/proxy-cmd/#deep-dive-into-envoy-configuration, I should see "A virtual listener on the pod IP for each exposed port for inbound traffic", but I can only find entries for 10.244.0.17:15020 and 10.244.0.17:8081, the latter being the service port… What could have gone wrong here?

Sample app configuration: sample_app.yaml.txt

Version: Kubernetes 1.11.2; Istio release-1.1-20181021-09-15, GitRevision bd24a62648c07e24ca655c39727aeb0e4761919a

Installation, using Helm 2.9.1:

helm install ${ISTIO_HOME}/install/kubernetes/helm/istio --name istio --namespace istio-system --tls --wait \
    --set global.configValidation=true \
    --set sidecarInjectorWebhook.enabled=true \
    --set gateways.istio-ingressgateway.loadBalancerIP=${PUBLIC_IP}

Environment: MS Azure, AKS

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 22
  • Comments: 139 (52 by maintainers)

Most upvoted comments

So the issue here is coming down to the list of application ports that the readiness probe is using.

Initially, the readiness probe just waited for the first update from Pilot before marking the container as "ready". That wasn't quite enough, however, since k8s would occasionally send partial lists of endpoints to Pilot/Envoy, which meant that some inbound ports were not yet configured by the time the container went "ready". In this case, there would be a small window of 503s until the rest of the endpoints were configured.

To address this, we added the --applicationPorts flag, which when set will require that all ports in the list be received by Envoy before marking the container as “ready”. By default, this is set to the list of ports exposed by the container.

This seems to work when service ports match deployment ports. When there’s a mismatch, however, this seems to be causing issues.

You can work around this by setting the readiness.status.sidecar.istio.io/applicationPorts annotation in your deployment. It’s just a comma-separated list of port numbers. If empty (""), this part of the readiness check will be skipped entirely and the readiness probe will go back to just looking for the first update from pilot.

You should be able to disable readiness by setting the annotation status.sidecar.istio.io/port: "0" in your deployment
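For reference, a minimal sketch of where that annotation would go (the deployment, app, and image names here are hypothetical). Note that it belongs in the pod template's metadata, not in the deployment's top-level annotations:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app            # hypothetical name
spec:
  selector:
    matchLabels:
      app: sample-app
  template:
    metadata:
      annotations:
        # Setting the sidecar status port to 0 disables the
        # istio-proxy readiness/liveness probe entirely.
        status.sidecar.istio.io/port: "0"
      labels:
        app: sample-app
    spec:
      containers:
        - name: app
          image: nginx        # hypothetical image
          ports:
            - containerPort: 80
```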

I list three reproducible scenarios below for sidecar readiness-probe failure.

  1. No service deployed for a deployment
  2. A service is deployed, but it doesn't expose a port for any containerPort defined in the pod
  3. The labels defined in the pod template and the service selector don't match, https://github.com/istio/istio/issues/11979

Please add if there are other scenarios. Thanks.
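Scenario 2 above can be sketched as follows (all names, images, and ports are hypothetical): the pod exposes containerPort 8080, but the service only targets port 9090, so Envoy never receives inbound configuration for 8080 and the sidecar readiness probe keeps failing.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: demo-svc              # hypothetical name
spec:
  selector:
    app: demo
  ports:
    - port: 80
      targetPort: 9090        # does NOT match the pod's containerPort below
---
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod              # hypothetical name
  labels:
    app: demo
spec:
  containers:
    - name: app
      image: nginx            # hypothetical image
      ports:
        - containerPort: 8080 # no service exposes this port
```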

FYI. Troubleshooting tips

  1. Use the following command to check the sidecar logs: kubectl logs <pod-name> -n <namespace> -c istio-proxy
  2. Use the following command to list all the listeners: istioctl proxy-config listeners -n <namespace> <pod-name>
  3. Check whether the containerPort appears among the listeners bound to this pod's local addresses


still seeing the issue in 1.1.3

@fhoy To clarify, your attached sample app seems to be working, but I am still seeing the error when I create a deployment without a service. Maybe I'm wrong, but I didn't think a service needed to be defined in order for istio-proxy to be healthy?

I just ran into the same issue. Using the readiness.status.sidecar.istio.io/applicationPorts annotation didn’t work for me.

The resolution I ended up with was to define a readinessProbe on the application container.

readinessProbe:
  httpGet:
    path: /
    port: 80
    scheme: HTTP
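For context, that probe fragment goes on the application container in the pod spec; a minimal sketch, with hypothetical container and image names:

```yaml
spec:
  containers:
    - name: app             # hypothetical container name
      image: nginx          # hypothetical image
      ports:
        - containerPort: 80
      # Kubernetes-level probe on the app itself, independent of the
      # istio-proxy sidecar's own readiness check.
      readinessProbe:
        httpGet:
          path: /
          port: 80
          scheme: HTTP
```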

Yes, putting the annotation in the right place worked so this is not a blocking issue, but still occurring.

@cleverguy25 it should be done in your deployment.yaml yes, but in the pod template’s annotations:

kind: Deployment
apiVersion: apps/v1
metadata:
  name: dep-name
  namespace: dep-namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      annotations:
        # HERE HERE HERE!
        readiness.status.sidecar.istio.io/applicationPorts: ""
      labels:
        app: my-app
    spec:
      containers:
        ...

I’ve just figured out that this issue occurs when there is no service pointing to one of the pod’s containerPorts.

Istio: 1.1.5, k8s: v1.12.7-gke.10

In my env, the kubelet starts containers one by one and then updates status, so it is very slow. It is really weird that you see this when deploying 5 pods.

To clarify, kubelet schedules things simultaneously in my setup, with or without the Envoy proxy.

The issue here should be pods not getting into a ready state because Envoy doesn’t correct itself and persists in its weird state, and I’m not sure how kubelet could be related to that.

To be honest, I don’t know the internals of kubelet, so I just don’t know how it relates, but everything works absolutely fine when running without the Envoy proxy as the sidecar, so there’s likely something going on with Envoy and Pilot here.

Right, I think that was mentioned earlier in the issue.


From: ismail BASKIN notifications@github.com
Sent: Saturday, May 4, 2019 1:32:34 AM
To: istio/istio
Cc: Cleve Littlefield; Mention
Subject: Re: [istio/istio] Istio-proxy fails to start with Istio 1.1 (#9504)

I’ve just figured out this issue occurs when there is no service pointing to one of pod’s containerPorts.

istio : 1.1.5 k8s : v1.12.7-gke.10


This workaround did not work for me at all.


From: mailzyok notifications@github.com
Sent: Monday, April 29, 2019 3:14:44 AM
To: istio/istio
Cc: Cleve Littlefield; Mention
Subject: Re: [istio/istio] Istio-proxy fails to start with Istio 1.1 (#9504)

Met the same issue using Istio 1.1.3; the problem happens more than 50% of the time, and we cannot figure out any clue as to how it happens. We use the workaround of putting the annotation readiness.status.sidecar.istio.io/applicationPorts: "" on the application deployment spec to skip it for now, but this causes 503 Service Unavailable if you access the application port right after deployment.


Right… in that case, you’re disabling the readiness probe altogether. There are a couple of annotations at play here:

status.sidecar.istio.io/port: defines the port for the readiness/liveness probes for the sidecar. If this is set to 0, the readiness probe is disabled and the sidecar will always be “ready”.

Assuming that is non-zero, readiness is enabled for the sidecar. The readiness check is composed of 2 parts:

  1. Check whether or not configuration (LDS, CDS) was received from Pilot.
  2. Determine whether or not configuration for all of the application’s inbound ports has been received.

For 2, the default is to check all inbound ports on the application’s container, since the webhook injector doesn’t have service information available when generating the sidecar config. However, there are cases where not all container ports will receive inbound configuration from Pilot. In this situation, you can override this default behavior with another annotation:

readiness.status.sidecar.istio.io/applicationPorts: If set to "", this disables part 2 of the readiness check. Otherwise, it’s a comma-separated list of ports for which the readiness probe will await inbound configuration.

I suspect that this may be the annotation that you want to disable or modify.
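Putting the two annotations side by side, a pod-template sketch (label and port values here are illustrative; in practice you would set one or the other, not both):

```yaml
template:
  metadata:
    annotations:
      # Port for the sidecar's own readiness/liveness probes.
      # Setting this to "0" disables sidecar readiness entirely.
      status.sidecar.istio.io/port: "15020"
      # Ports for which the probe awaits inbound config from Pilot.
      # Setting this to "" skips part 2 of the readiness check.
      readiness.status.sidecar.istio.io/applicationPorts: "80,8080"
    labels:
      app: my-app             # hypothetical label
```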

@sdake @nmittler @duderino FYI, yesterday was my fault: I had switched back to release-1.1, but it did not yet have #12053 merged, so I was still seeing the race. I rebuilt the containers yesterday afternoon and have not seen any failures since, so I believe our race is addressed by #11999 and #12053.

I’m continuing soak testing today and will raise another issue if it recurs, in the meantime assume the issue we have been seeing is addressed.

@nmittler @sdake I tracked our issue down to a race between the pod connecting to pilot and SetServiceInstances being called. If SetServiceInstances was called before the endpoint was associated with the service then it would set node.ServiceInstances to [].

I came up with a solution but it looks as if I’ve just been beaten to it, https://github.com/istio/istio/pull/11999 implemented a similar fix. There’s one minor logic bug not caught in that pull request so I’ll submit that.

I’ve been using readiness.status.sidecar.istio.io/applicationPorts: "" to work around this issue until today, when I found that I was getting random 502s from the pods in a deployment. It seems some of the envoy proxies truly weren’t ready, but without the check they were still in the pool.

Removing the annotation and killing the pods manually until I got a full set that were truly healthy resolved the issue.

I highly recommend that anyone considering the readiness.status.sidecar.istio.io/applicationPorts: "" workaround use it carefully, as it’s certainly not a real fix.

One thing I’ve observed (and I believe others have too) is that killing the istio-pilot pod seems to kick things into a working state; not immediately, but Pilot starts handing out the correct configs within a few seconds.

Without kicking pilot I’ve seen pods that still aren’t healthy after 20 minutes.

This is all on a relatively low usage cluster (maybe 200 pods, only about 6 nodes), and Pilot’s cpu/memory usage is very low, even when it’s (apparently) screwing up.

Can someone please address this properly? It’s been present from 1.1.0 and I’ve tested every version up to and including 1.1.9 so far with no change.

I am doing it on the deployment. Where exactly should it be done? This is very unclear and there is not much documentation. Earlier in this thread it is mentioned as an annotation on the deployment.

If I set readiness.status.sidecar.istio.io/applicationPorts to the right set of ports, I see this passed on the command line to the istio-proxy container:

--applicationPorts 8250,19000,8260,80,9700,7483,27142,18000,18003

So if setting it on the deployment works for specific ports, why wouldn’t setting it to "" work there too?

Maybe he’s adding readiness.status.sidecar.istio.io/applicationPorts: "" to the deployment annotations instead of the pod template annotations, so it doesn’t take effect?

@cleverguy25 I don’t see how that can happen if you set applicationPorts to empty. From the code here, we skip the check if that is the case: https://github.com/istio/istio/blob/release-1.1/pilot/cmd/pilot-agent/status/ready/probe.go#L46. Can you describe the pod or get the logs of the proxy and look for the --applicationPorts flag? Perhaps the annotation is not being applied correctly; otherwise I don’t know how it could still be doing the check.

Ingress gateway and other Envoy proxies don’t start

2019-04-19T00:19:41.793893Z	info	Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected
2019-04-19T00:19:43.793777Z	info	Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected

<*snip* identical log lines repeating every two seconds>

Containers do start up if the readiness probes are disabled/removed. But the Envoy instances in the ingress gateway and the sidecars are not able to connect to istio-pilot.

[2019-04-19 04:18:40.788][19][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:49] Unable to establish new stream
[2019-04-19 04:19:04.967][19][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:86] gRPC config stream closed: 14, no healthy upstream

The Telemetry and Policy pods, however, are running without issue, and they do have the proxy attached. I created a new setup with 1.1.3 and encountered this; I tried 1.1.2 and the errors are still there.

@hzxuzhonghu awesome, that fix works. Using release-1.1-latest-daily removed errors like:

* envoy missing listener for inbound application port: 8080

Just waiting for 1.1.2 to be cut now 😃

@markserrano915

You will see many of your containers in the pending state. If you run kubectl describe on them, I would speculate you will see errors relating to insufficient memory or CPU resources. This can be fixed by increasing the number of worker nodes in the system. Can you report the describe results on some of these errors?

When these pods do not enter the running state, a nonsensical error about port 0 and port xxxx is printed in the application sidecars. In addition, there appears to be a readiness-check problem. I don’t totally understand why the readiness check fails, but without the microservices running, the system will show these two results.

In the 1.1 series, the defaults for CPU and memory were increased. I am not sure what the new minimum requirements are, but I think @mandarjog does.

Cheers -steve

Looks like this is fixed until proven guilty. @knrc please reopen if your investigation finds anything