istio: Envoy shutting down before the thing it's wrapping can cause failed requests

Hey folks. We’re seeing some brief issues when deploying applications, and we’ve narrowed it down to the fact that under a normal shutdown process, both envoy and the application it is wrapping get their SIGTERM at the same time.

Envoy will drain and shut down, but if the other application is still doing anything as part of its shutdown process and attempts to make an outbound connection AFTER envoy has shut down, it’ll fail because the iptables rules are sending traffic to a non-existent envoy. For example:

  • envoy and application1 get SIGTERM
  • envoy drains; there are no connections at the moment, so this is almost instant
  • application1 tries to tell application2 that it’s going offline (because that’s what it does), and that fails as there’s now no envoy

I’ve got a rather hacky workaround: we add a lifecycle hook to istio-proxy which blocks the shutdown until there are no other TCP listeners (thus, half-heartedly ensuring our other service has terminated before envoy starts to terminate).

containers:
- name: istio-proxy
  lifecycle:
    preStop:
      exec:
        command: ["/bin/sh", "-c", "while [ $(netstat -plunt | grep tcp | grep -v envoy | wc -l | xargs) -ne 0 ]; do sleep 1; done"]

I guess this kind of falls into the lifecycle conversations we’ve been having recently (in https://github.com/istio/istio/issues/6642), and it highlights how we need to give consideration to both startup and shutdown in order to have non-service-impacting deployments with the default kubernetes RollingUpdate when using Istio.

I’m wondering if you can think of a nicer way that this could be accommodated. Basically envoy needs to stay up until the thing it’s wrapping has exited. A process check would be better still, because the above hack only works when application1 is exposing a TCP listener - but istio-proxy cannot see the process list of the other containers (unless you can think of some way to do this?).
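
A sketch of one possible direction, untested on my side: Kubernetes’ shareProcessNamespace puts all of a pod’s containers into a shared PID namespace, so the hook could wait on the application’s process rather than its listeners. This assumes pgrep is available in the proxy image and that the app binary is literally called application1 - both are placeholders:

spec:
  shareProcessNamespace: true
  containers:
  - name: istio-proxy
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "while pgrep -x application1 > /dev/null; do sleep 1; done"]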

Karl

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 43
  • Comments: 74 (50 by maintainers)

Most upvoted comments

if k8s supported container dependencies, it would be a lot easier…

I feel like envoy (or rather the whole proxy) should stay up until all connections it proxies are drained (or the terminationGrace expires). That’d automatically keep it up until the downstream application has shut down itself.
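
For reference, the grace period mentioned here is the pod-level terminationGracePeriodSeconds; a minimal fragment, value purely illustrative:

spec:
  terminationGracePeriodSeconds: 60   # upper bound before the kubelet force-kills anything still running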

Also related #10112

We handle this in the most hacky way possible… Our istio-proxy has a preStop hook which fires this script:

#!/bin/bash
set -e
echo "Exit signal received, waiting for all network connections to clear..."
while [ $(netstat -plunt | grep tcp | grep -v envoy | wc -l | xargs) -ne 0 ]; do
  printf "."
  sleep 3;
done
echo "Network connections cleared, shutting down pilot..."
exit 0

Basically it waits for any non-envoy ports to stop listening before envoy itself shuts down.
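
For completeness, the script is fired via the sidecar’s preStop hook, roughly like the following; the path is just wherever the script is baked into our proxy image (shown here as a placeholder):

lifecycle:
  preStop:
    exec:
      command: ["/bin/bash", "/usr/local/bin/wait-for-app-shutdown.sh"]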

Posting a solution that works for me. I adapted @Stono’s solution by manually draining inbound connections:

lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "curl -X POST localhost:15000/drain_listeners?inbound_only; while [ $(netstat -plunt | grep tcp | grep -v envoy | grep -v pilot-agent | wc -l | xargs) -ne 0 ]; do sleep 1; done"]
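
If you want to poke at this by hand from inside the istio-proxy container, the same admin endpoints can be hit directly (15000 is the default admin port; output varies by Envoy version):

curl -s -X POST "localhost:15000/drain_listeners?inbound_only"
curl -s localhost:15000/listeners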

I have some feedback to share when using a preStop hook with the following code:

preStop:
  exec:
    command:
    - sh
    - -c
    - sleep 20; while [ $(netstat -plunt | grep tcp | grep -v envoy | wc -l | xargs) -ne 0 ]; do sleep 1; done

My k8s deployment has:

  • terminationGracePeriodSeconds set to 600s
  • readinessProbe set to some HTTP path
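
Concretely, something like the following fragment (the probe path and port are placeholders):

spec:
  terminationGracePeriodSeconds: 600
  containers:
  - name: httpbin
    image: kennethreitz/httpbin
    readinessProbe:
      httpGet:
        path: /status/200   # placeholder HTTP path
        port: 80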

When deploying new pods, old pods receive the SIGTERM signal and start to shut down gracefully. The terminationGracePeriodSeconds will keep the pods around in a Terminating state, where we can see that the wrapped container has shut down but istio-proxy has not:

NAME↑          IMAGE                                       READY       STATE         RS PROBES(L:R)
httpbin        kennethreitz/httpbin                        false       Running       0 off:on      
istio-init     gke.gcr.io/istio/proxy_init:1.1.13-gke.0    true        Completed     0 off:off     
istio-proxy    gke.gcr.io/istio/proxyv2:1.1.13-gke.0       true        Running       0 off:on      

The wrapped container (httpbin here) is now shut down, but the readiness probe setup will continue to send HTTP requests through istio-proxy, resulting in 503 response codes:

[screenshot: the readiness probe requests returning 503 through istio-proxy]

I had enough time to ssh into the running istio-proxy and run its preStop cmd:

istio-proxy@httpbin-66b967d895-fl2df:/$ netstat -plunt | grep tcp
tcp        0      0 0.0.0.0:15090           0.0.0.0:*               LISTEN      19/envoy
tcp        0      0 127.0.0.1:15000         0.0.0.0:*               LISTEN      19/envoy
tcp        0      0 0.0.0.0:15001           0.0.0.0:*               LISTEN      19/envoy
tcp6       0      0 :::15020                :::*                    LISTEN      1/pilot-agent

Looks like the pilot-agent listener will stay open forever, which prevents the preStop hook configured on the istio-proxy container from ever completing, so the container never shuts down properly.

I’m currently not sure what’s the best approach to solve this issue.
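
One variation that would at least let the hook complete, borrowing the pilot-agent exclusion from the snippet further up in this thread (untested on my side):

while [ $(netstat -plunt | grep tcp | grep -v envoy | grep -v pilot-agent | wc -l | xargs) -ne 0 ]; do
  sleep 1
done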

  1. When envoy starts, there is a short delay until the first configuration has been pushed from pilot.
  2. Once a pod has been terminated, all its endpoints will be removed. This will immediately trigger pilot to push new envoy configs.

So I think the pod lifecycle has to change to this:

  • Init containers start
  • Init containers finish
  • Sidecars start
    • Envoy has to wait until the first configuration has arrived from pilot
  • Once all Sidecars have started: Containers start

and in reverse on shutdown:

  • Sidecars have to be informed about the upcoming termination
    • Envoy has to freeze its configuration to prevent further pushes from pilot caused by the removed endpoints
  • Once all Sidecars are informed: Containers are sent SIGTERM
  • Once all Containers have exited: Sidecars are sent SIGTERM.
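
Until something like this exists natively, a rough approximation with today’s hooks might look like the sketch below. It is purely illustrative: the app image and entrypoint are placeholders, 15020 is the 1.1-era pilot-agent readiness port and may differ between Istio versions, and it assumes curl exists in the app image:

containers:
- name: app
  image: my-app:latest              # placeholder
  command: ["/bin/sh", "-c"]
  args:
  - |
    # startup half: don't start the app until the sidecar reports ready
    until curl -fsS http://localhost:15020/healthz/ready > /dev/null; do sleep 1; done
    exec /usr/local/bin/app         # placeholder entrypoint
- name: istio-proxy
  lifecycle:
    preStop:
      exec:
        # shutdown half: keep the proxy up until the app's listeners are gone
        command: ["/bin/sh", "-c", "while [ $(netstat -plunt | grep tcp | grep -v envoy | grep -v pilot-agent | wc -l | xargs) -ne 0 ]; do sleep 1; done"]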

To be clear, my original issue is not really an istio problem; it’s more of a kubernetes container-ordering problem that’s just exacerbated by istio. As the two containers within the same pod receive their SIGTERM at the same time, there is no logic guaranteeing that pilot-agent doesn’t exit before your app (and stop accepting new outbound connections), which means that if your application has some shutdown logic which involves creating new connections, those connections can fail if pilot-agent has already exited (because the iptables rules will be routing them to nothing).

My workaround is more of a kubernetes workaround than an istio one, in that I have a preStop hook delay the SIGTERM to pilot-agent until (via a relatively crude check) there are no other ports (owned by my application) listening in the pod - as a signal that the app has exited. It is by no means perfect and is designed to be a stopgap until a better way to handle the problem surfaces.

As a side note, the second a pod is terminated it enters the Terminating state, which removes it from the service endpoints and thus (eventually, since downstream envoys are eventually consistent) stops traffic being sent to it.

If you’re getting 10 minutes of 503s, there’s something wrong here that is different from my original post, as that pod shouldn’t be getting any traffic at all.

Honestly, so long as pilot-agent and envoy drain their connections correctly (which I believe they do) and envoy continues to accept new outbound connections during its shutdown phase (which I’m unsure about), there isn’t much more that istio can do.
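
A crude way to probe the "outbound during drain" question, if anyone wants to check (pod/container names and the target URL are placeholders, and the timing makes it racy):

kubectl delete pod my-app-pod --wait=false
kubectl exec my-app-pod -c app -- curl -s -o /dev/null -w '%{http_code}\n' http://httpbin.default/status/200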

Hi @Stono

Thank you very much for your fix - it works as needed for us.

After reading the discussion I do, however, not fully understand why this shouldn’t be added as a default to Istio. I can follow the argument that it should be Kubernetes which provides a hierarchy of its containers - however, currently it doesn’t.

It is nice to have the service mesh as a complete abstraction from the application layer. But in the cases I can think of, I would never want my applications’ connections forcefully closed at shutdown - which is what happens when the sidecar gets killed. Hence I actually think your fix puts me better off than having nothing. So why isn’t this the general behaviour?

Maybe I’m missing something, and if so I would learn a lot from hearing your thoughts 😃

Thanks for the reply @chadlwilson. Right, that makes more sense; I will just create a new issue 👍

Perhaps there is some missing information, but your issue sounds rather different from the one described here? This was about a lack of graceful-shutdown coordination between the app container and the proxy, causing issues on outgoing calls from the app through the proxy - which would naturally limit the errors to the k8s-layer terminationGracePeriod, not 17 minutes? Side note, but outlier detection is not really intended as a mechanism to remove pods that are undergoing a shutdown, so this seems a bit strange.

If a source pod is still routing requests to a target pod and proxy long after that pod has been destroyed (and its Endpoints removed) - which is what it looks like based on your graph from the ingress gateway’s perspective - it sounds like you have a different problem to me.

Nevertheless, this issue is rather old, and my understanding is that approaches to Envoy draining have changed in this time, so I’m not sure this is the right place for conversation. Perhaps you can raise a new issue and properly report your setup, Istio version, expected behaviour vs observed behaviour in the standard way.

Agreed, this issue has overstayed its welcome 😃.

Karl, do we already have separate issues tracking the specific problems you see?

I actually propose closing off this issue and creating new ones for the specific separate issues that people are reporting here, as it’s become a bit of a dumping ground for any old shutdown/drain/eventual-consistency issue 😂

It’s over a year old!

@nmittler drain_listeners was added by @ramaraochavali very recently (week or so maybe?). I think what we can probably do now is to use that draining rather than our original approach of hot restarting with an empty config. So we get SIGTERM, call /drain_listeners, wait X seconds, then exit.
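
A sketch of that sequence as a plain shell wrapper, just to illustrate the idea - the envoy invocation, admin port (15000) and the 15-second drain window are placeholders, not pilot-agent’s actual flags or behaviour:

#!/bin/bash
# Start envoy in the background (placeholder invocation).
envoy -c /etc/envoy/envoy.yaml &
ENVOY_PID=$!

drain_and_stop() {
  # Ask Envoy to stop accepting new inbound connections, give in-flight requests time to finish, then stop it.
  curl -s -X POST "localhost:15000/drain_listeners?inbound_only" || true
  sleep 15
  kill "$ENVOY_PID" 2>/dev/null
}
trap drain_and_stop TERM

wait "$ENVOY_PID"   # returns early when SIGTERM arrives; the trap then runs
wait "$ENVOY_PID"   # reap envoy after the trap has stopped it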

If preStop doesn’t send any signal to the app, then the preStop hook would do that

We could also just ship a default preStop hook that does that and then not do anything within pilot-agent?
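
i.e. something roughly like this (the admin port and the sleep duration are placeholders):

lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "curl -s -X POST localhost:15000/drain_listeners?inbound_only; sleep 15"]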

Would have to test all of this out though, this is just based on my understanding which may not be how it actually behaves

@andraxylia why was this moved to 1.4? As far as I know there are no short term fixes for this

This is nice, although we would need some extra work to support this with Istio. This would have to drive pilot-agent instead of Envoy in Istio’s case, and we don’t yet have an endpoint to enable this.