kubernetes: Pod loses network connection (connection refused errors) during graceful shutdown period
### What happened?
Hi,
Our backend infrastructure uses Kubernetes pods to perform analysis of data. As the flow of data increases and decreases through the day, the Kubernetes autoscaler scales the pods up and down.
The analysis operation on a single workload can take up to six minutes. To enable successful completion of in-flight scan operations, the pods
- have a terminationGracePeriodSeconds set to 360 seconds
- capture the SIGTERM event and prevent the pod from accepting new requests
```java
SpringApplication springApplication = new SpringApplication(ProductApplication.class);
springApplication.addListeners(new GracefulShutdownListener());

// GracefulShutdownListener -> onApplicationEvent(ContextClosedEvent event)
// 1. Prevent new requests from being sent
// 2. Sleep for 6 minutes
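```

For reference, a minimal sketch of what such a listener might look like (the `acceptingRequests` flag and the listener internals below are assumptions for illustration, not our exact production code):

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicBoolean;
import org.springframework.context.ApplicationListener;
import org.springframework.context.event.ContextClosedEvent;

public class GracefulShutdownListener implements ApplicationListener<ContextClosedEvent> {

    // Hypothetical flag consulted by the request-handling path before accepting new work
    private final AtomicBoolean acceptingRequests = new AtomicBoolean(true);

    @Override
    public void onApplicationEvent(ContextClosedEvent event) {
        // 1. Stop accepting new requests once SIGTERM triggers the context close
        acceptingRequests.set(false);
        try {
            // 2. Hold the shutdown for 6 minutes so in-flight analysis can complete
            //    (matching terminationGracePeriodSeconds = 360)
            Thread.sleep(Duration.ofMinutes(6).toMillis());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```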
The issue is that though the pods wait for 6 minutes to complete in-flight operations, these operations fail in outbound network communication. The pod gets a Connection Refused error when communicating with SNS during the graceful shutdown phase.
Investigation reveals that pods are failing in external network communication with multiple applications during the ‘graceful shutdown’ phase.
I have gone through multiple tickets/documents in this area:
- https://github.com/kubernetes/kubernetes/issues/44956
- https://github.com/ardanlabs/service/issues/189
- https://freecontent.manning.com/handling-client-requests-properly-with-kubernetes/
One comment in the same area that seems relevant: https://github.com/kubernetes/kubernetes/issues/86280#issuecomment-583173036
I am raising this ticket as we require the ability to create new external connections (SNS) during graceful shutdown.
Primarily, I see in the document https://freecontent.manning.com/handling-client-requests-properly-with-kubernetes/ that a "B flow" is started on pod deletion, which removes the pod from iptables. It seems to me that this may result in the pod not being able to create new network connections. If this is the case, we would need a mechanism to delay the B flow until the grace period is over.
I am not sure if sleeping in the preStop hook is the recommended mechanism to prevent pods from losing network connectivity during the graceful shutdown phase.
### What did you expect to happen?
Any Kubernetes pod, during its graceful shutdown period, should have unrestricted access to required resources (network) and the ability to create new connections. This ability need not be the default; it could be enabled via configuration.
### How can we reproduce it (as minimally and precisely as possible)?
- Start a pod that perpetually creates new connections to an external entity and updates it (a minimal client loop is sketched below)
- Terminate the pod manually
Connection refused errors should be seen in attempts to create new connections. A connection refused error may also be seen for the readiness probe.
We have not found this to always be reproducible in our test clusters; however, it can be seen very frequently in our clusters serving continuous data.
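A minimal sketch of the kind of client loop we use to reproduce this (the endpoint URL is a placeholder; a new `HttpClient` is created per iteration to force fresh connections rather than reusing pooled ones):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConnectionLoop {
    public static void main(String[] args) throws InterruptedException {
        // Placeholder endpoint standing in for the external entity (e.g. SNS)
        URI target = URI.create("https://example.com/update");
        while (true) {
            try {
                // New client each time so every iteration opens a new TCP connection
                HttpClient client = HttpClient.newHttpClient();
                HttpRequest request = HttpRequest.newBuilder(target).GET().build();
                HttpResponse<Void> response =
                        client.send(request, HttpResponse.BodyHandlers.discarding());
                System.out.println("status=" + response.statusCode());
            } catch (Exception e) {
                // During graceful shutdown this is where "connection refused" shows up
                System.out.println("request failed: " + e);
            }
            Thread.sleep(5_000);
        }
    }
}
```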
### Anything else we need to know?
The evidence we can see indicating that the pod is losing the ability to communicate with external components:
- Connection refused errors to multiple components during the graceful shutdown phase.
- Pod events display a connection refused event for the readiness probe a minute after shutdown:
```
38m Normal Killing pod/scan-434f242-fr23e Stopping container xyz
38m Normal Killing pod/scan-434f242-fr23e Stopping container pqr
38m Normal Killing pod/scan-434f242-fr23e Stopping container abc
38m Warning FailedPreStopHook pod/s434f242-fr23e Exec lifecycle hook ([]) for Container "abc" in Pod "scan-434f242-fr23e_dss(19451a4f-2w32-4221-we32-4a3b0b169a7c)" failed - error: command '' exited with 126: , message: "OCI runtime exec failed: exec failed: container_linux.go:370: starting container process caused: exec: \"\": executable file not found in $PATH: unknown\r\n"
37m Warning Unhealthy pod/scan-434f242-fr23e Readiness probe failed: Get http://10.111.2.71:8080/actuator/health/readiness: dial tcp 10.203.9.71:8080: connect: connection refused
33m Warning Unhealthy pod/scan-434f242-fr23e Readiness probe failed: Get http://10.111.2.71:4191/ready: dial tcp 10.203.9.71:4191: connect: connection refused
38m Warning Unhealthy pod/scan-434f242-fr23e Liveness probe failed: Get http://10.112.2.71:4191/live: dial tcp 10.111.2.71:4191: connect: connection refused
38m Warning Unhealthy pod/scan-434f242-fr23e Liveness probe failed: Get http://10.111.2.71:8080/actuator/health/liveness: dial tcp 10.203.9.71:8080: connect: connection refused
```
This does not always reproduce.
### Kubernetes version
<details>
```console
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.7", GitCommit:"1dd5338295409edcfff11505e7bb246f0d325d15", GitTreeState:"clean", BuildDate:"2021-01-13T13:23:52Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.17-eks-087e67", GitCommit:"087e67e479962798594218dc6d99923f410c145e", GitTreeState:"clean", BuildDate:"2021-07-31T01:39:55Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
```

</details>
### Cloud provider
### OS version
```console
# cat /etc/os-release
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
```
### Install tools
### Container runtime (CRI) and version (if applicable)
### Related plugins (CNI, CSI, …) and versions (if applicable)
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 3
- Comments: 22 (10 by maintainers)
Hi @nilesh-telang, we’ve resolved the issue on our cluster. We’re using Calico CNI, which had a bug in it. After upgrading our Calico version, our terminating pods retained their network connectivity, though I see you’ve already ruled this out earlier.
Hi @matthewbyrne,
Thanks for sharing these details.
Hi @thockin, at our end I think we may have found the root cause of our issue. Our production systems use linkerd, and we found that the graceful shutdown of linkerd may be causing the pod to lose connectivity to external components.
The linkerd shutdown documentation mentions the following - https://linkerd.io/2.10/tasks/graceful-shutdown/
We have made changes in production to delay the shutdown of the linkerd container so that it stays alive for the duration of the main container’s graceful shutdown period. Based on the documentation, I believe this should address the issue.
I will update and mark this ticket resolved once the verification of the fix is done.
Thank you all for your help and inputs on this issue.
@nilesh-telang what CNI/network plugin are you using in the cluster?
Also, kubelet and docker logs showing the time sequence of when the container sandbox stop request begins and when it finally ends might help. There are also CRI operation timeouts that could be in play here, but kubelet and docker logs will help figure that out.
Thank you @aojea again for the clarification and the prompt response.
I think your last response clarifies the root cause of what we are observing. You mentioned that a pod's endpoints are removed as soon as the pod disappears from the Endpoints object, and that the behavior is that “existing” TCP connections to the Service are not cleared.
I believe this explains what we are observing. During the pod shutdown, the pod is removed from the Endpoints object. At that point the pod does not have an active TCP connection with SNS, so the connection with SNS is not retained. The fact that the pod is removed from the endpoints may be the reason the pod is unable to make further connections to SNS.
Retaining the connection with SNS as a keep-alive TCP connection may be the required route for enabling SNS communication during the shutdown phase. I have also added code to create a new SNS client when the failure is encountered, to verify whether creating a new connection via a new client addresses the issue (sketched below). I will be running the tests in the coming few days and will update the ticket with the results. This will verify the ability to create a new SNS connection during the grace period.
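For illustration, a minimal sketch of the retry-with-a-fresh-client approach, assuming the AWS SDK for Java v2 (the region, topic ARN, and keep-alive settings below are placeholders/assumptions, not our exact configuration):

```java
import java.time.Duration;
import software.amazon.awssdk.core.exception.SdkException;
import software.amazon.awssdk.http.apache.ApacheHttpClient;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.sns.SnsClient;
import software.amazon.awssdk.services.sns.model.PublishRequest;

public class SnsPublisher {

    // Client configured with TCP keep-alive so an idle connection is more likely to survive
    private SnsClient snsClient = newClient();

    private static SnsClient newClient() {
        return SnsClient.builder()
                .region(Region.US_EAST_1) // placeholder region
                .httpClientBuilder(ApacheHttpClient.builder()
                        .tcpKeepAlive(true)                           // keep idle connections alive
                        .connectionTimeToLive(Duration.ofMinutes(10)))
                .build();
    }

    public void publish(String topicArn, String message) {
        PublishRequest request = PublishRequest.builder()
                .topicArn(topicArn)   // placeholder: real topic ARN supplied by the caller
                .message(message)
                .build();
        try {
            snsClient.publish(request);
        } catch (SdkException e) {
            // On failure during the grace period, rebuild the client and retry once,
            // to verify whether a brand-new connection can still be established
            snsClient = newClient();
            snsClient.publish(request);
        }
    }
}
```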