kubernetes: GCE Cloud: Service (LoadBalancer) networking breaks when pod is restarted
Kubernetes version (use kubectl version):
Server Version: version.Info{
Major:"1", Minor:"4",
GitVersion:"v1.4.7",
GitCommit:"92b4f971662de9d8770f8dcd2ee01ec226a6f6c0",
GitTreeState:"clean",
BuildDate:"2016-12-10T04:43:42Z",
GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"
}
Environment:
- Cloud provider or hardware configuration: Google Cloud
- OS (e.g. from /etc/os-release):
BUILD_ID=8820.0.0
NAME="Container-VM Image"
GOOGLE_CRASH_ID=Lakitu
VERSION_ID=55
BUG_REPORT_URL=https://crbug.com/new
PRETTY_NAME="Google Container-VM Image"
VERSION=55
GOOGLE_METRICS_PRODUCT_ID=26
HOME_URL="https://cloud.google.com/compute/docs/containers/vm-image/"
ID=gci
- Kernel (e.g. uname -a):
Linux gke-bernie-cluster-default-pool-b7500cdf-3kr1 4.4.14+ #1 SMP Tue Sep 20 10:32:07 PDT 2016 x86_64 Intel(R) Xeon(R) CPU @ 2.50GHz GenuineIntel GNU/Linux
- Install tools: none
- Others: none
What happened: We have a replication controller that is responsible for our external REST API, and its pods are exposed via a Service. When Kubernetes restarts the container, the Service fails to route traffic to the restarted container. If all pods for an RC fail, the API is no longer externally accessible.
What you expected to happen: The Service should automatically resume routing traffic to the restarted pod, and the restart should be seamless for external clients.
How to reproduce it (as minimally and precisely as possible): Create a replication controller with one pod that serves an external REST API. Expose its pods via a Service. Force Kubernetes to restart the pod. The API is no longer reachable through the Service, even though the pod's logs show it functioning normally; networking to the pod is broken.
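For illustration only, a minimal reproduction along these lines could look like the following; the api name, the nginx stand-in image, the ports, and the labels are assumptions, not the actual workload from this report:

# Hypothetical reproduction sketch; "api" and the nginx image stand in for the real REST API.
# The run/v1 generator creates a replication controller rather than a Deployment.
kubectl run api --image=nginx --replicas=1 --generator=run/v1

# Expose the RC's pods through an external load balancer.
kubectl expose rc api --port=80 --type=LoadBalancer

# Wait for an external IP to be provisioned, then confirm it answers.
kubectl get svc api
curl http://<EXTERNAL-IP>/

# Force a restart: delete the pod so the RC recreates it.
kubectl delete pod -l run=api

# Once the replacement pod is Running, the same request should still succeed;
# in the failure described here it does not, even though the pod's logs look healthy.
curl http://<EXTERNAL-IP>/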
Anything else do we need to know: Here’s the configuration for the service that links to the replication controller:
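(The manifest itself is not captured in this extract. As a rough sketch only, a LoadBalancer Service of that general shape can be created as shown below; the name, selector, and ports are hypothetical, not the reporter's values.)

# Not the actual manifest from this issue; a minimal Service of the same general shape.
cat <<'EOF' | kubectl create -f -
apiVersion: v1
kind: Service
metadata:
  name: rest-api
spec:
  type: LoadBalancer
  selector:
    app: rest-api        # must match the labels on the RC's pod template
  ports:
    - port: 80           # port exposed on the external load balancer
      targetPort: 8080   # containerPort serving the REST API
EOF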

About this issue
- State: closed
- Created 7 years ago
- Reactions: 1
- Comments: 51 (22 by maintainers)
Okay, we just experienced a deployment-wide pod failure and I can demonstrate the “no logging” scenario. Below, you can see that all pods across a deployment were suddenly restarted:
Every single one of these containers displays the same message:
Once I delete and redeploy all of the pods, the service functions normally again.
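For context, "delete and redeploy all pods" amounts to something like the following; the label selector is hypothetical:

# Hypothetical workaround: delete every pod behind the service so its controller
# recreates them, then watch the replacements come up.
kubectl delete pods -l app=rest-api
kubectl get pods -l app=rest-api -w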