kubernetes: kube-proxy randomly returns 504 gateway timeouts (without actually waiting)

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/.): Yes, and I have.

What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.): kube-proxy, gateway timeout, gateway, 504


Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.3", GitCommit:"029c3a408176b55c30846f0faedf56aae5992e9b", GitTreeState:"clean", BuildDate:"2017-02-22T10:12:27Z", GoVersion:"go1.8", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"08e099554f3c31f6e6f07b448ab3ed78d0520507", GitTreeState:"clean", BuildDate:"2017-01-12T04:52:34Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): kope.io/k8s-1.4-debian-jessie-amd64-hvm-ebs-2016-10-21
  • Kernel (e.g. uname -a):
  • Install tools: kops

What happened: I have set up a service with LoadBalancer (AWS ELB). The ELB sees all nodes as healthy. When trying to access that service, I randomly get 504 gateway timeouts that appear instantly, without actually waiting for a timeout. Restarting kube-proxy does not seem to help at all. Refreshing or re-sending the request resolves it. It seems to happen every few requests, as if on a round-robin rotation.

What you expected to happen: Requests to the service should succeed consistently; kube-proxy should not intermittently return instant 504s.

How to reproduce it (as minimally and precisely as possible): Not sure here. Try running a kops cluster with an ELB-backed service and a uWSGI web server as the backend.

Anything else we need to know: We are using uWSGI as a server inside the containers.

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 26 (13 by maintainers)

Most upvoted comments

@ikornaselur I was accidentally using http-socket instead of http (for 3 years now). I also added the harakiri setting and adjusted threads/processes a bit.
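
For context, a minimal sketch of what that change can look like in a uWSGI config (illustrative values only, not the poster's actual settings; uWSGI also accepts YAML config files via --yaml):

uwsgi:
  # "http" starts uWSGI's own HTTP router in front of the workers, so it
  # handles keep-alive and slow clients itself; "http-socket" makes the
  # workers speak HTTP directly and is meant for use behind a full proxy.
  http: "0.0.0.0:8080"
  # Kill any worker stuck on a single request for more than 30 seconds.
  harakiri: 30
  processes: 4
  threads: 2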

One thing to keep in mind with ELBs is that they have HTTP keep-alive (called the idle timeout) enabled by default, and it's set to 60 seconds. That needs to be at least 1 second less than your app's keep-alive timeout. This was the cause of 504s for several of our apps.

http://docs.aws.amazon.com/elasticloadbalancing/latest/classic/config-idle-timeout.html

To configure the ELB idle timeout, you can set this annotation on your service:

// ServiceAnnotationLoadBalancerConnectionIdleTimeout is the annotation used
// on the service to specify the idle connection timeout.
const ServiceAnnotationLoadBalancerConnectionIdleTimeout = "service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout"
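
For example, on the Service itself (a minimal sketch; the name, selector, ports, and the 75-second value are placeholders — pick a timeout that stays below your backend's keep-alive):

apiVersion: v1
kind: Service
metadata:
  name: my-app                       # placeholder
  annotations:
    # Raise the ELB idle timeout from its 60s default; per the note above,
    # keep it below the backend's keep-alive timeout.
    service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "75"
spec:
  type: LoadBalancer
  selector:
    app: my-app                      # placeholder
  ports:
    - port: 80
      targetPort: 8080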