kubernetes: kube-proxy randomly returns 504 gateway timeouts (without actually waiting)

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/.): Yes, and I have.

What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.): kube-proxy, gateway timeout, gateway, 504


Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.3", GitCommit:"029c3a408176b55c30846f0faedf56aae5992e9b", GitTreeState:"clean", BuildDate:"2017-02-22T10:12:27Z", GoVersion:"go1.8", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"08e099554f3c31f6e6f07b448ab3ed78d0520507", GitTreeState:"clean", BuildDate:"2017-01-12T04:52:34Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): kope.io/k8s-1.4-debian-jessie-amd64-hvm-ebs-2016-10-21
  • Kernel (e.g. uname -a):
  • Install tools: kops

What happened: I have set up a service with LoadBalancer (AWS ELB). The ELB sees all nodes as healthy. When trying to access that service, I randomly get 504 gateway timeouts that appear instantly, without actually waiting for a timeout. Restarting kube-proxy does not seem to help at all. Refreshing or re-sending the request resolves it. It seems to happen every few requests, as if on a round-robin rotation.

What you expected to happen: Requests to the service should succeed consistently; kube-proxy should not intermittently return instant 504s.

How to reproduce it (as minimally and precisely as possible): Not sure here. Try running a kops cluster with an ELB-backed service and a uWSGI web server as the backend.

Anything else we need to know: We are using uWSGI as a server inside the containers.

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 26 (13 by maintainers)

Most upvoted comments

@ikornaselur I was accidentally using http-socket instead of http (for 3 years now). I also added the harakiri setting and adjusted threads/processes a bit.
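
For context, a minimal sketch of what that change can look like in a uWSGI config (illustrative values only, not the poster's actual settings; uWSGI also accepts YAML config files via --yaml):

uwsgi:
  # "http" starts uWSGI's own HTTP router in front of the workers, so it
  # handles keep-alive and slow clients itself; "http-socket" makes the
  # workers speak HTTP directly and is meant for use behind a full proxy.
  http: "0.0.0.0:8080"
  # Kill any worker stuck on a single request for more than 30 seconds.
  harakiri: 30
  processes: 4
  threads: 2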

One thing to keep in mind with ELBs is that they have HTTP keep-alive (called the idle timeout) enabled by default, and it's set to 60 seconds. That needs to be at least 1 second less than your app's keep-alive timeout. This was the cause of 504s for several of our apps.

http://docs.aws.amazon.com/elasticloadbalancing/latest/classic/config-idle-timeout.html

To configure the ELB idle timeout, you can set this annotation on your service:

// ServiceAnnotationLoadBalancerConnectionIdleTimeout is the annotation used
// on the service to specify the idle connection timeout.
const ServiceAnnotationLoadBalancerConnectionIdleTimeout = "service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout"
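
For example, on the Service itself (a minimal sketch; the name, selector, ports, and the 75-second value are placeholders — pick a timeout that stays below your backend's keep-alive):

apiVersion: v1
kind: Service
metadata:
  name: my-app                       # placeholder
  annotations:
    # Raise the ELB idle timeout from its 60s default; per the note above,
    # keep it below the backend's keep-alive timeout.
    service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "75"
spec:
  type: LoadBalancer
  selector:
    app: my-app                      # placeholder
  ports:
    - port: 80
      targetPort: 8080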