kubernetes: Specifying idle timeout on ELB-backed service publicly exposes an internal ELB

/kind bug

What happened:

When using AWS and a Service of type LoadBalancer:

If the annotation service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout is added to a Service that has been designated to use an internal-only ELB (via the service.beta.kubernetes.io/aws-load-balancer-internal: "0.0.0.0/0" annotation), the ELB is created as a public-facing ELB instead.

This is a security risk: it can expose an internal-only service to the public internet, potentially without any network ACLs in place.

Furthermore, the external Route53 record specified by the dns.alpha.kubernetes.io/external annotation is not created when the idle timeout annotation is added.

What you expected to happen:

I expected the idle timeout to be applied to an internal-facing ELB.

How to reproduce it (as minimally and precisely as possible):

Create a service definition like this:

apiVersion: v1
kind: Service
metadata:
  labels:
    application: proxy
  name: proxy
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: "0.0.0.0/0"
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "http"
    dns.alpha.kubernetes.io/external: proxy.mycompany.com
spec:
  type: LoadBalancer
  ports:
  - name: http
    port: 80
    targetPort: 80
    protocol: TCP
  selector:
    application: proxy

Apply it:

% kubectl apply -f service.yaml

Wait for the service controller to finish provisioning the ELB, then observe that the internal-facing ELB is configured properly:

% kubectl describe svc proxy

Name:                     proxy
Namespace:                default
Labels:                   application=proxy
Annotations:              dns.alpha.kubernetes.io/external=proxy.mycompany.com
                          kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"dns.alpha.kubernetes.io/external":"proxy.mycompany.com","service.beta.kuber...
                          service.beta.kubernetes.io/aws-load-balancer-backend-protocol=http
                          service.beta.kubernetes.io/aws-load-balancer-internal=0.0.0.0/0
Selector:                 application=proxy
Type:                     LoadBalancer
IP:                       100.68.210.2
LoadBalancer Ingress:     internal-aa1111111111111111111111111-222222222.us-east-1.elb.amazonaws.com
Port:                     http  80/TCP
TargetPort:               80/TCP
NodePort:                 http  32206/TCP
Endpoints:                100.96.249.27:80,100.97.79.3:80
Session Affinity:         None
External Traffic Policy:  Cluster

Now, add the service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout annotation to your service so that your definition looks like this:

apiVersion: v1
kind: Service
metadata:
  labels:
    application: proxy
  name: proxy
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: "0.0.0.0/0"
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "http"
    dns.alpha.kubernetes.io/external: proxy.mycompany.com
    service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: 10
spec:
  type: LoadBalancer
  ports:
  - name: http
    port: 80
    targetPort: 80
    protocol: TCP
  selector:
    application: proxy

Delete your old service resource:

% kubectl delete svc proxy

…and give Kube/AWS a few moments to remove the ELB…
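
If you want to confirm the deletion from the AWS side (not part of the original steps; <elb-name> is a placeholder for the ELB's name), the classic ELB API starts returning a LoadBalancerNotFound error once the ELB is gone:

% aws elb describe-load-balancers --load-balancer-names <elb-name>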

Then recreate the new service:

% kubectl apply -f service.yaml

…and wait a couple of minutes for the service and ELB to be provisioned…

Observe that the service is no longer using an internal-facing ELB:

% kubectl describe svc proxy

Name:                     proxy
Namespace:                default
Labels:                   application=proxy
Annotations:              kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"application":"proxy"},"name":"proxy","namespace":"default"},"spec":...
Selector:                 application=proxy
Type:                     LoadBalancer
IP:                       100.67.65.8
LoadBalancer Ingress:     a11111111111111111111111111-22222222.us-east-1.elb.amazonaws.com
Port:                     http  80/TCP
TargetPort:               80/TCP
NodePort:                 http  30087/TCP
Endpoints:                100.96.249.27:80,100.97.79.3:80
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason                Age   From                Message
  ----    ------                ----  ----                -------
  Normal  EnsuringLoadBalancer  15s   service-controller  Ensuring load balancer
  Normal  EnsuredLoadBalancer   13s   service-controller  Ensured load balancer
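
As an extra check beyond the original report, the scheme of the provisioned load balancer can be read back with the AWS CLI; <elb-name> is a placeholder for the name embedded in the LoadBalancer Ingress hostname above:

% aws elb describe-load-balancers \
    --load-balancer-names <elb-name> \
    --query 'LoadBalancerDescriptions[0].Scheme' \
    --output text

For the buggy Service this prints internet-facing; a correctly provisioned internal ELB prints internal.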

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.4", GitCommit:"5ca598b4ba5abb89bb773071ce452e33fb66339d", GitTreeState:"clean", BuildDate:"2018-06-06T08:13:03Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.4", GitCommit:"5ca598b4ba5abb89bb773071ce452e33fb66339d", GitTreeState:"clean", BuildDate:"2018-06-06T08:00:59Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: AWS

  • OS (e.g. from /etc/os-release): Container Linux by CoreOS stable (1800.7.0)

  • Kernel (e.g. uname -a): Linux ip-10-27-61-32.ec2.internal 4.14.63-coreos #1 SMP Wed Aug 15 22:26:16 UTC 2018 x86_64 Intel® Xeon® Platinum 8175M CPU @ 2.50GHz GenuineIntel GNU/Linux

  • Install tools: kops 1.10.0-alpha.1

  • Others:

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Reactions: 6
  • Comments: 34 (3 by maintainers)

Most upvoted comments

@chrissnell This happened to me too. It is because the idle timeout value has the wrong type: you need to quote it, and then it works as expected. So instead of:

service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: 10

use:

service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "10"
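
For reference, a minimal corrected annotations block, identical to the manifest in the report except for the quoted timeout value, looks like this:

  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: "0.0.0.0/0"
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "http"
    dns.alpha.kubernetes.io/external: proxy.mycompany.com
    # quoted, so it is a YAML string rather than an integer
    service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "10"

Quoting matters because annotation values must be strings in the Kubernetes API; an unquoted 10 is parsed as a YAML integer.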

/remove-lifecycle stale

I spent more than a year dealing with this bug. 😫 The idle timeout was a crucial option for us: we use the WebSocket protocol, the average lifetime of our connections is more than 10 minutes, and we were forced to implement additional failure-handling logic on top of it.

I can confirm that it behaves dangerously and unpredictably 😞

This issue should have high priority.

I think the problem is that the annotations were dropped. From your second kubectl describe svc output:

Annotations:              kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"application":"proxy"},"name":"proxy","namespace":"default"},"spec":...

I agree this is a problem, but I think kubectl is the culprit here.
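
One quick way to verify what was actually stored (a suggestion beyond the original thread) is to dump the live annotations right after applying, before the ELB is provisioned:

% kubectl get svc proxy -o jsonpath='{.metadata.annotations}'

If nothing but kubectl.kubernetes.io/last-applied-configuration comes back, the annotations never reached the API server, and the AWS provider falls back to its default internet-facing scheme.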

/sig cli