antrea: ServiceLoadBalancer + externalTrafficPolicy: Local = Connection Refused most of the time

Describe the bug

I have a 5-node cluster running a Deployment with 2 replicas. The Service in front of it uses Antrea's ServiceExternalIP feature. I can see that the Service got an external IP from the same network as the Nodes:

whoami-headless   LoadBalancer   10.239.37.140   10.1.2.220   80:31591/TCP   5s

But if I curl 10.1.2.220, it works for some remote endpoints and not for others. It works fine if I set externalTrafficPolicy to Cluster, but that way I lose the client IP.

To Reproduce

Enable ServiceExternalIP for any Service with a single replica deployed, using the YAML manifests at the end of this bug report.

Expected

I would expect the LoadBalancer IP to be “acquired” by one of the Nodes running the Pod, so that connections to it succeed.

Actual behavior

Most of the time I get connection refused; at other times it just works. From the masters, the clusterIP works but the LoadBalancer IP fails.

# curl 10.239.37.140 -I
HTTP/1.1 200 OK
Date: Fri, 13 May 2022 17:35:05 GMT
Content-Length: 203
Content-Type: text/plain; charset=utf-8

# curl  10.1.2.220 -I
curl: (7) Failed to connect to 10.199.0.220 port 80: Connection refused
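
To put a number on "most of the time", the curl check above can be repeated and the outcomes tallied. The following is a minimal sketch using only the Python standard library. The helper name `probe` and its parameters are hypothetical and not part of Antrea or the original report. It only tests whether a TCP connection can be opened, not whether the HTTP response is correct.

```python
import socket
from collections import Counter


def probe(host, port, attempts=20, timeout=2.0):
    """Open `attempts` TCP connections to host:port and tally the outcomes.

    Hypothetical diagnostic helper: it distinguishes a successful connect,
    an explicit "connection refused", and any other socket error (for
    example a timeout or an unreachable network).
    """
    results = Counter()
    for _ in range(attempts):
        try:
            # create_connection performs the TCP handshake; the socket is
            # closed as soon as the with-block exits.
            with socket.create_connection((host, port), timeout=timeout):
                results["connected"] += 1
        except ConnectionRefusedError:
            results["refused"] += 1
        except OSError:
            # Timeouts and routing errors are subclasses of OSError.
            results["other_error"] += 1
    return results
```

Running `probe("10.1.2.220", 80)` from several client machines and comparing the counts would show whether the refusals depend on where the request comes from.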

Versions:

  • Antrea version (Docker image tag): 1.6.1
  • Kubernetes version (use kubectl version): 1.22.8
  • Container runtime: cri-o
  • Linux kernel version on the Kubernetes Nodes (uname -r): 4.18.0-348.23.1.el8_5.x86_64

apiVersion: crd.antrea.io/v1alpha2
kind: ExternalIPPool
metadata:
  name: service-external-ip-pool
spec:
  ipRanges:
  - start: 10.1.1.220 
    end: 10.1.2.250 
  nodeSelector: {}
---
apiVersion: v1
kind: Service
metadata:
  name: whoami-headless
  annotations:
    service.antrea.io/external-ip-pool: "service-external-ip-pool"
spec:
  type: LoadBalancer 
  externalTrafficPolicy: Local
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  selector:
    app: whoami

Which Node actually gets the LoadBalancer IP assigned? I can see the LoadBalancer IP on the kube-ipvs0 interface of all servers, but I suppose only one of them is really using it; otherwise it would be an IP conflict, wouldn’t it? I will try MetalLB to see whether I get the same behaviour.

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 30 (16 by maintainers)

Most upvoted comments

Good to know you plan to use the feature in production! We can definitely prioritize the Egress fix. I created an issue to track that: #3804

I’ll reopen this issue since there is a documentation change required. And I will assign @jianjuns since he volunteered 😃

@jianjuns Antrea’s implementation does respect externalTrafficPolicy: https://github.com/antrea-io/antrea/blob/2526b1f8c9f3b70933ef96dab89e034db660574d/pkg/agent/controller/serviceexternalip/controller.go#L363-L371 I think @antoninbas is correct. Access is supposed to fail if traffic towards a Service with externalTrafficPolicy: Local reaches a Node that doesn’t have any backends of the Service. I haven’t tried it, but I suppose MetalLB behaves the same.
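
As a rough illustration of the behaviour described in this comment, here is a small Python model of which Nodes can serve a connection to the external IP under each policy. It is a sketch only, not Antrea or kube-proxy code. The function name `nodes_accepting_traffic`, the Node names, and the Pod names are all made up for the example; the 5 Nodes and 2 replicas mirror the cluster in the report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    pod: str   # hypothetical Pod name
    node: str  # Node the Pod is scheduled on


def nodes_accepting_traffic(nodes, endpoints, policy):
    """Return the set of Nodes on which external traffic can reach a backend.

    Simplified model: with "Cluster", any Node may forward to any backend;
    with "Local", only Nodes hosting at least one backend can serve traffic.
    """
    if policy == "Cluster":
        return set(nodes) if endpoints else set()
    if policy == "Local":
        return {ep.node for ep in endpoints if ep.node in nodes}
    raise ValueError(f"unknown externalTrafficPolicy: {policy}")


# Illustrative cluster: 5 Nodes, 2 replicas (names are assumptions).
nodes = ["node-1", "node-2", "node-3", "node-4", "node-5"]
endpoints = [Endpoint("whoami-a", "node-2"), Endpoint("whoami-b", "node-4")]

print(sorted(nodes_accepting_traffic(nodes, endpoints, "Local")))
# prints ['node-2', 'node-4']
```

In this model, a packet for the external IP that arrives at node-1, node-3, or node-5 has no local backend to go to, which matches the intermittent failures if more than one Node ends up receiving traffic for that IP.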