metallb: MetalLB cannot peer with BGP routers that Calico is already peering with

Is this a bug report or a feature request?:

Question actually.

What happened: Can’t get MetalLB to peer with my core router.

What you expected to happen: I expected peering to succeed and to see the routes for each node in the routing table.

How to reproduce it (as minimally and precisely as possible): Set up MetalLB and peer it with a Cisco L3 routing device.

Anything else we need to know?: I am not sure if this is something related to the Cisco side or the MetalLB side. I also have Calico peering with the same Cisco device using the same IP address, and that could be the problem, but I wanted to verify. I am not sure that it is a bug.

Getting this in the log: {"log":"E1213 22:09:35.960710 1 bgp.go:48] read OPEN from \"10.1.105.1:179\": message type is not OPEN, got 3, want 1\n","stream":"stderr","time":"2017-12-13T22:09:35.961076973Z"}

Makes me think the connection to Calico is hijacking the MetalLB connection. (BGP message type 3 is NOTIFICATION, so the peer is actively rejecting the session rather than opening one.)

Environment:

  • MetalLB version: Not sure how to find this. (One way to check is shown after this list.)
  • Kubernetes version: Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.5"
  • BGP router type/version: Cisco 4500 Version 03.09.00.E
  • OS (e.g. from /etc/os-release): "16.04.3 LTS (Xenial Xerus)"
  • Kernel (e.g. uname -a): Linux 4.4.0-103-generic
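
One way to find the MetalLB version, assuming the stock MetalLB manifests (a speaker DaemonSet in the metallb-system namespace), is to read the image tag:

# Prints the speaker image, e.g. metallb/speaker:<version>.
# Namespace and DaemonSet name assume the standard MetalLB manifests.
kubectl get daemonset speaker -n metallb-system \
  -o jsonpath='{.spec.template.spec.containers[0].image}'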

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 80 (24 by maintainers)

Most upvoted comments

Please provide a setup guide for this!

@logan2211 can you please provide an example of a MetalLB BGP config that works with your Calico version (BGP_COMMUNITY=100)?

Sure! Check out https://gist.github.com/logan2211/bd0d8c3fd9091ddb207cde57e16d73a4

Another update here: in Calico v3.18, Calico will be capable of advertising LoadBalancer IPs allocated by the MetalLB controller without installing the MetalLB speaker at all. https://github.com/projectcalico/confd/pull/422
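
A sketch of what enabling that could look like, based on the Calico BGPConfiguration API (the CIDR below is a placeholder and should match your MetalLB address pool):

cat << EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  serviceLoadBalancerIPs:
  - cidr: 10.110.0.0/16   # placeholder; use your MetalLB pool range
EOF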

Ok, I can confirm that the override above works without any issues.

All you need to do is mount in that modified config, and then create a disabled IPPool in Calico to get it to re-advertise the prefixes that MetalLB is pumping in.

cat << EOF | calicoctl create -f -
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: metallb-ip-pool
spec:
  cidr: 10.110.0.0/16
  disabled: true
EOF

Then configure your Calico Peering:

cat << EOF | calicoctl create -f -
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: metallb
spec:
  peerIP: 127.0.0.1
  asNumber: 65480
EOF

and MetalLB (in this case, Calico has an ASN of 65479):

    peers:
    - peer-address: 127.0.0.1
      peer-asn: 65479
      my-asn: 65480

Note: this needs to be an eBGP session; I got a load of errors when the ASNs were the same.

@danderson if you like, I can write this up as a docs PR for you?

For people wanting the exact code to mount the modified bird.cfg.template into Calico (I haven't tested this, but it should work):

Use this in your Calico manifest to inject the new bird template; the containers fragment merges into the calico-node container, and the volumes fragment into the Pod spec of the DaemonSet:

containers:
- volumeMounts:
  - name: bird-template
    mountPath: /etc/calico/confd/templates/bird.cfg.template
    subPath: bird.cfg.template
volumes:
- name: bird-template
  configMap:
    name: calico-metallb-config

And here’s an example of the v3.0.2 calico template with the patch

# This ConfigMap is used to configure a self-hosted Calico installation.
kind: ConfigMap
apiVersion: v1
metadata:
  name: calico-metallb-config
data:
  bird.cfg.template: |-
    # Generated by confd
    include "bird_aggr.cfg";
    include "bird_ipam.cfg";
    {{$node_ip_key := printf "/host/%s/ip_addr_v4" (getenv "NODENAME")}}{{$node_ip := getv $node_ip_key}}

    router id {{$node_ip}};

    {{define "LOGGING"}}
    {{$node_logging_key := printf "/host/%s/loglevel" (getenv "NODENAME")}}{{if exists $node_logging_key}}{{$logging := getv $node_logging_key}}
    {{if eq $logging "debug"}}  debug all;{{else if ne $logging "none"}}  debug { states };{{end}}
    {{else if exists "/global/loglevel"}}{{$logging := getv "/global/loglevel"}}
    {{if eq $logging "debug"}}  debug all;{{else if ne $logging "none"}}  debug { states };{{end}}
    {{else}}  debug { states };{{end}}
    {{end}}

    # Configure synchronization between routing tables and kernel.
    protocol kernel {
      learn;             # Learn all alien routes from the kernel
      persist;           # Don't remove routes on bird shutdown
      scan time 2;       # Scan kernel routing table every 2 seconds
      import all;
      export filter calico_ipip; # Default is export none
      graceful restart;  # Turn on graceful restart to reduce potential flaps in
                         # routes when reloading BIRD configuration.  With a full
                         # automatic mesh, there is no way to prevent BGP from
                         # flapping since multiple nodes update their BGP
                         # configuration at the same time, GR is not guaranteed to
                         # work correctly in this scenario.
    }

    # Watch interface up/down events.
    protocol device {
      {{template "LOGGING"}}
      scan time 2;    # Scan interfaces every 2 seconds
    }

    protocol direct {
      {{template "LOGGING"}}
      interface -"cali*", "*"; # Exclude cali* but include everything else.
    }

    {{$node_as_key := printf "/host/%s/as_num" (getenv "NODENAME")}}
    # Template for all BGP clients
    template bgp bgp_template {
      {{template "LOGGING"}}
      description "Connection to BGP peer";
      local as {{if exists $node_as_key}}{{getv $node_as_key}}{{else}}{{getv "/global/as_num"}}{{end}};
      multihop;
      gateway recursive; # This should be the default, but just in case.
      import all;        # Import all routes, since we don't know what the upstream
                         # topology is and therefore have to trust the ToR/RR.
      export filter calico_pools;  # Only want to export routes for workloads.
      next hop self;     # Disable next hop processing and always advertise our
                         # local address as nexthop
      source address {{$node_ip}};  # The local address we use for the TCP connection
      add paths on;
      graceful restart;  # See comment in kernel section about graceful restart.
    }

    # ------------- Node-to-node mesh -------------
    {{if (json (getv "/global/node_mesh")).enabled}}
    {{range $host := lsdir "/host"}}
    {{$onode_as_key := printf "/host/%s/as_num" .}}
    {{$onode_ip_key := printf "/host/%s/ip_addr_v4" .}}{{if exists $onode_ip_key}}{{$onode_ip := getv $onode_ip_key}}
    {{$nums := split $onode_ip "."}}{{$id := join $nums "_"}}
    # For peer {{$onode_ip_key}}
    {{if eq $onode_ip ($node_ip) }}# Skipping ourselves ({{$node_ip}})
    {{else if ne "" $onode_ip}}protocol bgp Mesh_{{$id}} from bgp_template {
      neighbor {{$onode_ip}} as {{if exists $onode_as_key}}{{getv $onode_as_key}}{{else}}{{getv "/global/as_num"}}{{end}};
    }{{end}}{{end}}{{end}}
    {{else}}
    # Node-to-node mesh disabled
    {{end}}


    # ------------- Global peers -------------
    {{if ls "/global/peer_v4"}}
    {{range gets "/global/peer_v4/*"}}{{$data := json .Value}}
    {{$nums := split $data.ip "."}}{{$id := join $nums "_"}}
    # For peer {{.Key}}
    protocol bgp Global_{{$id}} from bgp_template {
      {{if eq $data.ip ("127.0.0.1")}}passive on; # Don't talk to yourself{{end}}
      neighbor {{$data.ip}} as {{$data.as_num}};
    }
    {{end}}
    {{else}}# No global peers configured.{{end}}


    # ------------- Node-specific peers -------------
    {{$node_peers_key := printf "/host/%s/peer_v4" (getenv "NODENAME")}}
    {{if ls $node_peers_key}}
    {{range gets (printf "%s/*" $node_peers_key)}}{{$data := json .Value}}
    {{$nums := split $data.ip "."}}{{$id := join $nums "_"}}
    # For peer {{.Key}}
    protocol bgp Node_{{$id}} from bgp_template {
      neighbor {{$data.ip}} as {{$data.as_num}};
    }
    {{end}}
    {{else}}# No node-specific peers configured.{{end}}

It’s working for me now, with Calico version 3.0.1.

I’ve hacked the /etc/calico/confd/templates/bird.cfg.template and altered the “Global peers” section, so BIRD is passive when the peer is 127.0.0.1. Not ideal, but it’s working. I did this by just mounting my change into the Pod, since I don’t want to rebuild Calico right now.

    # ------------- Global peers -------------
    [..]
    protocol bgp Global_{{$id}} from bgp_template {
      {{if eq $data.ip ("127.0.0.1")}}passive on; # Don't talk to yourself{{end}}
      neighbor {{$data.ip}} as {{$data.as_num}};
    }
    [..]

I added the MetalLB IP pool to Calico (created with calicoctl), so these IPs won’t get filtered out. I don’t yet know what https://github.com/projectcalico/calico/issues/1604 would change here.

apiVersion: projectcalico.org/v3
kind: IPPoolList
items:
- apiVersion: projectcalico.org/v3
  kind: IPPool
  metadata:
    name: metallb-ip-pool
  spec:
    cidr: 192.168.10.0/24
    ipipMode: Always
    natOutgoing: false

When I now create a local peer for Calico on each Node, MetalLB is happily connecting to it. It seems to work without a unique AS per Node or MetalLB instance.

On the Calico side:

apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: bgppeer-metallb
spec:
  asNumber: 64631
  peerIP: 127.0.0.1

and on the MetalLB side:

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    peers:
    - peer-address: 127.0.0.1
      peer-asn: 64622
      my-asn: 64631
    address-pools:
    - name: default
      protocol: bgp
      cidr:
      - 192.168.10.0/24

The setup is very similar to the Romana setup then.
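
A quick way to verify the session from the Calico side (assuming calicoctl is available on the node) is calicoctl node status, which lists each BGP peer and its state; the 127.0.0.1 peer should show as Established:

# Run on the node itself; shows BIRD's view of each BGP peer.
sudo calicoctl node status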

@falfaro This isn’t something that I’ve been actively working on recently. I think Calico is missing the ability to set passive on in the BGP peer resource.

This is the issue tracking that: https://github.com/projectcalico/calico/issues/1603 (there’s a comment explaining the work that needs to be done, and even a half-finished PR that has since been closed). It should be fairly straightforward to do, if anyone has the appetite.

Very happy to hear this @gautvenk. Thank you @caseydavenport for pushing this over the line.

@adamdunstan Sounds good, FWIW the original thought actually came from this comment here regarding kube-router and MetalLB https://github.com/metallb/metallb/issues/160#issuecomment-446708166.

@Elegant996 As I guessed, with the caveat that I haven’t looked at how Calico is configuring BIRD: I assume that it’s importing the interfaces from ipvs0 and advertising those routes. If I am correct, you will be getting routes to all of the addresses that IPVS has attached. This may be what you want, but it will include endpoints and the kube API address. You may want to modify the BIRD configuration (I think it’s in a ConfigMap) to filter only the external addresses, so it advertises only those. Make sure that IPVS is configured with the strict_arp flag. It’s a bit of a confusing name; it just means that the IPVS interface should not answer ARP requests, otherwise every node will answer for those addresses locally, which doesn’t really matter for routed destinations but could cause you some confusion later on… Hope I have been helpful!
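
For reference (not part of the comment above): with kube-proxy in IPVS mode, strict ARP is enabled through the kube-proxy configuration. The ConfigMap name and namespace below assume a kubeadm-style cluster:

kubectl edit configmap kube-proxy -n kube-system
# then, inside the config.conf data key, set:
#   ipvs:
#     strictARP: true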

Not quite. I found out that the LB should be used for service health checks, as External IPs do not perform them; an External IP just throws you at one of the endpoints without checking.

That being said, when I read this over I realized something. The point of the MetalLB controller is to create the LoadBalancer IP. So if we can route to the service through an External IP, then the controller can route to it as well, making it accessible through the LoadBalancer IP.

Give this a go: install MetalLB but completely remove the speaker DaemonSet. You should still be able to route to your ingress controller on the LoadBalancer IP without issue.
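
Concretely, assuming the stock MetalLB manifests, where the speaker is a DaemonSet named speaker in the metallb-system namespace:

# Remove only the speaker; the MetalLB controller keeps allocating IPs.
kubectl delete daemonset speaker -n metallb-system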

Forget BGP in MetalLB; let Calico handle it and just use MetalLB for your LoadBalancer IP 😃 This completely removes the need for peering and accomplishes the same goals. Perhaps this is sufficient to close the issue?

For now I built custom templates with a number of fixes that make calico-node operate more smoothly in a bare metal environment with MetalLB.

Patched calico-node images are available at https://github.com/logan2211/calico-node-baremetal/tree/v3.12.0

The kustomize configuration in the repo allows you to announce IPs from MetalLB using the community string <calico asn>:100, which will push routes through from MetalLB to your node/global Calico peers without requiring creation of Calico IPPool resources. MetalLB community configuration is documented in the MetalLB configuration docs.
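
For illustration, a sketch of what the MetalLB side of that could look like in the legacy ConfigMap format, assuming a Calico ASN of 65479; the pool name, address range, and community name are placeholders, not values from the repo:

peers:
- peer-address: 127.0.0.1
  peer-asn: 65479
  my-asn: 65480
address-pools:
- name: default
  protocol: bgp
  addresses:
  - 10.110.0.0/24            # placeholder pool range
  bgp-advertisements:
  - communities:
    - calico-export          # named community defined below
bgp-communities:
  calico-export: 65479:100   # i.e. <calico asn>:100 as described above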

Interesting!

First off, I haven’t tried this configuration myself in about a year. I can’t think of anything that would change the behavior significantly, but that’s a point to consider.

The repro steps you described are pretty much exactly what I did. I will note that peering was intermittently successful when I tried: Calico was actively trying to connect to localhost (i.e. itself) and then disconnecting when it correctly detected the loop. If MetalLB slipped in at the right time between attempts and at the right spot in the backoff cycle, it was able to peer and break the cycle. However, it was a rare occurrence on my 3-node cluster, meaning that at best 1 in 3 nodes successfully peered, and the others kept spinning on the backoff timers on the Calico side.

One possible change to your repro to “help” with this: after Calico is configured with a localhost BGPPeer, let it sit like that for ~5min so that it can attempt+fail to connect a bunch of times. Then configure MetalLB and see if the peering works out.

If that doesn’t help, once MetalLB is Established, try removing the MetalLB speaker DaemonSet and recreating it after 5min. In both cases, the thing I’m trying to do is to give calico’s BGPd enough time to get into a backed-off state trying to reestablish the session, if that makes sense?

Your hypothesis on source IPs might be implicated… Specifically, you say you select the primary node IP as the BGP source address. Do you also configure Calico to only listen on specific addresses? Possibly the change is that Calico is no longer listening on 127.0.0.1:179, so when MetalLB isn’t around the failure mode is nice and clean (connection refused, rather than “hey, you have a BGP loop misconfiguration”).

Those are my thoughts so far. I’m travelling for work this week and next with minimal time to dig further, but when I get back I can try setting up a test case in virtuakube (MetalLB’s shiny new e2e testing setup) and see if I can give you a crisp reproduction recipe. I already have k8s+Calico in the test matrix; it’s just a question of adding a test that exercises this peering config.

Another thought that I think I’ve voiced in the past: if the Calico node agent gave me a way to peer with it over localhost and inject routes, that would make this just a documentation change to MetalLB.

So, @danderson, I was working with @stevegaossou today trying to reproduce the original issues you hit while attempting this, and we didn’t see the same behaviors mentioned - Calico trying to peer with itself and hitting the backoff. In fact, we saw Calico peer with MetalLB seamlessly over 127.0.0.1 without needing passive set at all on the peering.

I’m a bit suspicious of the result, since clearly lots of folks have hit some variation of this, but I’m wondering if there might be something we did differently that made it work… The main thought I had is that Calico is selecting its BGP source address as the primary IP of the node, which is different from the configured MetalLB peer on localhost.

The steps we took were dead simple, much like others have described above but without needing passive set. We started with the MetalLB minikube example and then:

  • Configured Calico to peer with MetalLB over localhost using a BGPPeer
  • Configured MetalLB to peer with Calico over localhost
  • BGP session goes to Established
  • Configured an IPPool in Calico with the IP range given to MetalLB (to enable route export).

Do you recall what differences might have existed in your test environment that might have caused this difference in result?

Thanks for thinking about more solutions! It’s definitely appreciated.

So far, I’m optimistic that the things we need will get implemented by the Calico folks, they seemed receptive to the bugs and I’ve heard that it’s on the plate for the next release.

I will investigate in more detail later today after work, but after a very quick reading of calico’s documentation, one potential hack suggests itself…

Calico supports per-node BGP peering configurations. Assuming we can get the configuration to be acceptable in terms of the BGP spec, we could make MetalLB listen for BGP on a static host port (not 179), and create per-node BGP peerings in Calico to peer with MetalLB. Basically, make calico’s bgpd on each node peer with localhost:1234, so that MetalLB can inject its routes into Calico that way.

Open questions:

  • BGP peering with localhost is notoriously tricky, because the router can incorrectly believe that it’s peering with itself. Calico uses the BIRD codebase, which should not have this problem… But it needs to be tested.
  • Does the Calico node daemon run the full BGP convergence algorithm? IOW, if we have a peering chain of MetalLB<>Calico<>external router, will Calico propagate the routes from MetalLB to the external router? In theory it should, because Calico advertises itself as “we just make your k8s nodes look like a regular BGP router”, so unlike MetalLB it should be implementing the full BGP convergence/redistribution algorithm.
  • Can we make this configuration automagic? Given appropriate RBAC rules, we could give MetalLB the permission to create new Calico BGPPeer objects, so some component of MetalLB could automatically reconfigure the cluster to peer with MetalLB (a sketch of such an RBAC rule follows this list). This adds a bunch of complexity, and the “magic” may not be welcome by cluster/network admins who want explicit control over what happens to their network.
  • Alternatively, is it possible to define the MetalLB peering as a “global” Calico BGP peer, with a peer address of 127.0.0.1? Again, in theory, this should just work: global peer will apply to all nodes, and all nodes will just connect to a different MetalLB instance on each machine. That should be fine… But maybe Calico has some sanity checks that prevent this. This would be much nicer from an admin perspective, because we can just tell Calico cluster operators “here’s one BGP peer object that you should add to your Calico config, and voila, MetalLB just works!”
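
A hypothetical sketch of the RBAC rule mentioned in the third question above, assuming Calico’s Kubernetes datastore, where BGPPeer is a cluster-scoped CRD in the crd.projectcalico.org API group; the ClusterRole name is made up for illustration:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: metallb-manage-calico-peers   # hypothetical name
rules:
- apiGroups: ["crd.projectcalico.org"]
  resources: ["bgppeers"]
  verbs: ["get", "list", "watch", "create", "update", "delete"]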