cilium: VRRP packets are dropped when the host firewall is enabled in v1.11, but not in v1.10

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

I’m running keepalived on two of my hosts, and both are reporting a timeout. The VIP is 192.168.33.200. Running ip ad | grep 200 shows the VIP assigned on both nodes simultaneously:

node3 | CHANGED | rc=0 >>
    inet 192.168.33.200/32 scope global eth1
node2 | CHANGED | rc=0 >>
    inet 192.168.33.200/32 scope global eth1
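
Both nodes claiming the VIP at once is the classic VRRP split-brain symptom that appears when advertisements (IP protocol 112, multicast 224.0.0.18) are dropped between peers. For reference, a minimal keepalived config for this VIP looks roughly like the following (router ID, priority, and interface are illustrative; this is not my exact config):

# /etc/keepalived/keepalived.conf (illustrative sketch)
vrrp_instance VI_1 {
    interface eth1            # advertisements go out this interface
    virtual_router_id 51      # illustrative value
    priority 100              # higher on the intended MASTER
    advert_int 1
    virtual_ipaddress {
        192.168.33.200/32
    }
}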

Looking at hubble observe -t drop I see the following:

Dec 30 21:01:55.186: 192.168.33.20 <> 192.168.33.30 CT: Unknown L4 protocol DROPPED (IPv4)
Dec 30 21:01:55.808: 192.168.33.30 <> 192.168.33.20 CT: Unknown L4 protocol DROPPED (IPv4)
Dec 30 21:01:56.186: 192.168.33.20 <> 192.168.33.30 CT: Unknown L4 protocol DROPPED (IPv4)
Dec 30 21:01:56.415: fe:54:00:7f:be:9e <> 01:80:c2:00:00:00 Unsupported L2 protocol DROPPED (Ethernet)
Dec 30 21:01:56.472: fe:54:00:34:d6:c7 <> 01:80:c2:00:00:00 Unsupported L2 protocol DROPPED (Ethernet)
Dec 30 21:01:56.702: fe:54:00:e3:4a:21 <> 01:80:c2:00:00:00 Unsupported L2 protocol DROPPED (Ethernet)
Dec 30 21:01:56.808: 192.168.33.30 <> 192.168.33.20 CT: Unknown L4 protocol DROPPED (IPv4)
Dec 30 21:01:56.829: fe:54:00:4b:40:45 <> 01:80:c2:00:00:00 Unsupported L2 protocol DROPPED (Ethernet)
Dec 30 21:01:56.861: fe:54:00:1e:82:f6 <> 01:80:c2:00:00:00 Unsupported L2 protocol DROPPED (Ethernet)
Dec 30 21:01:57.140: fe:54:00:2c:b7:f6 <> 01:80:c2:00:00:00 Unsupported L2 protocol DROPPED (Ethernet)
Dec 30 21:01:57.186: 192.168.33.20 <> 192.168.33.30 CT: Unknown L4 protocol DROPPED (IPv4)
Dec 30 21:01:57.808: 192.168.33.30 <> 192.168.33.20 CT: Unknown L4 protocol DROPPED (IPv4)
Dec 30 21:01:58.186: 192.168.33.20 <> 192.168.33.30 CT: Unknown L4 protocol DROPPED (IPv4)
Dec 30 21:01:58.456: fe:54:00:34:d6:c7 <> 01:80:c2:00:00:00 Unsupported L2 protocol DROPPED (Ethernet)
Dec 30 21:01:58.686: fe:54:00:e3:4a:21 <> 01:80:c2:00:00:00 Unsupported L2 protocol DROPPED (Ethernet)
Dec 30 21:01:58.808: 192.168.33.30 <> 192.168.33.20 CT: Unknown L4 protocol DROPPED (IPv4)
Dec 30 21:01:59.124: fe:54:00:2c:b7:f6 <> 01:80:c2:00:00:00 Unsupported L2 protocol DROPPED (Ethernet)

JSON output:

{"time":"2021-12-30T21:06:17.103685380Z","verdict":"DROPPED","drop_reason":166,"ethernet":{"source":"fe:54:00:2c:b7:f6","destination":"01:80:c2:00:00:00"},"source":{"identity":1,"labels":["reserved:host"]},"destination":{"identity":2,"labels":["reserved:world"]},"Type":"L3_L4","node_name":"node2","event_type":{"type":1,"sub_type":166},"traffic_direction":"INGRESS","drop_reason_desc":"UNSUPPORTED_L2_PROTOCOL","Summary":"Ethernet"}
{"time":"2021-12-30T21:06:17.210109754Z","verdict":"DROPPED","drop_reason":137,"ethernet":{"source":"52:54:00:2c:b7:f6","destination":"52:54:00:4b:40:45"},"IP":{"source":"192.168.33.20","destination":"192.168.33.30","ipVersion":"IPv4"},"source":{"identity":1,"labels":["reserved:host"]},"destination":{"identity":6,"labels":["reserved:remote-node"]},"Type":"L3_L4","node_name":"node2","event_type":{"type":1,"sub_type":137},"traffic_direction":"INGRESS","drop_reason_desc":"CT_UNKNOWN_L4_PROTOCOL","Summary":"IPv4"}
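
Aside: VRRP advertisements use IP protocol 112, which Cilium's connection tracker doesn't recognize, which matches drop reason 137 (CT_UNKNOWN_L4_PROTOCOL) above; the 01:80:c2:00:00:00 destination in the L2 drops is the IEEE 802.1D link-local multicast group (STP and friends). A short sketch for filtering these events out of hubble observe -o json output (field names are taken from the JSON events shown above):

```python
import json

# Drop reasons seen in this report: 137 = CT_UNKNOWN_L4_PROTOCOL,
# 166 = UNSUPPORTED_L2_PROTOCOL.
WANTED = {"CT_UNKNOWN_L4_PROTOCOL", "UNSUPPORTED_L2_PROTOCOL"}

def vrrp_related_drops(lines):
    """Yield parsed hubble JSON events whose drop reason matches WANTED."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        ev = json.loads(line)
        if ev.get("verdict") == "DROPPED" and ev.get("drop_reason_desc") in WANTED:
            yield ev
```

Something like hubble observe -o json | python3 filter.py then narrows the stream down to just these drops.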

The two nodes IPs are 192.168.33.20 and 192.168.33.30.

I’ve tested with both 1.11 and 1.10 on otherwise identical machines (differing only in the Cilium version), and the issue only occurs on 1.11.

Cilium Version

Defaulted container "cilium-agent" out of: cilium-agent, clean-cilium-state (init)
Client: 1.11.0 27e0848 2021-12-05T15:34:41-08:00 go version go1.17.3 linux/amd64
Daemon: 1.11.0 27e0848 2021-12-05T15:34:41-08:00 go version go1.17.3 linux/amd64

Kernel Version

Linux node1 4.18.0-348.7.1.el8_5.x86_64 #1 SMP Tue Dec 21 19:02:23 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Kubernetes Version

Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.0", GitCommit:"ab69524f795c42094a6630298ff53f3c3ebab7f4", GitTreeState:"clean", BuildDate:"2021-12-07T18:16:20Z", GoVersion:"go1.17.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.0", GitCommit:"ab69524f795c42094a6630298ff53f3c3ebab7f4", GitTreeState:"clean", BuildDate:"2021-12-07T18:09:57Z", GoVersion:"go1.17.3", Compiler:"gc", Platform:"linux/amd64"}

Sysdump

https://file.io/Aai7Vju4o9I3

Relevant log output

No response

Anything else?

cilium status --verbose:

KVStore:                Ok   Disabled
Kubernetes:             Ok   1.23 (v1.23.0) [linux/amd64]
Kubernetes APIs:        ["cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "discovery/v1::EndpointSlice", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement:   Strict    [eth1 192.168.33.20 (Direct Routing)]
Host firewall:          Enabled   [eth1]
Cilium:                 Ok   1.11.0 (v1.11.0-27e0848)
NodeMonitor:            Listening for events on 2 CPUs with 64x4096 of shared memory
Cilium health daemon:   Ok   
IPAM:                   IPv4: 3/254 allocated from 172.16.1.0/24, 
Allocated addresses:
  172.16.1.134 (router)
  172.16.1.196 (ingress-nginx/ingress-nginx-controller-njn5d)
  172.16.1.26 (health)
BandwidthManager:       Disabled
Host Routing:           Legacy
Masquerading:           IPTables [IPv4: Enabled, IPv6: Disabled]
Clock Source for BPF:   ktime
Controller Status:      24/24 healthy
  Name                                  Last success   Last error     Count   Message
  cilium-health-ep                      16s ago        never          0       no error   
  dns-garbage-collector-job             27s ago        never          0       no error   
  endpoint-2534-regeneration-recovery   never          never          0       no error   
  endpoint-3923-regeneration-recovery   never          never          0       no error   
  endpoint-4044-regeneration-recovery   never          never          0       no error   
  endpoint-gc                           27s ago        never          0       no error   
  ipcache-inject-labels                 2h35m8s ago    2h35m25s ago   0       no error   
  k8s-heartbeat                         26s ago        never          0       no error   
  mark-k8s-node-as-available            2h35m17s ago   never          0       no error   
  metricsmap-bpf-prom-sync              5s ago         never          0       no error   
  neighbor-table-refresh                17s ago        never          0       no error   
  resolve-identity-2534                 16s ago        never          0       no error   
  resolve-identity-3923                 1m33s ago      never          0       no error   
  resolve-identity-4044                 17s ago        never          0       no error   
  sync-endpoints-and-host-ips           17s ago        never          0       no error   
  sync-lb-maps-with-k8s-services        2h35m17s ago   never          0       no error   
  sync-policymap-2534                   14s ago        never          0       no error   
  sync-policymap-3923                   14s ago        never          0       no error   
  sync-policymap-4044                   14s ago        never          0       no error   
  sync-to-k8s-ciliumendpoint (2534)     6s ago         never          0       no error   
  sync-to-k8s-ciliumendpoint (3923)     3s ago         never          0       no error   
  sync-to-k8s-ciliumendpoint (4044)     7s ago         never          0       no error   
  template-dir-watcher                  never          never          0       no error   
  update-k8s-node-annotations           2h35m25s ago   never          0       no error   
Proxy Status:   OK, ip 172.16.1.134, 0 redirects active on ports 10000-20000
Hubble:         Ok   Current/Max Flows: 4095/4095 (100.00%), Flows/s: 3.61   Metrics: Disabled
KubeProxyReplacement Details:
  Status:                 Strict
  Socket LB Protocols:    TCP, UDP
  Devices:                eth1 192.168.33.20 (Direct Routing)
  Mode:                   SNAT
  Backend Selection:      Random
  Session Affinity:       Enabled
  Graceful Termination:   Enabled
  XDP Acceleration:       Disabled
  Services:
  - ClusterIP:      Enabled
  - NodePort:       Enabled (Range: 30000-32767) 
  - LoadBalancer:   Enabled 
  - externalIPs:    Enabled 
  - HostPort:       Enabled
BPF Maps:   dynamic sizing: on (ratio: 0.002500)
  Name                          Size
  Non-TCP connection tracking   65536
  TCP connection tracking       131072
  Endpoint policy               65535
  Events                        2
  IP cache                      512000
  IP masquerading agent         16384
  IPv4 fragmentation            8192
  IPv4 service                  65536
  IPv6 service                  65536
  IPv4 service backend          65536
  IPv6 service backend          65536
  IPv4 service reverse NAT      65536
  IPv6 service reverse NAT      65536
  Metrics                       1024
  NAT                           131072
  Neighbor table                131072
  Global policy                 16384
  Per endpoint policy           65536
  Session affinity              65536
  Signal                        2
  Sockmap                       65535
  Sock reverse NAT              65536
  Tunnel                        65536
Encryption:           Disabled
Cluster health:       6/6 reachable   (2021-12-30T21:15:14Z)
  Name                IP              Node        Endpoints
  node2 (localhost)   192.168.33.20   reachable   reachable
  node1               192.168.33.10   reachable   reachable
  node3               192.168.33.30   reachable   reachable
  node4               192.168.33.40   reachable   reachable
  node5               192.168.33.50   reachable   reachable
  node6               192.168.33.60   reachable   reachable

I’m running Rocky Linux 8.5. My Helm overrides:

values:
  hostPort:
    enabled: true
  hostServices:
    enabled: true
  containerRuntime:
    integration: crio
  hostFirewall:
    enabled: true
  hubble:
    relay:
      enabled: true
    ui:
      enabled: true
      ingress:
        enabled: true
        className: "{{ ingress_class }}"
        hosts: ["{{ ingress_hosts['hubble'] }}"]

Code of Conduct

  • I agree to follow this project’s Code of Conduct

About this issue

  • Original URL
  • State: open
  • Created 3 years ago
  • Reactions: 2
  • Comments: 19 (13 by maintainers)

Most upvoted comments

We are using purelb with instances of bird speaking OSPF to our Cisco SDN to exchange dynamic route information. On Cilium 1.13.4, I see the egress multicast packets being dropped with CT: Unknown L4 protocol once the host firewall is enabled. It would be great to be able to whitelist protocol names/numbers so that the host firewall can be enabled.

Same problem on Cilium 1.13.2.

This still appears to be an issue and breaks VRRP on machines running Cilium 1.12 with the host firewall enabled. It would be great to have a fix for this, as I wanted to use VRRP + HAProxy for bare-metal kube-apiserver load balancing.

1.10.4:

❯ k exec cilium-2xt2q -- bpftool net show
Defaulted container "cilium-agent" out of: cilium-agent, clean-cilium-state (init)
xdp:

tc:
eth1(3) clsact/ingress bpf_netdev_eth1.o:[from-netdev] id 701
eth1(3) clsact/egress bpf_netdev_eth1.o:[to-netdev] id 707
cilium_net(4) clsact/ingress bpf_host_cilium_net.o:[to-host] id 695
cilium_host(5) clsact/ingress bpf_host.o:[to-host] id 683
cilium_host(5) clsact/egress bpf_host.o:[from-host] id 689
cilium_vxlan(6) clsact/ingress bpf_overlay.o:[from-overlay] id 645
cilium_vxlan(6) clsact/egress bpf_overlay.o:[to-overlay] id 650
lxc_health(8) clsact/ingress bpf_lxc.o:[from-container] id 667
lxc024fb7ec7e53(10) clsact/ingress bpf_lxc.o:[from-container] id 665
lxcbbcd04f5cea6(12) clsact/ingress bpf_lxc.o:[from-container] id 669

flow_dissector:



1.11:

Defaulted container "cilium-agent" out of: cilium-agent, clean-cilium-state (init)
xdp:

tc:
eth1(3) clsact/ingress bpf_netdev_eth1.o:[from-netdev] id 769
eth1(3) clsact/egress bpf_netdev_eth1.o:[to-netdev] id 779
cilium_net(4) clsact/ingress bpf_host_cilium_net.o:[to-host] id 759
cilium_host(5) clsact/ingress bpf_host.o:[to-host] id 739
cilium_host(5) clsact/egress bpf_host.o:[from-host] id 749
cilium_vxlan(6) clsact/ingress bpf_overlay.o:[from-overlay] id 670
cilium_vxlan(6) clsact/egress bpf_overlay.o:[to-overlay] id 678
lxc_health(8) clsact/ingress bpf_lxc.o:[from-container] id 727
lxca71197083523(10) clsact/ingress bpf_lxc.o:[from-container] id 788

flow_dissector:


Update: with the host firewall disabled, it works on 1.11.

Thanks for following up.

What’s the output of this command when run from one of the Cilium pods: sudo bpftool prog show | grep from_netdev?

If you need to enable the host firewall in your cluster without affecting keepalived traffic, you could potentially have keepalived use a separate node interface. Exclude that interface from the --devices config passed to Cilium, so that the BPF programs that drop unsupported L4 protocols aren’t attached to it. Cc @pchaigno to confirm this.
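
As a sketch of that workaround (interface names are hypothetical; this assumes keepalived is moved to a second interface, here eth2, while Cilium keeps managing eth1):

# Helm values sketch (hypothetical interface names): pin Cilium's BPF
# datapath to eth1 only, leaving eth2 free for keepalived/VRRP traffic.
devices: ["eth1"]
hostFirewall:
  enabled: true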