cilium: CI: test-conn-disrupt-client failed due to interrupted traffic during upgrade/downgrade

CI failure

Failed job: https://github.com/cilium/cilium/actions/runs/6154793242/job/16700697606

 [=] Test [no-interrupted-connections]

  [-] Scenario [no-interrupted-connections/no-interrupted-connections]
  🟥 Pod test-conn-disrupt-client-cc557fcf9-8sjnb flow was interrupted (restart count does not match 2 != 3)
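
For context, the conn-disrupt client pod is restarted whenever its long-lived connection breaks, so the scenario compares the restart count recorded at setup time against the current one (2 vs 3 above). The counts can be inspected directly with something like:

kubectl -n cilium-test get pods -l app=test-conn-disrupt-client \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].restartCount}{"\n"}{end}'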

cilium-sysdump-2-final.zip

This issue is different from “Drop due to missed tail call when doing downgrade from main to v1.13.4” because there is no “Missed tail call” drop reported in cilium-bugtool/cmd/cilium-metrics-list.md.
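
A quick way to double-check that against the extracted sysdump (the path is the one mentioned above; the drop reason shows up as a label on the cilium_drop_count_total metric):

# no output means no "Missed tail call" drops were recorded by any agent
grep -r 'Missed tail call' cilium-bugtool*/cmd/cilium-metrics-list.md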

This issue is not an IPsec related issue because:

  1. This failed check didn’t enable IPsec, even though the job name says IPsec upgrade: the job given above was triggered from https://github.com/cilium/cilium/pull/28086, which disabled IPsec.
  2. The XFRM counters didn’t increase (see the quick check below).
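
For reference, a quick way to read those counters on a node (a sketch: it assumes the kernel exposes /proc/net/xfrm_stat, i.e. CONFIG_XFRM_STATISTICS is enabled, and relies on the agent pods running in the host network namespace):

# all Xfrm* error counters should stay flat across the upgrade if IPsec is not in the picture
kubectl -n kube-system exec ds/cilium -- cat /proc/net/xfrm_stat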

About this issue

  • State: open
  • Created 10 months ago
  • Comments: 24 (24 by maintainers)

Most upvoted comments

For the sake of posterity, the latest bpftrace script:

> cat trace.bt
struct key { u32 daddr; u32 saddr; u16 dport; u16 sport; u8 nexthdr; u8 flags; };  // mirrors the start of Cilium's ipv4_ct_tuple
struct val { u64 a; u64 backend_id; u64 b; u64 c; u32 d; u16 e; u16 rev_nat_index; };  // Cilium's ct_entry, up to rev_nat_index

kretfunc:vmlinux:bpf_map_lookup_elem {
    if (((struct bpf_map *)args->map)->id == $1 && retval != 0) {
        printf("%s L %s:%d->%s:%d (%d) : %x (%d)\n",
            strftime("%H:%M:%S:%f", nsecs),
            ntop(((struct key *)((struct bpf_map*)args->key))->saddr),
            bswap(((struct key *)((struct bpf_map*)args->key))->dport),
            ntop(((struct key *)((struct bpf_map*)args->key))->daddr),
            bswap(((struct key *)((struct bpf_map*)args->key))->sport),
            ((struct key *)((struct bpf_map*)args->key))->flags,
            retval,
            bswap(((struct val *)retval)->rev_nat_index)
        );
    }
}

kretfunc:vmlinux:bpf_map_delete_elem {
    if (((struct bpf_map *)args->map)->id == $1) {
        printf("%s D %s:%d->%s:%d : %x\n",
            strftime("%H:%M:%S:%f", nsecs),
            ntop(((struct key *)((struct bpf_map*)args->key))->saddr),
            bswap(((struct key *)((struct bpf_map*)args->key))->dport),
            ntop(((struct key *)((struct bpf_map*)args->key))->daddr),
            bswap(((struct key *)((struct bpf_map*)args->key))->sport),
            retval
        );
    }
}

kretfunc:vmlinux:bpf_map_update_elem {
    if (((struct bpf_map *)args->map)->id == $1) {
        printf("%s U %s:%d->%s:%d (%d) => %x (%d) : %x\n",
            strftime("%H:%M:%S:%f", nsecs),
            ntop(((struct key *)((struct bpf_map*)args->key))->saddr),
            bswap(((struct key *)((struct bpf_map*)args->key))->dport),
            ntop(((struct key *)((struct bpf_map*)args->key))->daddr),
            bswap(((struct key *)((struct bpf_map*)args->key))->sport),
            ((struct key *)((struct bpf_map*)args->key))->flags,
            args->value,
            bswap(((struct val *)((struct bpf_map*)args->value))->rev_nat_index),
            retval
        );
    }
}
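
The script expects the conntrack map ID as its first positional parameter. One way to wire that up on a kind cluster (a sketch: it assumes the kind-worker2 node from the reproducer below and jq on the host; BPF map IDs are kernel-global, so an ID resolved inside the node container is valid for bpftrace running on the host):

MAP_ID=$(docker exec kind-worker2 bpftool -j map show pinned \
    /sys/fs/bpf/tc/globals/cilium_ct4_global | jq -r '.id')
sudo bpftrace trace.bt "$MAP_ID"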

@giorio94 mentioned that the drop happens almost at the same time as the related client endpoint gets regenerated. The drop was observed only when both IPv{4,6} and the bpf_lxc per-packet LB were enabled. Anyway, I extended Marco’s reproducer to get more info:

cd cilium &&  ./contrib/scripts/kind.sh "" 2 "" "" "iptables" "dual"
cd cilium-cli && git checkout origin/pr/brb/conn-short && make
./cilium-cli/cilium install --set=debug.enabled=true --set=bpf.masquerade=false --set=bpf.monitorAggregation=none --set=hubble.enabled=false --set=tunnel=vxlan --set=ipv4.enabled=true --set=ipv6.enabled=true

cat > ./cilium-cli/run.sh <<'EOF'
#!/bin/bash -e

./cilium connectivity test \
  --hubble=false \
  --conn-disrupt-dispatch-interval 0ms \
  --include-conn-disrupt-test --conn-disrupt-test-setup

kubectl -n cilium-test wait pods -l app=test-conn-disrupt-client --for condition=Ready --timeout=90s

CLIENT_POD=$(kubectl -n cilium-test get pods -l app=test-conn-disrupt-client -o wide  | grep kind-worker2 | head -n1 | awk '{print $1}')
CLIENT_IP=$(kubectl -n cilium-test logs $CLIENT_POD | grep -o 'from .*' | awk '{print $2}' | cut -d: -f1)
CLIENT_PORT=$(kubectl -n cilium-test logs $CLIENT_POD | grep -o 'from .*' | awk '{print $2}' | cut -d: -f2)

echo "$CLIENT_POD $CLIENT_IP $CLIENT_PORT"

sudo rm -f /tmp/pwru.ready

sudo pwru --output-tuple \
    --output-file=/tmp/pwru.log \
    --output-meta \
    --ready-file=/tmp/pwru.ready \
    "host $CLIENT_IP or port $CLIENT_PORT" \
    2>/tmp/pwru.status &
PWRU_PID=$!

while [ ! -f /tmp/pwru.ready ]; do sleep 1; done

kubectl rollout restart -n kube-system ds/cilium

# Wait until all agents have successfully restarted before running the connectivity tests again
kubectl rollout status -n kube-system ds/cilium --timeout=1m

./cilium status --wait

sleep 10

./cilium connectivity test --short

./cilium connectivity test \
  --hubble=false \
  --flow-validation=disabled \
  --test='no-interrupted-connections' \
  --test='no-missed-tail-calls' \
  --include-conn-disrupt-test

sudo kill $PWRU_PID
wait $PWRU_PID

EOF

chmod +x ./cilium-cli/run.sh
cd cilium-cli/
while ./run.sh; do :; done
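
When an iteration fails, the offending packet should be in the pwru capture written by run.sh; a simple way to pull the drop out of it:

# kfree_skb_reason entries carry the drop reason (e.g. SKB_DROP_REASON_NO_SOCKET)
grep kfree_skb_reason /tmp/pwru.log | tail -n 20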

The drop with pwru:

0xffff88e04ad5c2e8     19     [tcd-server] __netif_receive_skb_one_core netns=4026533492 mark=0x0 iface=4(cilium_vxlan) proto=8 mtu=1500 len=308 10.244.0.128:8000->10.244.1.186:33086(tcp)
0xffff88e04ad5c2e8     19     [tcd-server]             tcf_classify netns=4026533492 mark=0x0 iface=4(cilium_vxlan) proto=8 mtu=1500 len=308 10.244.0.128:8000->10.244.1.186:33086(tcp)
0xffff88e04ad5c2e8     19     [tcd-server]      skb_ensure_writable netns=4026533492 mark=0x0 iface=4(cilium_vxlan) proto=8 mtu=1500 len=322 10.244.0.128:8000->10.244.1.186:33086(tcp)
0xffff88e04ad5c2e8     19     [tcd-server]      skb_ensure_writable netns=4026533492 mark=0x0 iface=4(cilium_vxlan) proto=8 mtu=1500 len=322 10.244.0.128:8000->10.244.1.186:33086(tcp)
0xffff88e04ad5c2e8     19     [tcd-server]      skb_ensure_writable netns=4026533492 mark=0x0 iface=4(cilium_vxlan) proto=8 mtu=1500 len=322 10.244.0.128:8000->10.244.1.186:33086(tcp)
0xffff88e04ad5c2e8     19     [tcd-server]      skb_ensure_writable netns=4026533492 mark=0x0 iface=4(cilium_vxlan) proto=8 mtu=1500 len=322 10.244.0.128:8000->10.244.1.186:33086(tcp)
0xffff88e04ad5c2e8     19     [tcd-server]      skb_ensure_writable netns=4026533492 mark=0x0 iface=4(cilium_vxlan) proto=8 mtu=1500 len=322 10.244.0.128:8000->10.244.1.186:33086(tcp)
0xffff88e04ad5c2e8     19     [tcd-server]          skb_do_redirect netns=4026533492 mark=0x0 iface=4(cilium_vxlan) proto=8 mtu=1500 len=322 10.244.0.128:8000->10.244.1.186:33086(tcp)
0xffff88e04ad5c2e8     19     [tcd-server]           __bpf_redirect netns=4026533492 mark=0x0 iface=4(cilium_vxlan) proto=8 mtu=1500 len=322 10.244.0.128:8000->10.244.1.186:33086(tcp)
...
0xffff88e04ad5c2e8     19     [tcd-server] kfree_skb_reason(SKB_DROP_REASON_NO_SOCKET) netns=4026534202 mark=0x0 iface=278(eth0) proto=8 mtu=1500 len=288 10.244.0.128:8000->10.244.1.186:33086(tcp)

What happened is that the BPF prog attached to cilium_vxlan didn’t do the rev-DNAT. This is most likely due to a missing CT entry. I’m continuing the investigation.
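
For reference, one way to check whether the conntrack entry (and its rev NAT index) for that connection is still present on the node hosting the client (a sketch: <cilium-agent-pod> is a placeholder for the agent pod on that node, and 33086 is the client port from the pwru output above):

kubectl -n kube-system exec <cilium-agent-pod> -- cilium bpf ct list global | grep 33086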

Here is a recent one with a missed tail call: https://github.com/cilium/cilium/actions/runs/7089953274/job/19295849453. It happens in the IPsec workflow, so it’s not the host firewall one (the host firewall is incompatible with IPsec).

Small update: bpftrace to the rescue:

cat > trace.bt <<'EOF'
struct key { u32 daddr; u32 saddr; u16 dport; u16 sport; };

kretfunc:vmlinux:bpf_map_lookup_elem {
    if (((struct bpf_map *)args->map)->id == $1) {
        printf("%s %s:%d->%s:%d : %x\n",
            strftime("%H:%M:%S:%f", nsecs),
            ntop(((struct key *)((struct bpf_map*)args->key))->saddr),
            bswap(((struct key *)((struct bpf_map*)args->key))->dport),
            ntop(((struct key *)((struct bpf_map*)args->key))->daddr),
            bswap(((struct key *)((struct bpf_map*)args->key))->sport),
            retval
        );
    }
}
EOF

# to get CT map ID:
docker exec -ti kind-worker2 bpftool map show pinned /sys/fs/bpf/tc/globals/cilium_ct4_global

sudo bpftrace trace.bt $MAP_ID

The relevant output:

13:17:13:584423 10.244.1.102:8000->10.244.1.154:60358 : e8bb5980
...
13:17:13:789869 10.244.1.102:8000->10.244.1.154:60358 : e8c22238 <-- DROP happened ~at this point of time

From the output above, I see that the CT map lookup for the reply returned a different map value than before (one which probably didn’t have any rev_nat_index set). To be continued.

https://github.com/cilium/cilium/issues/27827 was a duplicate of this issue. Closed to avoid duplicates, but it has a couple more sysdumps, in case that’s helpful.

The ci-e2e-upgrade PR is hitting this too (1 out of ~3 times). For example, https://github.com/cilium/cilium/actions/runs/6147703736/job/16679772837.