cilium-cli: cilium connectivity test failures
Is there an existing issue for this?
- I have searched the existing issues
What happened?
When running `cilium connectivity test`, several tests fail: no-policies, client-egress-l7, and to-fqdns. They all fail because curl exits with code 28 (a timeout).
The thing is, when I launch a curlimages/curl container and run the commands listed in the test output (minus the `-w` and `--output` flags), it downloads the page just fine.
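For reference, a manual reproduction along those lines might look like this (the pod name is illustrative; the curl flags are the ones from the test output, minus the formatting flags):

```shell
# Launch a throwaway curl pod and re-run the test's request without
# the -w/--output formatting flags.
kubectl run curl-debug --rm -it --restart=Never \
  --image=curlimages/curl -- \
  curl --silent --fail --show-error --connect-timeout 5 http://one.one.one.one:80
```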
This is a new Kubernetes 1.23 cluster; Cilium was the first thing I installed. The host OS is Ubuntu 20.04, running firewalld as a host-level firewall.
Is this something I should try to fix, or can I ignore these tests?
Cilium Version
cilium-cli: v0.10.0 compiled with go1.17.4 on linux/amd64
cilium image (default): v1.11.0
cilium image (stable): v1.11.0
cilium image (running): v1.11.0
Kernel Version
Linux controlplane 5.4.0-91-generic #102-Ubuntu SMP Fri Nov 5 16:31:28 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Kubernetes Version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.1", GitCommit:"86ec240af8cbd1b60bcc4c03c20da9b98005b92e", GitTreeState:"clean", BuildDate:"2021-12-16T11:41:01Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.1", GitCommit:"86ec240af8cbd1b60bcc4c03c20da9b98005b92e", GitTreeState:"clean", BuildDate:"2021-12-16T11:34:54Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"linux/amd64"}
Sysdump
No response
Relevant log output
root@controlplane:~# cilium connectivity test
ℹ️ Monitor aggregation detected, will skip some flow validation steps
⌛ [clustername] Waiting for deployments [client client2 echo-same-node] to become ready...
⌛ [clustername] Waiting for deployments [echo-other-node] to become ready...
⌛ [clustername] Waiting for CiliumEndpoint for pod cilium-test/client-7568bc7f86-2mdgt to appear...
⌛ [clustername] Waiting for CiliumEndpoint for pod cilium-test/client2-686d5f784b-5llc9 to appear...
⌛ [clustername] Waiting for CiliumEndpoint for pod cilium-test/echo-other-node-59d779959c-2jggr to appear...
⌛ [clustername] Waiting for CiliumEndpoint for pod cilium-test/echo-same-node-5767b7b99d-rmcqc to appear...
⌛ [clustername] Waiting for Service cilium-test/echo-same-node to become ready...
⌛ [clustername] Waiting for Service cilium-test/echo-other-node to become ready...
⌛ [clustername] Waiting for NodePort nnn.nnn.nnn.8:31231 (cilium-test/echo-other-node) to become ready...
⌛ [clustername] Waiting for NodePort nnn.nnn.nnn.8:32187 (cilium-test/echo-same-node) to become ready...
⌛ [clustername] Waiting for NodePort nnn.nnn.nnn.9:31231 (cilium-test/echo-other-node) to become ready...
⌛ [clustername] Waiting for NodePort nnn.nnn.nnn.9:32187 (cilium-test/echo-same-node) to become ready...
⌛ [clustername] Waiting for NodePort nnn.nnn.nnn.207:31231 (cilium-test/echo-other-node) to become ready...
⌛ [clustername] Waiting for NodePort nnn.nnn.nnn.207:32187 (cilium-test/echo-same-node) to become ready...
⌛ [clustername] Waiting for NodePort nnn.nnn.nnn.7:31231 (cilium-test/echo-other-node) to become ready...
⌛ [clustername] Waiting for NodePort nnn.nnn.nnn.7:32187 (cilium-test/echo-same-node) to become ready...
⌛ [clustername] Waiting for NodePort nnn.nnn.nnn.208:32187 (cilium-test/echo-same-node) to become ready...
⌛ [clustername] Waiting for NodePort nnn.nnn.nnn.208:31231 (cilium-test/echo-other-node) to become ready...
⌛ [clustername] Waiting for NodePort nnn.nnn.nnn.209:31231 (cilium-test/echo-other-node) to become ready...
⌛ [clustername] Waiting for NodePort nnn.nnn.nnn.209:32187 (cilium-test/echo-same-node) to become ready...
ℹ️ Skipping IPCache check
⌛ [clustername] Waiting for pod cilium-test/client-7568bc7f86-2mdgt to reach default/kubernetes service...
⌛ [clustername] Waiting for pod cilium-test/client2-686d5f784b-5llc9 to reach default/kubernetes service...
🔭 Enabling Hubble telescope...
⚠️ Unable to contact Hubble Relay, disabling Hubble telescope and flow validation: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp [::1]:4245: connect: connection refused"
ℹ️ Expose Relay locally with:
cilium hubble enable
cilium hubble port-forward&
🏃 Running tests...
[=] Test [no-policies]
................................................
[=] Test [allow-all]
............................................
[=] Test [client-ingress]
..
[=] Test [echo-ingress]
....
[=] Test [client-egress]
....
[=] Test [to-entities-world]
......
[=] Test [to-cidr-1111]
....
[=] Test [echo-ingress-l7]
....
[=] Test [client-egress-l7]
........
ℹ️ 📜 Applying CiliumNetworkPolicy 'client-egress-only-dns' to namespace 'cilium-test'..
ℹ️ 📜 Applying CiliumNetworkPolicy 'client-egress-l7-http' to namespace 'cilium-test'..
[-] Scenario [client-egress-l7/pod-to-pod]
[.] Action [client-egress-l7/pod-to-pod/curl-0: cilium-test/client-7568bc7f86-2mdgt (nnn.nnn.3.67) -> cilium-test/echo-same-node-5767b7b99d-rmcqc (nnn.nnn.3.159:8080)]
[.] Action [client-egress-l7/pod-to-pod/curl-1: cilium-test/client-7568bc7f86-2mdgt (nnn.nnn.3.67) -> cilium-test/echo-other-node-59d779959c-2jggr (nnn.nnn.4.193:8080)]
[.] Action [client-egress-l7/pod-to-pod/curl-2: cilium-test/client2-686d5f784b-5llc9 (nnn.nnn.3.88) -> cilium-test/echo-other-node-59d779959c-2jggr (nnn.nnn.4.193:8080)]
[.] Action [client-egress-l7/pod-to-pod/curl-3: cilium-test/client2-686d5f784b-5llc9 (nnn.nnn.3.88) -> cilium-test/echo-same-node-5767b7b99d-rmcqc (nnn.nnn.3.159:8080)]
[-] Scenario [client-egress-l7/pod-to-world]
[.] Action [client-egress-l7/pod-to-world/http-to-one-one-one-one-0: cilium-test/client-7568bc7f86-2mdgt (nnn.nnn.3.67) -> one-one-one-one-http (one.one.one.one:80)]
[.] Action [client-egress-l7/pod-to-world/https-to-one-one-one-one-0: cilium-test/client-7568bc7f86-2mdgt (nnn.nnn.3.67) -> one-one-one-one-https (one.one.one.one:443)]
[.] Action [client-egress-l7/pod-to-world/https-to-one-one-one-one-index-0: cilium-test/client-7568bc7f86-2mdgt (nnn.nnn.3.67) -> one-one-one-one-https-index (one.one.one.one:443)]
[.] Action [client-egress-l7/pod-to-world/http-to-one-one-one-one-1: cilium-test/client2-686d5f784b-5llc9 (nnn.nnn.3.88) -> one-one-one-one-http (one.one.one.one:80)]
❌ command "curl -w %{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code} --silent --fail --show-error --connect-timeout 5 --output /dev/null http://one.one.one.one:80" failed: command terminated with exit code 28
ℹ️ curl output:
curl: (28) Resolving timed out after 5000 milliseconds
:0 -> :0 = 000
📄 No flows recorded during action http-to-one-one-one-one-1
📄 No flows recorded during action http-to-one-one-one-one-1
[.] Action [client-egress-l7/pod-to-world/https-to-one-one-one-one-1: cilium-test/client2-686d5f784b-5llc9 (nnn.nnn.3.88) -> one-one-one-one-https (one.one.one.one:443)]
[.] Action [client-egress-l7/pod-to-world/https-to-one-one-one-one-index-1: cilium-test/client2-686d5f784b-5llc9 (nnn.nnn.3.88) -> one-one-one-one-https-index (one.one.one.one:443)]
ℹ️ 📜 Deleting CiliumNetworkPolicy 'client-egress-only-dns' from namespace 'cilium-test'..
ℹ️ 📜 Deleting CiliumNetworkPolicy 'client-egress-l7-http' from namespace 'cilium-test'..
[=] Test [dns-only]
..........
[=] Test [to-fqdns]
.
ℹ️ 📜 Applying CiliumNetworkPolicy 'client-egress-to-fqdns-one-one-one-one' to namespace 'cilium-test'..
[-] Scenario [to-fqdns/pod-to-world]
[.] Action [to-fqdns/pod-to-world/http-to-one-one-one-one-0: cilium-test/client2-686d5f784b-5llc9 (nnn.nnn.3.88) -> one-one-one-one-http (one.one.one.one:80)]
❌ command "curl -w %{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code} --silent --fail --show-error --connect-timeout 5 --output /dev/null http://one.one.one.one:80" failed: command terminated with exit code 28
ℹ️ curl output:
curl: (28) Resolving timed out after 5000 milliseconds
:0 -> :0 = 000
📄 No flows recorded during action http-to-one-one-one-one-0
📄 No flows recorded during action http-to-one-one-one-one-0
[.] Action [to-fqdns/pod-to-world/https-to-one-one-one-one-0: cilium-test/client2-686d5f784b-5llc9 (nnn.nnn.3.88) -> one-one-one-one-https (one.one.one.one:443)]
[.] Action [to-fqdns/pod-to-world/https-to-one-one-one-one-index-0: cilium-test/client2-686d5f784b-5llc9 (nnn.nnn.3.88) -> one-one-one-one-https-index (one.one.one.one:443)]
[.] Action [to-fqdns/pod-to-world/http-to-one-one-one-one-1: cilium-test/client-7568bc7f86-2mdgt (nnn.nnn.3.67) -> one-one-one-one-http (one.one.one.one:80)]
❌ command "curl -w %{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code} --silent --fail --show-error --connect-timeout 5 --output /dev/null http://one.one.one.one:80" failed: command terminated with exit code 28
ℹ️ curl output:
curl: (28) Resolving timed out after 5000 milliseconds
:0 -> :0 = 000
📄 No flows recorded during action http-to-one-one-one-one-1
📄 No flows recorded during action http-to-one-one-one-one-1
[.] Action [to-fqdns/pod-to-world/https-to-one-one-one-one-1: cilium-test/client-7568bc7f86-2mdgt (nnn.nnn.3.67) -> one-one-one-one-https (one.one.one.one:443)]
[.] Action [to-fqdns/pod-to-world/https-to-one-one-one-one-index-1: cilium-test/client-7568bc7f86-2mdgt (nnn.nnn.3.67) -> one-one-one-one-https-index (one.one.one.one:443)]
ℹ️ 📜 Deleting CiliumNetworkPolicy 'client-egress-to-fqdns-one-one-one-one' from namespace 'cilium-test'..
📋 Test Report
❌ 2/11 tests failed (3/142 actions), 0 tests skipped, 0 scenarios skipped:
Test [client-egress-l7]:
❌ client-egress-l7/pod-to-world/http-to-one-one-one-one-1: cilium-test/client2-686d5f784b-5llc9 (nnn.nnn.3.88) -> one-one-one-one-http (one.one.one.one:80)
Test [to-fqdns]:
❌ to-fqdns/pod-to-world/http-to-one-one-one-one-0: cilium-test/client2-686d5f784b-5llc9 (nnn.nnn.3.88) -> one-one-one-one-http (one.one.one.one:80)
❌ to-fqdns/pod-to-world/http-to-one-one-one-one-1: cilium-test/client-7568bc7f86-2mdgt (nnn.nnn.3.67) -> one-one-one-one-http (one.one.one.one:80)
Connectivity test failed: 2 tests failed
Anything else?
Issue https://github.com/cilium/cilium/issues/18273 has a similar failing test, but it only mentions one of the tests failing, not both as in my result.
I reran the connectivity test today, about two weeks after my initial run (and Slack post), with the same result. That said, the few workloads I have since started running on the cluster all seem to be working fine.
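Since all three failures are DNS resolution timeouts ("Resolving timed out after 5000 milliseconds"), one hedged check is whether in-cluster DNS resolution of the test FQDN works at all from a throwaway pod (the pod name and busybox image are illustrative choices, not part of the test suite):

```shell
# Resolve the test FQDN from inside the cluster; a timeout here points at
# DNS (or the host firewall dropping DNS traffic) rather than at curl.
kubectl run dns-debug --rm -it --restart=Never --image=busybox:1.35 -- \
  nslookup one.one.one.one
```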
I do have the sysdump file, but I'm reluctant to upload it until I confirm it doesn't contain anything sensitive. If it's really needed, let me know.
Thanks in advance!
Code of Conduct
- I agree to follow this project’s Code of Conduct
About this issue
- Original URL
- State: open
- Created 2 years ago
- Comments: 26 (11 by maintainers)
Hmm… Rich rules might be one way to work around this, but I haven't used them before…
I did find this comment in firewalld's issue queue: https://github.com/firewalld/firewalld/issues/767#issuecomment-790687269. That led me to add cilium_host, cilium_net, and cilium_vxlan to the trusted zone. After restarting everything, the connectivity tests all passed just fine. :\
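For anyone else hitting this, the change described above can be sketched with firewall-cmd roughly as follows (run on each node; the interface names are Cilium's defaults, and whether this is the right zone policy for your environment is a judgment call):

```shell
# Move Cilium's virtual interfaces into firewalld's trusted zone so the
# host firewall stops interfering with pod traffic, then reload.
for iface in cilium_host cilium_net cilium_vxlan; do
  firewall-cmd --permanent --zone=trusted --add-interface="$iface"
done
firewall-cmd --reload
```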
That makes me wonder whether I need to be concerned about the plethora of virtual NICs that Kube creates, i.e. all the lxc* interfaces listed in ifconfig's output. There's one per pod, right? Do I need to somehow make sure they're also added to the trusted zone? I had assumed Kube would add or modify any needed rules for the virtual NICs it creates at startup, presumably after firewalld starts. But since manually adding the cilium_* interfaces to the trusted zone made a difference, I'm not sure anymore.
Guess I’d better dive into how Kube networking works a bit deeper.
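If it helps, the per-pod lxc* interfaces mentioned above can be listed like this (they are veth devices; this assumes the `ip` tool from iproute2 rather than the legacy ifconfig):

```shell
# List this node's veth interfaces and keep only the per-pod lxc* ones.
ip -brief link show type veth | grep '^lxc'
```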
Are you able to narrow down the list of rules by inspecting the drop counters via `iptables-save -c ...`?

Hm, that sounds like a cilium-cli issue to me rather than a Cilium issue, given that you can run the commands manually successfully.
@tklauser Should we transfer this issue to the cilium-cli repo?
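The drop-counter inspection suggested above might look roughly like this (the grep filter is my own assumption about how to narrow the output, not part of Cilium's tooling):

```shell
# Dump rules with packet/byte counters, then keep only DROP rules whose
# counters are not [0:0], i.e. rules that have actually matched traffic.
iptables-save -c | grep -i drop | grep -v '\[0:0\]'
```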