k3d: [BUG] Pod network failing to start when installing calico operator with k3d v5.2.1
What did you do
How was the cluster created?
k3d cluster create "k3s-default" --k3s-arg '--flannel-backend=none@server:*'
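For reproducibility, here is a hedged equivalent using a k3d config file (a sketch only; the file name is made up and the v1alpha3 SimpleConfig schema is assumed for k3d v5.x):

# Write a minimal config that disables flannel on all server nodes, then create the cluster from it.
cat > k3d-no-flannel.yaml <<'EOF'
apiVersion: k3d.io/v1alpha3
kind: Simple
servers: 1
options:
  k3s:
    extraArgs:
      - arg: --flannel-backend=none
        nodeFilters:
          - server:*
EOF
k3d cluster create --config k3d-no-flannel.yaml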
What did you do afterwards? I tried to install the calico/tigera operator onto the cluster with containerIPForwarding enabled:
kubectl apply -f https://docs.projectcalico.org/manifests/tigera-operator.yaml
curl -L https://docs.projectcalico.org/manifests/custom-resources.yaml > k3d-custom-res.yaml
yq e '.spec.calicoNetwork.containerIPForwarding="Enabled"' -i k3d-custom-res.yaml
kubectl apply -f k3d-custom-res.yaml
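To double-check that the yq tweak actually took effect, a small hedged verification (assuming the Installation resource is named "default", as in the stock custom-resources.yaml):

# Should print "Enabled" once the operator has reconciled the Installation resource.
kubectl get installation default \
  -o jsonpath='{.spec.calicoNetwork.containerIPForwarding}'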
k3d commands?
docker commands? `docker ps` to check running containers, `docker exec -ti <node> /bin/sh` to ssh into a container
OS operations (e.g. shutdown/reboot)? Ran Linux system commands (ls, cat, etc.) inside pods and containers
What did you expect to happen
The pod network should come up successfully in all namespaces, with all pods reaching the Running state.
Screenshots or terminal output
The calico-node pods run without issue, but other pods are stuck in the ContainerCreating state (coredns, metrics-server, calico-kube-controllers):
$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
tigera-operator tigera-operator-7dc6bc5777-h5sp7 1/1 Running 0 106s
calico-system calico-typha-9b59bcc69-w2ml8 1/1 Running 0 83s
calico-system calico-kube-controllers-78cc777977-8xf5v 0/1 ContainerCreating 0 83s
kube-system coredns-7448499f4d-8pwtf 0/1 ContainerCreating 0 106s
kube-system metrics-server-86cbb8457f-h26x4 0/1 ContainerCreating 0 106s
kube-system helm-install-traefik-h6qhh 0/1 ContainerCreating 0 106s
kube-system helm-install-traefik-crd-8xsxm 0/1 ContainerCreating 0 106s
kube-system local-path-provisioner-5ff76fc89d-ql55s 0/1 ContainerCreating 0 106s
calico-system calico-node-6xbq7 1/1 Running 0 83s
When describing the stuck pods, I see this in their events:
$ kubectl describe pod/calico-kube-controllers-78cc777977-8xf5v -n calico-system
Warning FailedCreatePodSandBox 3s kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "b474a530f7b8727fc101404ebb551135059f5aa359beb50bae176fd05cf2c20d": netplugin failed with no error message: fork/exec /opt/cni/bin/calico: no such file or directory
Based on the error above, I went to check /opt/cni/bin/calico to see if the calico binary existed in the container, which it does:
glen@glen-tigera: $ docker exec -ti k3d-k3s-default-server-0 /bin/sh
/ # ls
bin dev etc k3d lib opt output proc run sbin sys tmp usr var
/ # cd /opt/cni/bin/
/opt/cni/bin # ls -a
. .. bandwidth **calico** calico-ipam flannel host-local install loopback portmap tags.txt tuning
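As a hedged follow-up check (run from the host): fork/exec can also report "no such file or directory" for a binary that exists but cannot find its dynamic loader inside the minimal k3s image, so permissions and file size are worth confirming too:

# Confirm the CNI plugin is a regular, executable, non-empty file on the node.
docker exec -ti k3d-k3s-default-server-0 ls -l /opt/cni/bin/calico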
CNI config YAML:
kubectl get cm cni-config -n calico-system -o yaml
apiVersion: v1
data:
  config: |-
    {
      "name": "k8s-pod-network",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "calico",
          "datastore_type": "kubernetes",
          "mtu": 0,
          "nodename_file_optional": false,
          "log_level": "Info",
          "log_file_path": "/var/log/calico/cni/cni.log",
          "ipam": { "type": "calico-ipam", "assign_ipv4": "true", "assign_ipv6": "false"},
          "container_settings": {
            "allow_ip_forwarding": true
          },
          "policy": {
            "type": "k8s"
          },
          "kubernetes": {
            "k8s_api_root": "https://10.43.0.1:443",
            "kubeconfig": "__KUBECONFIG_FILEPATH__"
          }
        },
        {
          "type": "bandwidth",
          "capabilities": {"bandwidth": true}
        },
        {"type": "portmap", "snat": true, "capabilities": {"portMappings": true}}
      ]
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2021-12-17T18:02:24Z"
  name: cni-config
  namespace: calico-system
  ownerReferences:
  - apiVersion: operator.tigera.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: Installation
    name: default
    uid: c53d18b5-efc6-4155-879b-6097a8c2c14c
  resourceVersion: "675"
  uid: 003c9cdc-0ef5-4d63-8d30-d6e1ed79d4c0
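As another hedged check, the rendered conflist can be inspected on the node itself (assuming k3s's default CNI config directory):

# k3s keeps its CNI configuration under this path by default; the calico conflist should show up here.
docker exec -ti k3d-k3s-default-server-0 \
  ls /var/lib/rancher/k3s/agent/etc/cni/net.d/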
Which OS & Architecture
OS: GNU/Linux
Kernel Version: 20.04.2-Ubuntu SMP
Kernel Release: 5.11.0-40-generic
Processor / HW Platform / Machine Architecture: x86_64
Which version of k3d
k3d version v5.2.1
k3s version v1.21.7-k3s1 (default)
Which version of docker
docker version:
Client: Docker Engine - Community
Version: 20.10.11
API version: 1.41
Go version: go1.16.9
Git commit: dea9396
Built: Thu Nov 18 00:37:06 2021
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.11
API version: 1.41 (minimum version 1.12)
Go version: go1.16.9
Git commit: 847da18
Built: Thu Nov 18 00:35:15 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.12
GitCommit: 7b11cfaabd73bb80907dd23182b9347b4245eb5d
runc:
Version: 1.0.2
GitCommit: v1.0.2-0-g52b36a2
docker-init:
Version: 0.19.0
GitCommit: de40ad0
docker info:
Client:
Context: default
Debug Mode: false
Plugins:
app: Docker App (Docker Inc., v0.9.1-beta3)
buildx: Build with BuildKit (Docker Inc., v0.6.3-docker)
scan: Docker Scan (Docker Inc., v0.9.0)
Server:
Containers: 20
Running: 0
Paused: 0
Stopped: 20
Images: 22
Server Version: 20.10.11
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 7b11cfaabd73bb80907dd23182b9347b4245eb5d
runc version: v1.0.2-0-g52b36a2
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 5.11.0-40-generic
Operating System: Ubuntu 20.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 31.09GiB
Name: glen-tigera
ID: 6EZ7:QGFF:Z2KK:Q7K3:YKGI:6FIS:X2UP:JX5W:UGXA:FIZW:CYV6:RDDU
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
About this issue
- Original URL
- State: open
- Created 3 years ago
- Comments: 20 (7 by maintainers)
Sorry, just understood how you got there. This is the script that’s being executed: https://github.com/projectcalico/calico/blob/master/pod2daemon/flexvol/docker/flexvol.sh
I’m checking the variants of installation now (the one from k3d docs and yours) with regards to the uds:
Via Operator:
Without Operator:
Googling for that error message, this issue in rke2 popped up: https://github.com/rancher/rke2/issues/234
Ah at least you could track it down to a specific version already 👍 Fingers crossed you’ll figure out the root cause.
Upon further testing, our v3.21 (latest release) operator install seems to no longer be compatible with k3d clusters. I tested the operator starting from v3.15, and every version worked until v3.21. I've followed up with the larger team to discuss further.
k3d-calico-operator-install-findings.txt
Ah - the way that the tigera-operator works, there’s a version of operator that maps to a version of calico (since the manifests are baked into it). For v3.15, you’ll want to apply: https://docs.projectcalico.org/archive/v3.15/manifests/tigera-operator.yaml
(the intent is to make the upgrade experience better - in an operator managed cluster, you upgrade calico by simply applying the uplevel tigera-operator.yaml and it takes care of everything). In the old manifest install, you’d have customised your install in various ways directly in the yaml, so to upgrade you have to get the new yaml, then make the same edits as you did before, then apply and hope you did it right. Whereas in an operator setup, you have configured all your customisations in the Installation resource. The new operator reads that and does “the right thing” to apply those customisations.
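As a hedged sketch of that operator-to-Calico version mapping, pinning an install to v3.15 would look roughly like this (the archive custom-resources.yaml URL is assumed to follow the same pattern as the tigera-operator.yaml link above):

# Apply the operator and the custom resources from the same archived release,
# so the operator version matches the Calico version it bundles.
kubectl apply -f https://docs.projectcalico.org/archive/v3.15/manifests/tigera-operator.yaml
kubectl apply -f https://docs.projectcalico.org/archive/v3.15/manifests/custom-resources.yaml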
@iwilltry42 When I ran the command you posted earlier, there was no such file or directory on my setup:
There is no `nodeagent~uds` directory when I try to look inside the container. From https://projectcalico.docs.tigera.io/reference/installation/api, I think this all means we need to set `spec.flexVolumePath: "/usr/local/bin/"` in the Installation resource in custom-resources.
Awesome, thank you, that gives us a thread to pull on.
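If that turns out to be the fix, a minimal sketch of applying it to a running cluster (assuming the default Installation resource created by custom-resources.yaml):

# Point the operator at a flexvolume plugin directory that exists in the k3d/k3s node image;
# the operator should then re-render the calico-node daemonset with the new path.
kubectl patch installation default --type merge \
  -p '{"spec":{"flexVolumePath":"/usr/local/bin/"}}'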