k3d: [BUG] failing on linux kernel >= 5.12.2 (and >= 5.11.19, ...)
What did you do
- How was the cluster created?
  - k3d cluster create "vald-cluster" -p "8081:80@loadbalancer" --agents 5
- What did you do afterwards?
  - k3d commands?
  - docker commands?
  - OS operations (e.g. shutdown/reboot)?
What did you expect to happen
I expected k3d to work with the latest kernels, just as it did with Linux kernel 5.11.
When multiple agents were specified, k3d got stuck starting the agents during cluster creation, and the Docker logs showed the k3d server container restarting at a high frequency.
In the container log, it looks like kube-proxy fails to start because it cannot set nf_conntrack_max.
I looked into this a bit and found the same problem in kind and minikube; it appears to have been fixed by the following issue and PRs:
- kind: https://github.com/kubernetes-sigs/kind/issues/2240 and https://github.com/kubernetes-sigs/kind/pull/2241
- minikube: https://github.com/kubernetes/minikube/pull/11419
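For reference, a minimal sketch of how to reproduce the failing sysctl write by hand from inside a running k3d node container (container name taken from the docker ps output below; the value 524288 matches what kube-proxy tries to set in the log, and treating this manual write as equivalent to kube-proxy's is an assumption):
docker exec k3d-vald-cluster-server-0 sh -c 'cat /proc/sys/net/netfilter/nf_conntrack_max'
docker exec k3d-vald-cluster-server-0 sh -c 'echo 524288 > /proc/sys/net/netfilter/nf_conntrack_max'
# On kernels >= 5.12.2 (or >= 5.11.19) the second command fails with "permission denied",
# which is the same error that makes kube-proxy exit in the server log below.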
Screenshots or terminal output
k3d CLI log
INFO[0000] Prep: Network
INFO[0000] Created network 'k3d-vald-cluster' (dd17969b82f64e60516154a7a8492e91daf0e1eb9f4ed8a70d5d6d57601b3692)
INFO[0000] Created volume 'k3d-vald-cluster-images'
INFO[0001] Creating node 'k3d-vald-cluster-server-0'
INFO[0001] Creating node 'k3d-vald-cluster-agent-0'
INFO[0001] Creating node 'k3d-vald-cluster-agent-1'
INFO[0001] Creating node 'k3d-vald-cluster-agent-2'
INFO[0001] Creating node 'k3d-vald-cluster-agent-3'
INFO[0001] Creating node 'k3d-vald-cluster-agent-4'
INFO[0001] Creating LoadBalancer 'k3d-vald-cluster-serverlb'
INFO[0001] Starting cluster 'vald-cluster'
INFO[0001] Starting servers...
INFO[0001] Starting Node 'k3d-vald-cluster-server-0'
INFO[0005] Starting agents...
INFO[0005] Starting Node 'k3d-vald-cluster-agent-0' <----- creating process stops here!!!
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
4ad2f4712ae1 rancher/k3s:v1.20.6-k3s1 "/bin/k3s agent" 6 minutes ago Up 3 minutes k3d-vald-cluster-agent-0
f9e815595dcf rancher/k3s:v1.20.6-k3s1 "/bin/k3s server --t…" 6 minutes ago Restarting (1) 42 seconds ago k3d-vald-cluster-server-0
docker logs --details -t f9e815595dcf
2021-05-16T05:05:29.445125257Z time="2021-05-16T05:05:29.444942848Z" level=info msg="Starting k3s v1.20.6+k3s1 (8d043282)"
2021-05-16T05:05:29.445182399Z time="2021-05-16T05:05:29.445150819Z" level=info msg="Cluster bootstrap already complete"
2021-05-16T05:05:29.455391273Z time="2021-05-16T05:05:29.455257589Z" level=info msg="Configuring sqlite3 database connection pooling: maxIdleConns=2, maxOpenConns=0, connMaxLifetime=0s"
2021-05-16T05:05:29.455407602Z time="2021-05-16T05:05:29.455285849Z" level=info msg="Configuring database table schema and indexes, this may take a moment..."
2021-05-16T05:05:29.455409948Z time="2021-05-16T05:05:29.455311476Z" level=info msg="certificate CN=k3s,O=k3s signed by CN=k3s-server-ca@1621141073: notBefore=2021-05-16 04:57:53 +0000 UTC notAfter=2022-05-16 05:05:29 +0000 UTC"
2021-05-16T05:05:29.455444959Z time="2021-05-16T05:05:29.455362850Z" level=info msg="Database tables and indexes are up to date"
2021-05-16T05:05:29.456363097Z time="2021-05-16T05:05:29.456238613Z" level=info msg="Kine listening on unix://kine.sock"
2021-05-16T05:05:29.456432105Z time="2021-05-16T05:05:29.456335058Z" level=info msg="Running kube-apiserver --advertise-port=6443 --allow-privileged=true --anonymous-auth=false --api-audiences=https://kubernetes.default.svc.cluster.local,k3s --authorization-mode=Node,RBAC --bind-address=127.0.0.1 --cert-dir=/var/lib/rancher/k3s/server/tls/temporary-certs --client-ca-file=/var/lib/rancher/k3s/server/tls/client-ca.crt --enable-admission-plugins=NodeRestriction --etcd-servers=unix://kine.sock --feature-gates=ServiceAccountIssuerDiscovery=false --insecure-port=0 --kubelet-certificate-authority=/var/lib/rancher/k3s/server/tls/server-ca.crt --kubelet-client-certificate=/var/lib/rancher/k3s/server/tls/client-kube-apiserver.crt --kubelet-client-key=/var/lib/rancher/k3s/server/tls/client-kube-apiserver.key --profiling=false --proxy-client-cert-file=/var/lib/rancher/k3s/server/tls/client-auth-proxy.crt --proxy-client-key-file=/var/lib/rancher/k3s/server/tls/client-auth-proxy.key --requestheader-allowed-names=system:auth-proxy --requestheader-client-ca-file=/var/lib/rancher/k3s/server/tls/request-header-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6444 --service-account-issuer=https://kubernetes.default.svc.cluster.local --service-account-key-file=/var/lib/rancher/k3s/server/tls/service.key --service-account-signing-key-file=/var/lib/rancher/k3s/server/tls/service.key --service-cluster-ip-range=10.43.0.0/16 --service-node-port-range=30000-32767 --storage-backend=etcd3 --tls-cert-file=/var/lib/rancher/k3s/server/tls/serving-kube-apiserver.crt --tls-private-key-file=/var/lib/rancher/k3s/server/tls/serving-kube-apiserver.key"
2021-05-16T05:05:29.456995072Z Flag --insecure-port has been deprecated, This flag has no effect now and will be removed in v1.24.
2021-05-16T05:05:29.457269375Z I0516 05:05:29.457197 7 server.go:659] external host was not specified, using 192.168.9.3
2021-05-16T05:05:29.457451380Z I0516 05:05:29.457389 7 server.go:196] Version: v1.20.6+k3s1
2021-05-16T05:05:29.460661489Z I0516 05:05:29.460623 7 shared_informer.go:240] Waiting for caches to sync for node_authorizer
2021-05-16T05:05:29.461489325Z I0516 05:05:29.461466 7 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
2021-05-16T05:05:29.461498436Z I0516 05:05:29.461474 7 plugins.go:161] Loaded 10 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
2021-05-16T05:05:29.462113819Z I0516 05:05:29.462055 7 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
2021-05-16T05:05:29.462123361Z I0516 05:05:29.462061 7 plugins.go:161] Loaded 10 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
2021-05-16T05:05:29.474074978Z I0516 05:05:29.473963 7 instance.go:289] Using reconciler: lease
2021-05-16T05:05:29.492167498Z I0516 05:05:29.492094 7 rest.go:131] the default service ipfamily for this cluster is: IPv4
2021-05-16T05:05:29.650038941Z W0516 05:05:29.649900 7 genericapiserver.go:425] Skipping API batch/v2alpha1 because it has no resources.
2021-05-16T05:05:29.655280702Z W0516 05:05:29.655185 7 genericapiserver.go:425] Skipping API discovery.k8s.io/v1alpha1 because it has no resources.
2021-05-16T05:05:29.661628727Z W0516 05:05:29.661533 7 genericapiserver.go:425] Skipping API node.k8s.io/v1alpha1 because it has no resources.
2021-05-16T05:05:29.666259481Z W0516 05:05:29.666153 7 genericapiserver.go:425] Skipping API rbac.authorization.k8s.io/v1alpha1 because it has no resources.
2021-05-16T05:05:29.668215081Z W0516 05:05:29.668089 7 genericapiserver.go:425] Skipping API scheduling.k8s.io/v1alpha1 because it has no resources.
2021-05-16T05:05:29.671398493Z W0516 05:05:29.671305 7 genericapiserver.go:425] Skipping API storage.k8s.io/v1alpha1 because it has no resources.
2021-05-16T05:05:29.672868265Z W0516 05:05:29.672771 7 genericapiserver.go:425] Skipping API flowcontrol.apiserver.k8s.io/v1alpha1 because it has no resources.
2021-05-16T05:05:29.676425119Z W0516 05:05:29.676254 7 genericapiserver.go:425] Skipping API apps/v1beta2 because it has no resources.
2021-05-16T05:05:29.676434571Z W0516 05:05:29.676262 7 genericapiserver.go:425] Skipping API apps/v1beta1 because it has no resources.
2021-05-16T05:05:29.681968471Z I0516 05:05:29.681798 7 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
2021-05-16T05:05:29.681978716Z I0516 05:05:29.681805 7 plugins.go:161] Loaded 10 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
2021-05-16T05:05:29.689958434Z time="2021-05-16T05:05:29.689772088Z" level=info msg="Running kube-scheduler --address=127.0.0.1 --bind-address=127.0.0.1 --kubeconfig=/var/lib/rancher/k3s/server/cred/scheduler.kubeconfig --leader-elect=false --port=10251 --profiling=false --secure-port=0"
2021-05-16T05:05:29.690001012Z time="2021-05-16T05:05:29.689866940Z" level=info msg="Waiting for API server to become available"
2021-05-16T05:05:29.690364949Z time="2021-05-16T05:05:29.690274194Z" level=info msg="Running kube-controller-manager --address=127.0.0.1 --allocate-node-cidrs=true --bind-address=127.0.0.1 --cluster-cidr=10.42.0.0/16 --cluster-signing-cert-file=/var/lib/rancher/k3s/server/tls/client-ca.crt --cluster-signing-key-file=/var/lib/rancher/k3s/server/tls/client-ca.key --configure-cloud-routes=false --controllers=*,-service,-route,-cloud-node-lifecycle --kubeconfig=/var/lib/rancher/k3s/server/cred/controller.kubeconfig --leader-elect=false --port=10252 --profiling=false --root-ca-file=/var/lib/rancher/k3s/server/tls/server-ca.crt --secure-port=0 --service-account-private-key-file=/var/lib/rancher/k3s/server/tls/service.key --use-service-account-credentials=true"
2021-05-16T05:05:29.691579980Z time="2021-05-16T05:05:29.691499972Z" level=info msg="Node token is available at /var/lib/rancher/k3s/server/token"
2021-05-16T05:05:29.691588379Z time="2021-05-16T05:05:29.691532877Z" level=info msg="To join node to cluster: k3s agent -s https://192.168.9.3:6443 -t ${NODE_TOKEN}"
2021-05-16T05:05:29.692480504Z time="2021-05-16T05:05:29.692335848Z" level=info msg="Wrote kubeconfig /output/kubeconfig.yaml"
2021-05-16T05:05:29.692511505Z time="2021-05-16T05:05:29.692424415Z" level=info msg="Run: k3s kubectl"
2021-05-16T05:05:29.692662540Z time="2021-05-16T05:05:29.692560855Z" level=info msg="Module overlay was already loaded"
2021-05-16T05:05:29.692668059Z time="2021-05-16T05:05:29.692584060Z" level=info msg="Module nf_conntrack was already loaded"
2021-05-16T05:05:29.692691286Z time="2021-05-16T05:05:29.692590031Z" level=info msg="Module br_netfilter was already loaded"
2021-05-16T05:05:29.693054521Z time="2021-05-16T05:05:29.692989749Z" level=warning msg="Failed to start iptable_nat module"
2021-05-16T05:05:29.703958256Z time="2021-05-16T05:05:29.703817885Z" level=info msg="Cluster-Http-Server 2021/05/16 05:05:29 http: TLS handshake error from 127.0.0.1:49440: remote error: tls: bad certificate"
2021-05-16T05:05:29.706308528Z time="2021-05-16T05:05:29.706185602Z" level=info msg="Cluster-Http-Server 2021/05/16 05:05:29 http: TLS handshake error from 127.0.0.1:49446: remote error: tls: bad certificate"
2021-05-16T05:05:29.717045241Z time="2021-05-16T05:05:29.716906780Z" level=info msg="certificate CN=k3d-vald-cluster-server-0 signed by CN=k3s-server-ca@1621141073: notBefore=2021-05-16 04:57:53 +0000 UTC notAfter=2022-05-16 05:05:29 +0000 UTC"
2021-05-16T05:05:29.718456054Z time="2021-05-16T05:05:29.718327116Z" level=info msg="certificate CN=system:node:k3d-vald-cluster-server-0,O=system:nodes signed by CN=k3s-client-ca@1621141073: notBefore=2021-05-16 04:57:53 +0000 UTC notAfter=2022-05-16 05:05:29 +0000 UTC"
2021-05-16T05:05:29.743269834Z time="2021-05-16T05:05:29.743103566Z" level=info msg="Logging containerd to /var/lib/rancher/k3s/agent/containerd/containerd.log"
2021-05-16T05:05:29.743306120Z time="2021-05-16T05:05:29.743211177Z" level=info msg="Running containerd -c /var/lib/rancher/k3s/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/k3s/agent/containerd"
2021-05-16T05:05:30.745224049Z time="2021-05-16T05:05:30.745113942Z" level=info msg="Containerd is now running"
2021-05-16T05:05:30.749774235Z time="2021-05-16T05:05:30.749667049Z" level=info msg="Connecting to proxy" url="wss://127.0.0.1:6443/v1-k3s/connect"
2021-05-16T05:05:30.750983191Z time="2021-05-16T05:05:30.750914236Z" level=info msg="Handling backend connection request [k3d-vald-cluster-server-0]"
2021-05-16T05:05:30.751362185Z time="2021-05-16T05:05:30.751272040Z" level=info msg="Running kubelet --address=0.0.0.0 --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=cgroupfs --client-ca-file=/var/lib/rancher/k3s/agent/client-ca.crt --cloud-provider=external --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --cni-bin-dir=/bin --cni-conf-dir=/var/lib/rancher/k3s/agent/etc/cni/net.d --container-runtime-endpoint=unix:///run/k3s/containerd/containerd.sock --container-runtime=remote --containerd=/run/k3s/containerd/containerd.sock --eviction-hard=imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --healthz-bind-address=127.0.0.1 --hostname-override=k3d-vald-cluster-server-0 --kubeconfig=/var/lib/rancher/k3s/agent/kubelet.kubeconfig --node-labels= --pod-manifest-path=/var/lib/rancher/k3s/agent/pod-manifests --read-only-port=0 --resolv-conf=/tmp/k3s-resolv.conf --serialize-image-pulls=false --tls-cert-file=/var/lib/rancher/k3s/agent/serving-kubelet.crt --tls-private-key-file=/var/lib/rancher/k3s/agent/serving-kubelet.key"
2021-05-16T05:05:30.751655452Z time="2021-05-16T05:05:30.751584847Z" level=info msg="Running kube-proxy --cluster-cidr=10.42.0.0/16 --healthz-bind-address=127.0.0.1 --hostname-override=k3d-vald-cluster-server-0 --kubeconfig=/var/lib/rancher/k3s/agent/kubeproxy.kubeconfig --proxy-mode=iptables"
2021-05-16T05:05:30.751837779Z Flag --cloud-provider has been deprecated, will be removed in 1.23, in favor of removing cloud provider code from Kubelet.
2021-05-16T05:05:30.751843191Z Flag --containerd has been deprecated, This is a cadvisor flag that was mistakenly registered with the Kubelet. Due to legacy concerns, it will follow the standard CLI deprecation timeline before being removed.
2021-05-16T05:05:30.751929721Z W0516 05:05:30.751853 7 server.go:226] WARNING: all flags other than --config, --write-config-to, and --cleanup are deprecated. Please begin using a config file ASAP.
2021-05-16T05:05:30.752339535Z I0516 05:05:30.752269 7 server.go:412] Version: v1.20.6+k3s1
2021-05-16T05:05:30.752347133Z W0516 05:05:30.752286 7 proxier.go:651] Failed to read file /lib/modules/5.12.3-zen1-1-zen/modules.builtin with error open /lib/modules/5.12.3-zen1-1-zen/modules.builtin: no such file or directory. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
2021-05-16T05:05:30.752714692Z W0516 05:05:30.752674 7 proxier.go:661] Failed to load kernel module ip_vs with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
2021-05-16T05:05:30.752975127Z W0516 05:05:30.752944 7 proxier.go:661] Failed to load kernel module ip_vs_rr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
2021-05-16T05:05:30.753328453Z W0516 05:05:30.753198 7 proxier.go:661] Failed to load kernel module ip_vs_wrr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
2021-05-16T05:05:30.753597919Z W0516 05:05:30.753517 7 proxier.go:661] Failed to load kernel module ip_vs_sh with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
2021-05-16T05:05:30.753896991Z W0516 05:05:30.753828 7 proxier.go:661] Failed to load kernel module nf_conntrack with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
2021-05-16T05:05:30.756976987Z time="2021-05-16T05:05:30.756939289Z" level=info msg="Waiting for node k3d-vald-cluster-server-0 CIDR not assigned yet"
2021-05-16T05:05:30.757863082Z E0516 05:05:30.757804 7 node.go:161] Failed to retrieve node info: nodes "k3d-vald-cluster-server-0" is forbidden: User "system:kube-proxy" cannot get resource "nodes" in API group "" at the cluster scope
2021-05-16T05:05:30.766593282Z I0516 05:05:30.766529 7 dynamic_cafile_content.go:167] Starting client-ca-bundle::/var/lib/rancher/k3s/agent/client-ca.crt
2021-05-16T05:05:30.766603912Z W0516 05:05:30.766573 7 manager.go:159] Cannot detect current cgroup on cgroup v2
2021-05-16T05:05:30.826258097Z I0516 05:05:30.826155 7 dynamic_cafile_content.go:167] Starting request-header::/var/lib/rancher/k3s/server/tls/request-header-ca.crt
2021-05-16T05:05:30.826278544Z I0516 05:05:30.826163 7 dynamic_cafile_content.go:167] Starting client-ca-bundle::/var/lib/rancher/k3s/server/tls/client-ca.crt
2021-05-16T05:05:30.826402499Z I0516 05:05:30.826374 7 dynamic_serving_content.go:130] Starting serving-cert::/var/lib/rancher/k3s/server/tls/serving-kube-apiserver.crt::/var/lib/rancher/k3s/server/tls/serving-kube-apiserver.key
2021-05-16T05:05:30.826668880Z I0516 05:05:30.826616 7 secure_serving.go:197] Serving securely on 127.0.0.1:6444
2021-05-16T05:05:30.826678699Z I0516 05:05:30.826636 7 autoregister_controller.go:141] Starting autoregister controller
2021-05-16T05:05:30.826681363Z I0516 05:05:30.826640 7 cache.go:32] Waiting for caches to sync for autoregister controller
2021-05-16T05:05:30.826683407Z I0516 05:05:30.826649 7 tlsconfig.go:240] Starting DynamicServingCertificateController
2021-05-16T05:05:30.826759615Z I0516 05:05:30.826736 7 available_controller.go:475] Starting AvailableConditionController
2021-05-16T05:05:30.826764180Z I0516 05:05:30.826742 7 cache.go:32] Waiting for caches to sync for AvailableConditionController controller
2021-05-16T05:05:30.826770368Z I0516 05:05:30.826754 7 controller.go:83] Starting OpenAPI AggregationController
2021-05-16T05:05:30.826776183Z I0516 05:05:30.826761 7 apf_controller.go:261] Starting API Priority and Fairness config controller
2021-05-16T05:05:30.826937152Z I0516 05:05:30.826838 7 dynamic_serving_content.go:130] Starting aggregator-proxy-cert::/var/lib/rancher/k3s/server/tls/client-auth-proxy.crt::/var/lib/rancher/k3s/server/tls/client-auth-proxy.key
2021-05-16T05:05:30.826942590Z I0516 05:05:30.826867 7 apiservice_controller.go:97] Starting APIServiceRegistrationController
2021-05-16T05:05:30.826945390Z I0516 05:05:30.826872 7 cache.go:32] Waiting for caches to sync for APIServiceRegistrationController controller
2021-05-16T05:05:30.826970033Z I0516 05:05:30.826937 7 crdregistration_controller.go:111] Starting crd-autoregister controller
2021-05-16T05:05:30.826978302Z I0516 05:05:30.826945 7 shared_informer.go:240] Waiting for caches to sync for crd-autoregister
2021-05-16T05:05:30.827001376Z I0516 05:05:30.826966 7 customresource_discovery_controller.go:209] Starting DiscoveryController
2021-05-16T05:05:30.827061951Z I0516 05:05:30.827035 7 controller.go:86] Starting OpenAPI controller
2021-05-16T05:05:30.827105498Z I0516 05:05:30.827068 7 cluster_authentication_trust_controller.go:440] Starting cluster_authentication_trust_controller controller
2021-05-16T05:05:30.827117880Z I0516 05:05:30.827084 7 shared_informer.go:240] Waiting for caches to sync for cluster_authentication_trust_controller
2021-05-16T05:05:30.827144380Z I0516 05:05:30.827123 7 dynamic_cafile_content.go:167] Starting client-ca-bundle::/var/lib/rancher/k3s/server/tls/client-ca.crt
2021-05-16T05:05:30.827176146Z I0516 05:05:30.827143 7 dynamic_cafile_content.go:167] Starting request-header::/var/lib/rancher/k3s/server/tls/request-header-ca.crt
2021-05-16T05:05:30.827208563Z I0516 05:05:30.827073 7 naming_controller.go:291] Starting NamingConditionController
2021-05-16T05:05:30.827256586Z I0516 05:05:30.827233 7 establishing_controller.go:76] Starting EstablishingController
2021-05-16T05:05:30.827308442Z I0516 05:05:30.827283 7 nonstructuralschema_controller.go:192] Starting NonStructuralSchemaConditionController
2021-05-16T05:05:30.827353842Z I0516 05:05:30.827330 7 apiapproval_controller.go:186] Starting KubernetesAPIApprovalPolicyConformantConditionController
2021-05-16T05:05:30.827395085Z I0516 05:05:30.827374 7 crd_finalizer.go:266] Starting CRDFinalizer
2021-05-16T05:05:30.831179268Z time="2021-05-16T05:05:30.831125056Z" level=info msg="Running cloud-controller-manager with args --profiling=false"
2021-05-16T05:05:30.833689279Z I0516 05:05:30.833640 7 controllermanager.go:141] Version: v1.20.6+k3s1
2021-05-16T05:05:30.840625877Z E0516 05:05:30.840565 7 controller.go:156] Unable to remove old endpoints from kubernetes service: no master IPs were listed in storage, refusing to erase all endpoints for the kubernetes service
2021-05-16T05:05:30.861097073Z I0516 05:05:30.860948 7 shared_informer.go:247] Caches are synced for node_authorizer
2021-05-16T05:05:30.927147030Z I0516 05:05:30.926809 7 cache.go:39] Caches are synced for autoregister controller
2021-05-16T05:05:30.927212193Z I0516 05:05:30.926844 7 apf_controller.go:266] Running API Priority and Fairness config worker
2021-05-16T05:05:30.927229303Z I0516 05:05:30.926892 7 cache.go:39] Caches are synced for AvailableConditionController controller
2021-05-16T05:05:30.927240386Z I0516 05:05:30.926923 7 cache.go:39] Caches are synced for APIServiceRegistrationController controller
2021-05-16T05:05:30.927394395Z I0516 05:05:30.927149 7 shared_informer.go:247] Caches are synced for crd-autoregister
2021-05-16T05:05:30.927418697Z I0516 05:05:30.927158 7 shared_informer.go:247] Caches are synced for cluster_authentication_trust_controller
2021-05-16T05:05:31.779966985Z I0516 05:05:31.779715 7 node.go:172] Successfully retrieved node IP: 192.168.9.2
2021-05-16T05:05:31.780031732Z I0516 05:05:31.779791 7 server_others.go:143] kube-proxy node IP is an IPv4 address (192.168.9.2), assume IPv4 operation
2021-05-16T05:05:31.781659505Z I0516 05:05:31.781468 7 server_others.go:186] Using iptables Proxier.
2021-05-16T05:05:31.782141843Z I0516 05:05:31.781934 7 server.go:650] Version: v1.20.6+k3s1
2021-05-16T05:05:31.783060048Z I0516 05:05:31.782851 7 conntrack.go:103] Set sysctl 'net/netfilter/nf_conntrack_max' to 524288
2021-05-16T05:05:31.783130013Z F0516 05:05:31.782902 7 server.go:495] open /proc/sys/net/netfilter/nf_conntrack_max: permission denied
Which OS & Architecture
OS: Arch Linux x86_64
Host: 20QTCTO1WW ThinkPad P1 Gen 2
Kernel: 5.12.3-zen1-1-zen
Shell: zsh 5.8
CPU: Intel i9-9880H (16) @ 4.800GHz
GPU: Intel UHD Graphics 630
GPU: NVIDIA Quadro T2000 Mobile / Max-Q
Memory: 8385MiB / 63983MiB
Which version of k3d
k3d version
k3d version v4.4.3
k3s version v1.20.6-k3s1 (default)
Which version of docker
docker version
Client: Docker Engine - Community
Version: 20.10.0-rc2
API version: 1.41
Go version: go1.13.15
Git commit: dca98e3
Built: Wed Dec 2 22:15:25 2020
OS/Arch: linux/amd64
Context: default
Experimental: true
Server:
Engine:
Version: 20.10.6
API version: 1.41 (minimum version 1.12)
Go version: go1.16.3
Git commit: 8728dd246c
Built: Mon Apr 12 14:10:25 2021
OS/Arch: linux/amd64
Experimental: true
containerd:
Version: v1.5.0
GitCommit: 8c906ff108ac28da23f69cc7b74f8e7a470d1df0.m
runc:
Version: 1.0.0-rc94
GitCommit: 2c7861bc5e1b3e756392236553ec14a78a09f8bf
docker-init:
Version: 0.19.0
GitCommit: de40ad0
docker info
Client:
Context: default
Debug Mode: false
Plugins:
buildx: Build with BuildKit (Docker Inc., v0.5.1)
compose: Docker Compose (Docker Inc., 2.0.0-beta.1)
Server:
Containers: 8
Running: 2
Paused: 0
Stopped: 6
Images: 34
Server Version: 20.10.6
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: false
userxattr: false
Logging Driver: json-file
Cgroup Driver: systemd
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runsc runu io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 8c906ff108ac28da23f69cc7b74f8e7a470d1df0.m
runc version: 2c7861bc5e1b3e756392236553ec14a78a09f8bf
init version: de40ad0
Security Options:
seccomp
Profile: default
cgroupns
Kernel Version: 5.12.3-zen1-1-zen
Operating System: Arch Linux
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 62.48GiB
Name: archpango
ID: VRVW:GWHR:MEGI:BZLM:CPYT:RLPQ:OGWK:Q5KP:UM4E:4ERY:CSDK:K5YQ
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: true
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: true
Default Address Pools:
Base: 192.168.0.0/16, Size: 24
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 2
- Comments: 19 (7 by maintainers)
Commits related to this issue
- docs: add faq entry on 'nf_conntrack_max: permission denied' issue (#607) — committed to k3d-io/k3d by iwilltry42 3 years ago
Yes, some Linux distributions backported the fix into earlier kernel versions; in mainline Linux it is only present as of 5.12.2 and 5.11.19. You’d have to check with the Debian folks to see which releases they backported it to. Either way, there’s a fix available: just use a newer K3s.
I’ve added a new FAQ entry on this issue: https://k3d.io/faq/faq/#nodes-fail-to-start-or-get-stuck-in-notready-state-with-log-nf_conntrack_max-permission-denied
Also, thanks to #612 we quickly confirmed that (obviously) other kernel lines are affected as well, e.g. 5.11 as of 5.11.19.
Hi, I just faced this issue and the solution proposed by @iwilltry42 is working perfectly!
Same error on kernel 5.12.4
Quick workaround: https://github.com/kubernetes-sigs/kind/issues/2240#issuecomment-838510890
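One approach discussed in that kind thread (assuming it applies to k3d the same way) is to raise the conntrack limit on the host so that kube-proxy never needs to write the sysctl from inside the container. A sketch, with 524288 taken from the kube-proxy log above and the sysctl.d path being distribution-dependent:
sudo sysctl -w net.netfilter.nf_conntrack_max=524288
# kube-proxy only tries to raise nf_conntrack_max when the current value is lower
# than its computed target, so matching (or exceeding) that target on the host
# avoids the forbidden write inside the container.
echo 'net.netfilter.nf_conntrack_max=524288' | sudo tee /etc/sysctl.d/90-conntrack.conf  # persist across reboots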
k3d v4.4.5 hit AUR last night…
Just finished my test setup:
k3d cluster create --k3s-server-arg "--kube-proxy-arg=conntrack-max-per-core=0" --k3s-agent-arg "--kube-proxy-arg=conntrack-max-per-core=0" --image rancher/k3s:v1.20.6-k3s
(before: export K3D_FIX_CGROUPV2=true, as the system is on cgroupv2; kernel 5.12.3-1-default)
This works just fine 👍
@nemonik can you try this please?
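If you test it, a quick way to confirm the workaround took effect (assuming the kubeconfig written by k3d cluster create is in use and the container names follow the k3d-<cluster>-<role>-<index> pattern):
kubectl get nodes -o wide   # all server and agent nodes should reach Ready
docker logs k3d-vald-cluster-server-0 2>&1 | grep nf_conntrack_max
# the fatal "permission denied" line from kube-proxy should no longer appear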
Hi @kpango, thanks for opening this issue and especially for doing some investigation already 👍 A similar issue was already mentioned before in #604 (closed in favor of this one). Let’s see what we can do here 👍
I confirm that upgrading to k3d 4.4.5 solved the issue for me, thanks all 😃 Thank you very much @iwilltry42
I’m currently setting up a test system, but I guess that, given the workaround mentioned in the kind issue, this could work here:
k3d cluster create --k3s-server-arg "--kube-proxy-arg=conntrack-max-per-core=0" --k3s-agent-arg "--kube-proxy-arg=conntrack-max-per-core=0"
(see https://rancher.com/docs/k3s/latest/en/installation/install-options/agent-config/, https://kubernetes.io/docs/reference/command-line-tools-reference/kube-proxy/, and https://github.com/kubernetes-sigs/kind/pull/2241)