k3s: ingress-nginx-controller fails to start on SLES 15 SP2
Environmental Info:
K3s Version:
k3s version v1.19.5+k3s1 (b11612e2) (it likely fails on other versions as well, but I did not validate this)
RKE2 Version:
v1.18.13+rke2r1 and v1.19.5+rke2r1
Node(s) CPU architecture, OS, and Version:
Linux ip-172-31-24-244 5.3.18-24.37-default #1 SMP Wed Nov 4 09:38:41 UTC 2020 (c145e08) x86_64 x86_64 x86_64 GNU/Linux
Created in AWS using ami: ami-0f052119b3c7e61d1
# cat /etc/os-release
NAME="SLES"
VERSION="15-SP2"
VERSION_ID="15.2"
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP2"
ID="sles"
ID_LIKE="suse"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:15:sp2"
VARIANT_ID="sles-sap"
Cluster Configuration:
Single node
Describe the bug:
Exactly the same issue as described in https://github.com/kubernetes/ingress-nginx/issues/5991. However, I’m not convinced this is solely an AppArmor issue, as disabling it (including making sure it does not start again on boot, and rebooting the node entirely) does not seem to resolve the problem. I also tried disabling it on a fresh host before installing either k3s or rke2 and saw the same results.
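For reference, one way to disable AppArmor and confirm it stays off across a reboot (a minimal sketch using the standard apparmor.service and apparmor-utils tooling on SLES; not necessarily the exact commands used here):
systemctl stop apparmor
systemctl disable apparmor
reboot
aa-status   # after the reboot, should report that no profiles are loaded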
Steps To Reproduce:
K3s steps:
- Install k3s
- Attempt to install ingress-nginx-controller:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.34.1/deploy/static/provider/aws/deploy.yaml
- Notice the ingress-nginx-controller pod stuck on 0/1 Running
Rke2 steps:
- Install rke2
- Notice the rke2-ingress-nginx-controller pod stuck at 0/1 Running
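In both cases the pod status and controller logs can be checked with standard kubectl commands (a generic sketch; substitute the actual pod name and namespace. The upstream manifest deploys into the ingress-nginx namespace, while the RKE2-packaged controller appears to run in kube-system, per the default-backend reference in the logs further down):
kubectl get pods -A | grep ingress-nginx
kubectl describe pod <controller-pod-name> -n <namespace>
kubectl logs <controller-pod-name> -n <namespace>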
Expected behavior:
The pod should come up and be 1/1 Running with no restarts and no failures in its logs.
Actual behavior:
The controller pod stays stuck at 0/1 Running, as described in the reproduce steps above.
Additional context / logs:
Refer to the linked issue.
About this issue
- State: closed
- Created 3 years ago
- Comments: 16 (12 by maintainers)
Validated in v1.18.17-rc2+rke2r1 as well.
This has now been validated both in rke2, using the latest beta releases, and in k3s, using the commit ids from the 1.18, 1.19, and 1.20 branches (respectively: 8c9608b26f07e96a7445f46eeb341a90a33b68a7, e34ffdba565067bce8a9da5b79d226f2aad7de4c, and 1d85a6a30a781ba8644c8f7e3ceec3cc0d21238b). Deploying the yaml mentioned in the issue now correctly deploys ingress-nginx-controller, but there is a port conflict on the svclb. Editing that daemonset to use different ports correctly deploys it as well (see the sketch below). The next release of rke2 and k3s should work fine on SLES.
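For illustration, the svclb port conflict can be worked around by locating the servicelb daemonset and changing its host ports (a sketch; the daemonset name, namespace, and replacement ports are assumptions, not values from this issue):
kubectl get daemonsets -A | grep svclb
kubectl edit daemonset <svclb-daemonset-name> -n <namespace>
# change the conflicting hostPort entries (e.g. 80/443 to 8080/8443) and save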
It appears that the proper fix for this is in https://github.com/containerd/containerd/pull/4467 - I’m going to take a shot at pulling it into our containerd fork.
@rancher-max can you try following the instructions at https://documentation.suse.com/sles/15-SP1/html/SLES-all/cha-security-policykit.html#sec-security-policykit-policies-default to set the profile to standard on one of the hosts that were failing? If this is indeed what’s breaking RKE2 then we just need to document that step as a prerequisite. (A sketch of that change follows below.)
This explains an issue I had the other day with k3s on SLES where Pods were stuck in Terminating.
I’ve set this up in my own lab with SLES 15 SP2. I have updated all SLES 15 SP2 hosts with the latest updates as of Jan. 18th, 2021.
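For reference, switching the default PolicyKit privilege profile to standard on SLES is roughly the following (a minimal sketch of the mechanism the linked SUSE documentation describes; verify the file path and helper against the docs for your service pack):
# in /etc/sysconfig/security, set:
POLKIT_DEFAULT_PRIVS="standard"
# then apply the new defaults:
set_polkit_default_privs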
Error in logs:
rke2-ingress-nginx-controller 2021/01/09 00:24:53 [notice] 340#340: signal process started
rke2-ingress-nginx-controller 2021/01/09 00:24:53 [alert] 340#340: kill(25, 1) failed (13: Permission denied)
rke2-ingress-nginx-controller nginx: [alert] kill(25, 1) failed (13: Permission denied)
rke2-ingress-nginx-controller W0109 00:24:53.893819 8 queue.go:130] requeuing kube-system/rke2-ingress-nginx-default-backend, err exit status 1
rke2-ingress-nginx-controller 2021/01/09 00:24:53 [notice] 340#340: signal process started
rke2-ingress-nginx-controller 2021/01/09 00:24:53 [alert] 340#340: kill(25, 1) failed (13: Permission denied)
rke2-ingress-nginx-controller nginx: [alert] kill(25, 1) failed (13: Permission denied)
rke2-ingress-nginx-controller I0109 00:24:57.144680 8 controller.go:137] Configuration changes detected, backend reload required.
rke2-ingress-nginx-controller E0109 00:24:57.216879 8 controller.go:149] Unexpected failure reloading the backend:
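The kill(25, 1) failed (13: Permission denied) lines mean the reload cannot send SIGHUP (signal 1) to what appears to be the nginx master process (pid 25). One way to reproduce this by hand is to exec into the controller pod and trigger a reload directly (a sketch; the pod name is a placeholder):
kubectl exec -n kube-system <rke2-ingress-nginx-controller-pod> -- nginx -s reload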
Going through kernel module requirements right now.
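As a quick check, k3s includes a built-in checker that reports missing kernel modules and kernel config options; the manual spot checks below use the usual container-runtime modules as examples and are not taken from this issue:
k3s check-config
# manual spot checks
lsmod | grep -E 'overlay|br_netfilter'
modprobe overlay && modprobe br_netfilter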