k3s: ingress-nginx-controller fails to start on SLES 15 SP2

Environmental Info:

K3s Version: v1.19.5+k3s1 (b11612e2) (likely fails on other versions as well, but I did not validate that)

RKE2 Version: v1.18.13+rke2r1 and v1.19.5+rke2r1

Node(s) CPU architecture, OS, and Version:

Linux ip-172-31-24-244 5.3.18-24.37-default #1 SMP Wed Nov 4 09:38:41 UTC 2020 (c145e08) x86_64 x86_64 x86_64 GNU/Linux Created in AWS using ami: ami-0f052119b3c7e61d1

# cat /etc/os-release
NAME="SLES"
VERSION="15-SP2"
VERSION_ID="15.2"
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP2"
ID="sles"
ID_LIKE="suse"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:15:sp2"
VARIANT_ID="sles-sap"

Cluster Configuration:

Single node

Describe the bug:

Exactly the same issue as described in https://github.com/kubernetes/ingress-nginx/issues/5991. However, I’m not convinced this is solely an AppArmor issue: disabling it (including making sure it does not start again on boot, and rebooting the node entirely) does not resolve the problem. I also tried disabling it on a fresh host before installing either k3s or rke2 and saw the same results.
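
For anyone who wants to rule out AppArmor themselves, the commands I ran were roughly the following (a sketch assuming the standard apparmor service unit on SLES):

# aa-status                   # list loaded profiles and confined processes
# systemctl stop apparmor     # unload AppArmor for the current boot
# systemctl disable apparmor  # keep it from coming back on the next boot
# reboot

Even with AppArmor fully disabled this way, the controller pod still fails to become Ready.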

Steps To Reproduce:

K3s steps:

  1. Install k3s
  2. Attempt to install ingress-nginx-controller: kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.34.1/deploy/static/provider/aws/deploy.yaml
  3. Notice the ingress-nginx-controller pod stuck at 0/1 Running
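
To see the failure, checking the pod status and logs is enough; a minimal sketch (the namespace comes from the upstream manifest, and the label selector is an assumption based on the v0.34.1 deploy.yaml):

kubectl get pods -n ingress-nginx
kubectl describe pod -n ingress-nginx -l app.kubernetes.io/component=controller
kubectl logs -n ingress-nginx -l app.kubernetes.io/component=controller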

Rke2 steps:

  1. Install rke2
  2. Notice rke2-ingress-nginx-controller pod stuck at 0/1 Running
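
Checking the rke2 controller is roughly the same (assuming the bundled chart deploys into kube-system; substitute the actual pod name from the first command):

kubectl get pods -n kube-system | grep ingress-nginx
kubectl logs -n kube-system <rke2-ingress-nginx-controller-pod-name>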

Expected behavior:

The pod should come up and be 1/1 Running, with no restarts and no failures in its logs.

Actual behavior:

See above

Additional context / logs:

Refer to linked issue

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 16 (12 by maintainers)

Most upvoted comments

Validated in v1.18.17-rc2+rke2r1 as well.

This has now been validated in both rke2, using the latest beta releases, and in k3s, using commit IDs from the 1.18, 1.19, and 1.20 branches (respectively 8c9608b26f07e96a7445f46eeb341a90a33b68a7, e34ffdba565067bce8a9da5b79d226f2aad7de4c, and 1d85a6a30a781ba8644c8f7e3ceec3cc0d21238b).

Deploying the YAML mentioned in the issue now deploys ingress-nginx-controller correctly, but there is a port conflict on the svclb. Editing that daemonset to use different ports gets it deployed correctly as well. The next release of rke2 and k3s should work fine on SLES.
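
For anyone hitting the same thing, a rough sketch of locating the conflicting svclb daemonset and changing its ports (the daemonset name below is a guess derived from the service name; on k3s the conflict is presumably with the ports already claimed for the bundled Traefik service):

kubectl get daemonsets -A | grep svclb
kubectl edit daemonset -n <namespace-from-previous-command> svclb-ingress-nginx-controller
# then change the hostPort entries (80/443) under spec.template.spec.containers[].ports to unused values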

It appears that the proper fix for this is in https://github.com/containerd/containerd/pull/4467 - I’m going to take a shot at pulling it into our containerd fork.

@rancher-max can you try following the instructions at https://documentation.suse.com/sles/15-SP1/html/SLES-all/cha-security-policykit.html#sec-security-policykit-policies-default to set the profile to standard on one of the hosts that were failing? If this is indeed what’s breaking RKE2 then we just need to document that step as a prerequisite.
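
If it helps, per that documentation the switch should boil down to roughly the following on each failing host (the variable name and helper come from the SLES security guide; treat this as a sketch):

# grep POLKIT_DEFAULT_PRIVS /etc/sysconfig/security    # see which profile is currently selected
# sed -i 's/^POLKIT_DEFAULT_PRIVS=.*/POLKIT_DEFAULT_PRIVS="standard"/' /etc/sysconfig/security
# set_polkit_default_privs                             # re-apply the default polkit privileges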

This hints that there could be worse symptoms if this is explored more deeply, since the API server is also being blocked from killing other container processes.

This explains an issue I had the other day with k3s on SLES where pods were stuck in Terminating.

I’ve set this up in my own lab with SLES 15 SP2 and have updated all SLES 15 SP2 hosts with the latest updates as of Jan. 18th, 2021.

Error in logs:

rke2-ingress-nginx-controller 2021/01/09 00:24:53 [notice] 340#340: signal process started
rke2-ingress-nginx-controller 2021/01/09 00:24:53 [alert] 340#340: kill(25, 1) failed (13: Permission denied)
rke2-ingress-nginx-controller nginx: [alert] kill(25, 1) failed (13: Permission denied)
rke2-ingress-nginx-controller W0109 00:24:53.893819 8 queue.go:130] requeuing kube-system/rke2-ingress-nginx-default-backend, err exit status 1
rke2-ingress-nginx-controller 2021/01/09 00:24:53 [notice] 340#340: signal process started
rke2-ingress-nginx-controller 2021/01/09 00:24:53 [alert] 340#340: kill(25, 1) failed (13: Permission denied)
rke2-ingress-nginx-controller nginx: [alert] kill(25, 1) failed (13: Permission denied)
rke2-ingress-nginx-controller I0109 00:24:57.144680 8 controller.go:137] Configuration changes detected, backend reload required.
rke2-ingress-nginx-controller E0109 00:24:57.216879 8 controller.go:149] Unexpected failure reloading the backend:

Going through the kernel module requirements right now.