rke: Canal containers give selinux related error message

RKE version: 0.3.0

Docker version: (docker version, docker info preferred)

Client: Docker Engine - Community
 Version:           19.03.3
 API version:       1.39 (downgraded from 1.40)
 Go version:        go1.12.10
 Git commit:        a872fc2f86
 Built:             Tue Oct  8 00:58:10 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.1
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.6
  Git commit:       4c52b90
  Built:            Wed Jan  9 19:06:30 2019
  OS/Arch:          linux/amd64
  Experimental:     false

Docker daemon.json:

{
  "selinux-enabled": true,
  "userland-proxy": false,
  "bip": "10.10.0.1/24",
  "fixed-cidr": "10.10.0.1/24"
}

Operating system and kernel: (cat /etc/os-release, uname -r preferred)

NAME="Red Hat Enterprise Linux"
VERSION="8.0 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.0"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.0 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8.0:GA"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.0
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.0"

Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO) Doesn’t matter

cluster.yml file:

cluster_name: name

nodes:
  - address: node1
    user: user
    ssh_key_path: /home/user/.ssh/id_rsa
    role: [controlplane,etcd,worker]
  - address: node2
    user: user
    ssh_key_path: /home/user/.ssh/id_rsa
    role: [controlplane,etcd,worker]
  - address: node3
    user: user
    ssh_key_path: /home/user/.ssh/id_rsa
    role: [controlplane,etcd,worker]

private_registries:
  - url: internal-registry
    is_default: true # All system images will be pulled using this registry. 

services:
  etcd:
    snapshot: true
    creation: 6h
    retention: 24h

Steps to Reproduce: rke up

When the cluster is built, I see problems with the canal pods:

kubectl -n kube-system get pods
NAME                                      READY   STATUS                  RESTARTS   AGE
canal-9vg2d                               1/2     Running                 0          45h
canal-ftfrv                               0/2     Init:CrashLoopBackOff   197        16h
canal-l5g2d                               2/2     Running                 0          147m
coredns-5c98fc7769-wbscd                  0/1     CrashLoopBackOff        487        45h
coredns-autoscaler-64c857cf7-qgqwc        1/1     Running                 0          167m
metrics-server-7cf4dfc846-2vvbl           1/1     Running                 34         167m
rke-coredns-addon-deploy-job-kn952        0/1     Completed               0          45h
rke-ingress-controller-deploy-job-f29cv   0/1     Completed               0          45h
rke-metrics-addon-deploy-job-hfsxx        0/1     Completed               0          45h
rke-network-plugin-deploy-job-lfnj4       0/1     Completed               0          45h

Looking into the install-cni init container, I see this error message:

mv: inter-device move failed: '/calico.conf.tmp' to '/host/etc/cni/net.d/10-canal.conflist'; unable to remove target: Permission denied
Failed to mv files. This may be caused by selinux configuration on the host, or something else.
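
To confirm that SELinux is what blocks the move, one can check the enforcement mode and look for recent AVC denials on the affected host (a diagnostic sketch; ausearch is provided by the audit package on RHEL):

```shell
# Current SELinux mode: Enforcing, Permissive, or Disabled
getenforce

# Recent SELinux denials; a denial touching /etc/cni/net.d would
# confirm the "Permission denied" above is SELinux-related
ausearch -m avc -ts recent
```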

Results: The cluster doesn’t work properly. Setting SELinux to permissive is not really an option.

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 1
  • Comments: 15 (6 by maintainers)

Most upvoted comments

From the discussion in projectcalico/calico#2704 it seems that

securityContext:
  privileged: true

is needed in order to properly handle SELinux systems. Thus, I edited the running canal daemonset with kubectl -n kube-system edit daemonset/canal and added those lines to the init container named install-cni.

After saving, the pods immediately reached the running state, and no more errors were logged. Maybe this suggests that those lines are missing in the template?
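
The manual edit above can also be applied as a one-off patch (a sketch; it assumes install-cni is the first entry in initContainers, as in the stock canal manifest):

```shell
# Add privileged: true to the install-cni init container of the
# canal DaemonSet (assumption: install-cni is initContainers[0])
kubectl -n kube-system patch daemonset canal --type=json -p='[
  {"op": "add",
   "path": "/spec/template/spec/initContainers/0/securityContext",
   "value": {"privileged": true}}
]'
```

After the patch, the DaemonSet controller rolls the canal pods, so the result should match the kubectl edit described above.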

Success using v1.1.0-rc11 and K8s 1.15.10-rancher1-2 on CentOS 7.7 with enforcing SELinux! Note, however:

  • I had to use the CoreDNS image from v1.16.7-rancher1-2, rancher/coredns-coredns:1.6.2, instead of rancher/coredns-coredns:1.3.1, because the latter was failing with an error (the pod logs reported that the --nodelabel option was incorrect; I assume it was introduced later.)
  • I had to specify the calico_flexvol and canal_flexvol images in the config.yml because the nodes were trying to get them from the Internet, not sure why (that failed because this is an air-gapped setup.) I used the values from v1.16.7-rancher1-2.
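
For reference, the overrides described above would look roughly like this in cluster.yml (a hypothetical fragment: the coredns tag is the one quoted above, while the flexvol tags must be taken from the v1.16.7-rancher1-2 system-images list and are left as placeholders here):

```yaml
system_images:
  coredns: rancher/coredns-coredns:1.6.2
  canal_flexvol: rancher/calico-pod2daemon-flexvol:<tag-from-v1.16.7-rancher1-2>
  calico_flexvol: rancher/calico-pod2daemon-flexvol:<tag-from-v1.16.7-rancher1-2>
```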

Next test is upgrading from 1.15.5 to 1.15.10, and will report back in this very comment to avoid further noise.

EDIT: A cluster upgrade into 1.15.10 from 1.15.5 was successful! The canal pods are privileged and running properly.

@carloscarnero If you can test this change on some lab machines which are identical to the ones that were exhibiting the problem, that would be appreciated

@superseb I’m not clear what I should test. I mean… should I use rke v1.1.0-rc11? If that’s the case, should I test against one of that version’s supported K8s?

EDIT: based on the previous comment, I will test with v1.1.0-rc11 and K8s 1.15.10-rancher1-2. The operating system is CentOS 7.7, completely updated, with SELinux enabled and enforcing. This will take some time because all my setups are air-gapped and I have to prime the internal registry.

@leodotcloud I have tried the fix above in another different cluster, and it seems to work.

While trying to reproduce the problem on a couple of different cloud providers, I see that the ip_tables module is not loaded by default on RHEL 8/CentOS 8 VMs.

[root@ip-172-31-16-240 ~]# lsmod | grep ip_tables
[root@ip-172-31-16-240 ~]#

This causes problems with the install. Running modprobe ip_tables loads the module, and the installation then goes through fine with SELinux set to ‘Enforcing’.
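
The check and fix can be sketched as follows (run as root; persisting the module via modules-load.d is an assumption, not something stated above):

```shell
# Load ip_tables now if it is not already present
lsmod | grep -q '^ip_tables' || modprobe ip_tables

# Persist the module across reboots (systemd reads /etc/modules-load.d/)
echo ip_tables > /etc/modules-load.d/ip_tables.conf
```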

@nheinemans and @carloscarnero could you check if this step resolves your problem?