rke: Canal containers give selinux related error message
RKE version: 0.3.0
Docker version: (docker version, docker info preferred)
Client: Docker Engine - Community
Version: 19.03.3
API version: 1.39 (downgraded from 1.40)
Go version: go1.12.10
Git commit: a872fc2f86
Built: Tue Oct 8 00:58:10 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 18.09.1
API version: 1.39 (minimum version 1.12)
Go version: go1.10.6
Git commit: 4c52b90
Built: Wed Jan 9 19:06:30 2019
OS/Arch: linux/amd64
Experimental: false
Docker daemon.json:
{
  "selinux-enabled": true,
  "userland-proxy": false,
  "bip": "10.10.0.1/24",
  "fixed-cidr": "10.10.0.1/24"
}
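A quick way to double-check that the daemon actually picked up the selinux-enabled flag (plain docker CLI, nothing RKE-specific) is to look at the security options it reports; name=selinux should appear in the list:
docker info --format '{{.SecurityOptions}}'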
Operating system and kernel: (cat /etc/os-release, uname -r preferred)
NAME="Red Hat Enterprise Linux"
VERSION="8.0 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.0"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.0 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8.0:GA"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.0
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.0"
Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO) Doesn’t matter
cluster.yml file:
cluster_name: name
nodes:
  - address: node1
    user: user
    ssh_key_path: /home/user/.ssh/id_rsa
    role: [controlplane,etcd,worker]
  - address: node2
    user: user
    ssh_key_path: /home/user/.ssh/id_rsa
    role: [controlplane,etcd,worker]
  - address: node3
    user: user
    ssh_key_path: /home/user/.ssh/id_rsa
    role: [controlplane,etcd,worker]
private_registries:
  - url: internal-registry
    is_default: true # All system images will be pulled using this registry.
services:
  etcd:
    snapshot: true
    creation: 6h
    retention: 24h
Steps to Reproduce:
rke up
When the cluster is built, I see problems with the canal pods:
kubectl -n kube-system get pods
NAME READY STATUS RESTARTS AGE
canal-9vg2d 1/2 Running 0 45h
canal-ftfrv 0/2 Init:CrashLoopBackOff 197 16h
canal-l5g2d 2/2 Running 0 147m
coredns-5c98fc7769-wbscd 0/1 CrashLoopBackOff 487 45h
coredns-autoscaler-64c857cf7-qgqwc 1/1 Running 0 167m
metrics-server-7cf4dfc846-2vvbl 1/1 Running 34 167m
rke-coredns-addon-deploy-job-kn952 0/1 Completed 0 45h
rke-ingress-controller-deploy-job-f29cv 0/1 Completed 0 45h
rke-metrics-addon-deploy-job-hfsxx 0/1 Completed 0 45h
rke-network-plugin-deploy-job-lfnj4 0/1 Completed 0 45h
Looking into the install-cni init container, I see this error message:
mv: inter-device move failed: '/calico.conf.tmp' to '/host/etc/cni/net.d/10-canal.conflist'; unable to remove target: Permission denied
Failed to mv files. This may be caused by selinux configuration on the host, or something else.
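To confirm that SELinux is really what denies the rename, a few host-side checks can help (the pod name is just the failing one from the listing above, and ausearch assumes auditd is running on the node):
kubectl -n kube-system logs canal-ftfrv -c install-cni   # full log of the failing init container
getenforce                                               # current SELinux mode on the node
ausearch -m avc -ts recent                               # recent AVC denials
ls -Z /etc/cni/net.d/                                    # SELinux labels on the CNI config directory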
Results: Cluster doesn’t work properly. Setting selinux to permissive is not really an option.
About this issue
- State: closed
- Created 5 years ago
- Reactions: 1
- Comments: 15 (6 by maintainers)
From the discussion in projectcalico/calico#2704 it seems that a privileged securityContext on the install-cni init container is needed in order to properly handle SELinux systems. Thus, I edited the running canal daemonset with kubectl -n kube-system edit daemonset/canal and added those lines to the init container named install-cni. After saving, the pods immediately reached the running state, and no more errors were logged. Maybe this suggests that those lines are missing in the template?
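For illustration, a minimal sketch of what the relevant part of the canal DaemonSet could look like after that edit (field placement assumed, not copied from the actual template; the securityContext is the only addition):
spec:
  template:
    spec:
      initContainers:
        - name: install-cni
          securityContext:
            privileged: true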
Success using v1.1.0-rc11 and K8s 1.15.10-rancher1-2 on CentOS 7.7 with enforcing SELinux! Note, however:
- I had to use rancher/coredns-coredns:1.6.2 instead of rancher/coredns-coredns:1.3.1, because the latter was failing with an error (pod logs reported that the --nodelabel option was incorrect, and I assumed that it was introduced later.)
- I had to override the calico_flexvol and canal_flexvol images in the config.yml because the nodes were trying to get them from the Internet, not sure why (that failed because this is an air-gapped setup.) I used the values from v1.16.7-rancher1-2 (see the sketch below).

Next test is upgrading from 1.15.5 to 1.15.10, and I will report back in this very comment to avoid further noise.
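For the image overrides above, the sketch below shows the kind of entries I mean in the system_images section of the RKE configuration; the flexvol values are placeholders, the real ones were copied from the v1.16.7-rancher1-2 defaults:
system_images:
  coredns: rancher/coredns-coredns:1.6.2
  calico_flexvol: <flexvol image:tag taken from v1.16.7-rancher1-2>
  canal_flexvol: <flexvol image:tag taken from v1.16.7-rancher1-2>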
EDIT: A cluster upgrade into 1.15.10 from 1.15.5 was successful! The canal pods are privileged and running properly.
@superseb I’m not clear what I should test. I mean… should I use rke v1.1.0-rc11? If that’s the case, should I test against one of that version’s supported K8s?
EDIT: based on the previous comment, I will test with v1.1.0-rc11 and K8s 1.15.10-rancher1-2. The operating system is CentOS 7.7, completely updated, with SELinux enabled and enforcing. This will take some time because all my setups are air-gapped and I have to prime the internal registry.
@leodotcloud I have tried the fix above in another cluster, and it seems to work.
While trying to reproduce the problem using a couple of different cloud providers, I see that the ip_tables module is not loaded by default in RHEL 8/CentOS 8 VMs. This is causing problems with the install. Running modprobe ip_tables enables the module and the installation goes through fine with the 'Enforcing' setting. @nheinemans and @carloscarnero, could you check if this step resolves your problem?
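If the missing module does turn out to be the cause, making the load persist across reboots is a one-liner on RHEL 8/CentOS 8 (the standard systemd-modules-load mechanism, nothing RKE-specific):
modprobe ip_tables                                    # load it now
echo ip_tables > /etc/modules-load.d/ip_tables.conf   # load it on every boot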