amazon-vpc-cni-k8s: Failing with SELinux Enabled

What happened: We run a RHEL 7 image with the EKS binaries installed. When the instance joins the cluster, the aws-node init container runs successfully because it runs in privileged mode. The long-running daemon container then fails to move and place files in the host directory.

# aws-node logs
{"level":"info","ts":"2020-11-19T21:03:44.454Z","caller":"entrypoint.sh","msg":"Install CNI binary.."}
install: cannot remove '/host/opt/cni/bin/aws-cni': Permission denied

What you expected to happen:

# aws-node logs with privileged mode or running on non-selinux host
{"level":"info","ts":"2020-11-19T20:51:21.267Z","caller":"entrypoint.sh","msg":"Install CNI binary.."}
{"level":"info","ts":"2020-11-19T20:51:21.279Z","caller":"entrypoint.sh","msg":"Starting IPAM daemon in the background ... "}
{"level":"info","ts":"2020-11-19T20:51:21.281Z","caller":"entrypoint.sh","msg":"Checking for IPAM connectivity ... "}
{"level":"info","ts":"2020-11-19T20:51:23.299Z","caller":"entrypoint.sh","msg":"Copying config file ... "}
{"level":"info","ts":"2020-11-19T20:51:23.302Z","caller":"entrypoint.sh","msg":"Successfully copied CNI plugin binary and config file."}
{"level":"info","ts":"2020-11-19T20:51:23.303Z","caller":"entrypoint.sh","msg":"Foregrounding IPAM daemon ..."}

How to reproduce it (as minimally and precisely as possible):

$ sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   enforcing
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Max kernel policy version:      31

Anything else we need to know?: When running in privileged mode, the DaemonSet functions properly.

securityContext:
  privileged: true
  capabilities:
    add:
      - NET_ADMIN

If the mounted host directories are configured with the container_file_t label, then the CNI is able to copy the files but is never able to communicate with the IPAM daemon (ipamd):

{"level":"info","ts":"2020-11-19T20:51:21.267Z","caller":"entrypoint.sh","msg":"Install CNI binary.."}
{"level":"info","ts":"2020-11-19T20:51:21.279Z","caller":"entrypoint.sh","msg":"Starting IPAM daemon in the background ... "}
{"level":"info","ts":"2020-11-19T20:51:21.281Z","caller":"entrypoint.sh","msg":"Checking for IPAM connectivity ... "}

Environment:

  • Kubernetes version: v1.18.8-eks-7c9bda
  • CNI Version: v0.8.6
  • OS: RHEL 7.9
  • Kernel: Linux ip-192-168-104-223.us-east-2.compute.internal 3.10.0-1160.6.1.el7.x86_64

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 15 (7 by maintainers)

Most upvoted comments

Quick update: I am able to replicate the issue on AL2 with selinux-enabled set on the Docker daemon.

Things that I have done so far to replicate the issue on AL2:

sudo amazon-linux-extras install selinux-ng
[OPTIONALLY FOR CONTAINER SUPPORT] sudo yum install container-selinux
Edit /etc/selinux/config to set SELINUX=enforcing (and SELINUXTYPE=targeted, if desired)
vi /etc/docker/daemon.json and add `"selinux-enabled": true` to the file
Reboot
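
The daemon.json step above can be scripted instead of edited by hand. The following is a minimal sketch, assuming the file is either empty or holds a single JSON object; the function name enable_selinux is made up for illustration and is not part of Docker or the CNI.

```python
import json


def enable_selinux(daemon_json_text: str) -> str:
    """Return daemon.json content with "selinux-enabled": true merged in.

    Hypothetical helper: it only rewrites text and does not restart Docker.
    """
    # Treat an empty file as an empty configuration object.
    config = json.loads(daemon_json_text) if daemon_json_text.strip() else {}
    if not isinstance(config, dict):
        raise ValueError("daemon.json must contain a JSON object")
    # Existing keys are preserved; only this one setting is added or overwritten.
    config["selinux-enabled"] = True
    return json.dumps(config, indent=2) + "\n"
```

To use it, read the current contents of /etc/docker/daemon.json, pass them through enable_selinux, and write the result back as root before rebooting. Keeping the helper text-in, text-out means it can be checked without touching the host.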

After that, I see the aws-node DaemonSet crash with a permissions issue.

[ec2-user@ip-10-0-1-130 ~]$ kn logs aws-node-tdkvt
{"level":"info","ts":"2021-02-03T16:06:30.680Z","caller":"entrypoint.sh","msg":"Install CNI binary.."}
install: cannot remove '/host/opt/cni/bin/aws-cni': Permission denied

Here, the host SELinux config isn’t to blame. The aws-node pod starts and, as part of the entrypoint script, we install the CNI binary and also copy the 10-aws.conflist file; the written files carry the SELinux context of that container (a random MCS pair). This is (probably) what causes the container to fail to start, as the MCS pair on the host files is associated with a different user/group.
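
To make the MCS-pair explanation concrete, here is a rough sketch of how two SELinux labels compare. It assumes the simple user:role:type:level form with a level such as s0:c111,c222 (no ranges), and it reduces the MCS rule to "the process categories must include every category on the file". The names and the sample labels are invented for illustration; this is not the kernel's actual check.

```python
from typing import FrozenSet, NamedTuple


class Context(NamedTuple):
    user: str
    role: str
    type: str
    sensitivity: str
    categories: FrozenSet[str]


def parse_context(label: str) -> Context:
    """Parse a label like system_u:system_r:container_t:s0:c111,c222."""
    # The level itself may contain a colon, so split at most three times.
    user, role, type_, level = label.split(":", 3)
    sensitivity, _, cats = level.partition(":")
    categories = frozenset(c for c in cats.split(",") if c)
    return Context(user, role, type_, sensitivity, categories)


def mcs_allows(process: Context, target: Context) -> bool:
    """Simplified MCS rule: the process must hold every category on the file."""
    return target.categories <= process.categories


# Illustrative labels only: a file left behind by one container run,
# and a later container that was assigned a different random pair.
old_file = parse_context("system_u:object_r:container_file_t:s0:c111,c222")
new_proc = parse_context("system_u:system_r:container_t:s0:c333,c444")
print(mcs_allows(new_proc, old_file))
```

Under these assumptions the later container does not hold the categories on the older file, which matches the behaviour described in the comment. Type enforcement is a separate check and is not modelled here.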

A quick workaround is to set the spc_t type in seLinuxOptions or to run the containers with privileged: true

securityContext:
  seLinuxOptions:
    type: spc_t

However, if you run as spc_t by default, you’ll break current Bottlerocket releases, since we don’t define that label.

Very roughly, SELinux is about answering questions like “can <subject> do <action> to <object>?”, where subjects are process labels like container_t and objects are labels on files, like container_file_t. So the issue is that your CNI pod has the container_t subject label and is trying to create (or move into) a file in a directory with the object label usr_t or something like that. The policy is set up so container_t can’t do most file actions on usr_t. spc_t fixes it by changing the subject to one that’s allowed to do most actions. Relabeling the directory so it’s container_file_t instead of usr_t fixes it by changing the object to one that container_t subjects are allowed to act on.
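
The subject/action/object question can be pictured as a table lookup. The sketch below is a toy model only: the rule table, the action names, and the blanket treatment of spc_t are assumptions made for illustration and do not reproduce the real targeted policy.

```python
# Toy allow rules keyed by (subject label, object label).
# These entries are illustrative, not taken from any shipped policy.
ALLOW = {
    ("container_t", "container_file_t"): {"create", "write", "unlink", "rename"},
    ("container_t", "usr_t"): {"read"},
}

# Subjects this toy model treats as unconfined (the real policy grants
# spc_t "most" actions rather than literally everything).
UNCONFINED = {"spc_t"}


def allowed(subject: str, action: str, obj: str) -> bool:
    """Answer "can <subject> do <action> to <object>?" against the toy table."""
    if subject in UNCONFINED:
        return True
    return action in ALLOW.get((subject, obj), set())


# The failing case from the logs, then the two fixes described in the thread.
print(allowed("container_t", "unlink", "usr_t"))             # original setup
print(allowed("spc_t", "unlink", "usr_t"))                   # change the subject
print(allowed("container_t", "unlink", "container_file_t"))  # relabel the object
```

The two workarounds map onto the two halves of the lookup key: spc_t swaps the subject, relabeling the directory swaps the object.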

I am still working on the issue (with internal teams) to come up with a better solution and will update next steps here!

Our issue stemmed from comparing 2 “like” nodes and getting different results. Both were using SELinux enforcing, but we failed to catch that one node had selinux-enabled in Docker and the other did not (I should have read @nithu0115's response more closely on how he replicated it). We are good now.

@nithu0115 - Reopening the issue since it is seen with AL2 extra.