falco: Unable to detect events with containerd and kubernetes

Describe the bug

Hi everyone,

I’m not sure if it’s a bug or a misunderstanding on my side. I have a kubeadm cluster, and I switched from Docker to containerd because Docker is being deprecated.

I’ve installed Falco directly on the worker node (not as a DaemonSet), as the documentation recommends. I’ve updated the systemd unit file to add the options needed when using containerd. Everything is up and running, but now when I create a pod and exec a shell in it, there is no log entry telling me that someone has spawned a shell in pod xxxx.

Am I missing something? I couldn’t find much documentation about how to use Falco with containerd.

Here you can find my systemd unit file.

[Unit]
Description=Falco: Container Native Runtime Security
Documentation=https://falco.org/docs/
[Service]
Environment="FALCO_ARGS=--cri=/run/containerd/containerd.sock --disable-cri-async -pk"
Type=simple
User=root
ExecStartPre=/sbin/modprobe falco
ExecStart=/usr/bin/falco --pidfile=/var/run/falco.pid $FALCO_ARGS
ExecStopPost=/sbin/rmmod falco
UMask=0077
TimeoutSec=30
RestartSec=15s
Restart=on-failure
PrivateTmp=true
NoNewPrivileges=yes
ProtectHome=read-only
ProtectSystem=full
ProtectKernelTunables=true
RestrictRealtime=true
RestrictAddressFamilies=~AF_PACKET
[Install]
WantedBy=multi-user.target

When I perform another action, for example opening a shell in a pod and creating a file in /etc, I do find some entries in the logs, but some data is missing:

Apr 23 11:37:35 cks-worker falco[8815]: 11:37:35.770964137: Error File below /etc opened for writing (user=root user_loginuid=-1 command=touch /etc/bbb parent=bash pcmdline=bash file=/etc/bbb program=touch gparent=<NA> ggparent=<NA> gggparent=<NA> container_id=host image=<NA>)
Apr 23 11:37:35 cks-worker falco[8815]: 11:37:35.770964137: Error File below /etc opened for writing (user=root user_loginuid=-1 command=touch /etc/bbb parent=bash pcmdline=bash file=/etc/bbb program=touch gparent=<NA> ggparent=<NA> gggparent=<NA> container_id=host image=<NA>)

Additional info: I’m using the default rules file.

How to reproduce it

  • Install Falco on the worker node
  • Update the systemd unit file to include the containerd-related configuration
  • Create a pod:
apiVersion: v1
kind: Pod
metadata:
  name: falco-test
  namespace: default
spec:
  containers:
  - image: nginx
    name: falco-ltest
  • Exec a shell in the pod:
kubectl exec -it falco-test -- bash
  • Check the Falco logs with journalctl -fu falco (no event is logged when the shell is spawned)
  • Inside the pod, create a file in /etc and check the logs again (events are logged, but with incomplete data)

Expected behaviour

  • There should be an event when we open a shell in the pod. This works when Docker is the container runtime.
  • Events should include all the metadata (pod_name…).

Environment

  • Falco version:
Falco version: 0.28.0
Driver version: 5c0b863ddade7a45568c0ac97d037422c9efb750
  • System info:
  • Cloud provider: GCP
  • OS:
NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
  • Kernel: Linux cks-master 5.4.0-1042-gcp #45~18.04.1-Ubuntu SMP Tue Apr 13 18:51:16 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

  • Installation method: DEB

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 18 (6 by maintainers)

Most upvoted comments

There appear to be multiple issues in this case, so I tried to summarize them below. I hope this helps others troubleshoot missing metadata in their Falco events.

  1. container_id=host. Besides missing metadata, if you also see container_id=host even when the event is triggered from a container, you have probably hit this issue. It can be caused by a non-default cgroup path: Falco parses a process’s cgroup path to retrieve its container ID, and then uses that ID to query the container runtime (e.g. Docker or containerd) and Kubernetes. This issue has some discussion of the cgroup path problem and a potential fix: https://github.com/falcosecurity/falco/issues/1568 If you see this, the output of cat /proc/self/cgroup from inside a container will help others troubleshoot further.

  2. Missing container.image.* fields. If you see a valid container ID (like container_id=8ae22495737c) but no metadata about its container image, check that the correct socket path is configured. You can specify the correct path in the Helm chart overrides or on the Falco command line. For example:

containerd:
  enabled: true
  socket: /xxxx/containerd.sock

Providing the grpcurl output from the Falco container is also very helpful. You can find the steps here: https://github.com/falcosecurity/falco/issues/1568#issuecomment-796897481

  3. Kubernetes API server connection problem. If you can see a valid container ID but no fields with the k8s prefix, check whether Falco is allowed to connect to your Kubernetes API server.
  4. If you can see all the metadata and only have trouble triggering a rule with a container.privileged condition after switching to containerd/CRI-O, this was fixed in https://github.com/falcosecurity/libs/pull/79
  5. If you see this only when an event is triggered right after a pod starts or restarts, it seems to be a known issue. Here is one thread about it: https://kubernetes.slack.com/archives/CMWH3EH32/p1629895515219900
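Case 1 hinges on turning a cgroup path into a container ID. The Python sketch below only illustrates that idea under simple assumptions; it is not Falco’s actual matcher, which lives in falcosecurity/libs and handles more layouts. The function name `container_id_from_cgroup_lines` and both path patterns are hypothetical: a systemd-style scope (`cri-containerd-<id>.scope` or `docker-<id>.scope`) and a cgroupfs-style path ending in `/<id>`. A 64-hex-character ID is truncated to the 12-character form seen in Falco output (e.g. container_id=8ae22495737c), and any layout the patterns do not recognize falls back to `host`, mirroring the symptom described in case 1.

```python
import os
import re
from typing import Iterable

# Illustrative patterns only (hypothetical); Falco's real matcher supports more layouts.
_ID = r"([0-9a-f]{64})"
_PATTERNS = (
    # systemd cgroup driver, e.g. .../cri-containerd-<id>.scope or .../docker-<id>.scope
    re.compile(r"/(?:cri-containerd|docker)-" + _ID + r"\.scope$"),
    # cgroupfs driver, e.g. /kubepods/besteffort/pod<uid>/<id> or /docker/<id>
    re.compile(r"/" + _ID + r"$"),
)


def container_id_from_cgroup_lines(lines: Iterable[str]) -> str:
    """Return a 12-character container ID found in /proc/<pid>/cgroup lines, else "host"."""
    for line in lines:
        # Format is "hierarchy-ID:controller-list:cgroup-path"; split at most twice
        # so a path containing ':' stays intact.
        parts = line.strip().split(":", 2)
        if len(parts) != 3:
            continue
        for pattern in _PATTERNS:
            match = pattern.search(parts[2])
            if match:
                return match.group(1)[:12]
    return "host"


if __name__ == "__main__":
    # Run inside a container (or on the host) to see what this sketch extracts.
    if os.path.exists("/proc/self/cgroup"):
        with open("/proc/self/cgroup") as f:
            print(container_id_from_cgroup_lines(f))
```

Feeding it the output of cat /proc/self/cgroup taken inside an affected container shows quickly whether the path uses a layout these simple patterns recognize.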

Edit: Added case 5, as it’s a common case too.

AFAIK, this issue should be fixed in libs by https://github.com/falcosecurity/libs/pull/79, but Falco has not yet been upgraded to that version of libs, so let’s keep it open.

Thank you for the detailed report, @holyspectral. Much appreciated 👍

Thanks for the list, @holyspectral! For me it was number 1. After setting SystemdCgroup = true in /etc/containerd/config.toml, as explained in https://github.com/falcosecurity/falco/issues/1568, and restarting containerd and the containers, Falco now gets the correct container_id and metadata.
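To see which setting a node currently uses before restarting anything, a rough check like the Python sketch below can help. It is a plain text scan, not a TOML parser, and the helper name `systemd_cgroup_setting` is hypothetical: it reports the first uncommented `SystemdCgroup = true|false` line and does not verify which table the key sits in (with containerd 1.x and a version 2 config it is typically under `[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]`). `None` means the key is not set at all, so containerd falls back to its default.

```python
import os
import re
from typing import Optional

# Matches an uncommented `SystemdCgroup = true|false` line. This is a text scan,
# not a TOML parser: it does not check which table the key belongs to.
_SYSTEMD_CGROUP = re.compile(
    r"^[ \t]*SystemdCgroup[ \t]*=[ \t]*(true|false)[ \t]*(?:#.*)?$",
    re.MULTILINE,
)


def systemd_cgroup_setting(config_text: str) -> Optional[bool]:
    """Return the first SystemdCgroup value found, or None if the key is not set."""
    match = _SYSTEMD_CGROUP.search(config_text)
    if match is None:
        return None
    return match.group(1) == "true"


if __name__ == "__main__":
    path = "/etc/containerd/config.toml"
    if os.path.exists(path):
        with open(path) as f:
            print("SystemdCgroup =", systemd_cgroup_setting(f.read()))
    else:
        print(path, "not found")
```

After changing the value, containerd and the affected containers still need to be restarted before Falco sees the new cgroup paths.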

I noticed this issue too when containerd or CRI-O is used. It looks like the logic in falco-libs doesn’t support recent versions of containerd and CRI-O. I’m going to work on a fix.