falco: Crash with Invalid JSON error while parsing container info

What happened: Falco crashes with "Runtime error: Invalid JSON encountered while parsing container info", leaving the pod in a CrashLoopBackOff state

What you expected to happen:

  • Falco parses the container info without error
  • Or, failing that, it reports the error and keeps running (possible fallback?)

How to reproduce it (as minimally and precisely as possible):

  • Create a k8s deployment with a large number of ports (> 1000)
  • Example nginx deployment [This is a contrived example configuration, just to recreate the issue; a script that generates the full manifest follows this list]
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deploy
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:
          - { containerPort: 8080, name: server1}
          - { containerPort: 8081, name: server2}
          - { containerPort: 8082, name: server3}
          - { containerPort: 8083, name: server4}
          - { containerPort: 50000, hostPort: 50000, protocol: UDP, name: port1 }
          - { containerPort: 50001, hostPort: 50001, protocol: UDP, name: port2 }
          - { containerPort: 50002, hostPort: 50002, protocol: UDP, name: port3 }
          - { containerPort: 50003, hostPort: 50003, protocol: UDP, name: port4 }
          - { containerPort: 50004, hostPort: 50004, protocol: UDP, name: port5 }
          - { containerPort: 50005, hostPort: 50005, protocol: UDP, name: port6 }
          - { containerPort: 50006, hostPort: 50006, protocol: UDP, name: port7 }
          - { containerPort: 50007, hostPort: 50007, protocol: UDP, name: port8 }
          - { containerPort: 50008, hostPort: 50008, protocol: UDP, name: port9 }
          - { containerPort: 50009, hostPort: 50009, protocol: UDP, name: port10 }
          ...
          ...
          - { containerPort: 50998, hostPort: 50998, protocol: UDP, name: port999 }
  • Deploy Falco on the same node and check the Falco logs
  • FYI, references:
    • We need to explicitly list all the ports, as mentioned at https://github.com/kubernetes/kubernetes/issues/23864
    • Example: https://kubernetes.io/docs/tasks/run-application/run-stateless-application-deployment/
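
For convenience, here is a minimal Python sketch that generates the manifest above in full (this script is not part of the original report; the port values and names mirror the truncated example):

#!/usr/bin/env python3
# Generate nginx-deploy.yaml with ~1000 container ports, mirroring the
# truncated example above: four named TCP server ports plus UDP host
# ports 50000-50998 (port1..port999).

HEADER = """apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deploy
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:
"""

lines = [HEADER]
# The four plain TCP server ports from the example.
for i, port in enumerate([8080, 8081, 8082, 8083], start=1):
    lines.append(f"          - {{ containerPort: {port}, name: server{i} }}\n")
# 999 UDP host ports, 50000..50998, covering the elided range.
for i, port in enumerate(range(50000, 50999), start=1):
    lines.append(
        f"          - {{ containerPort: {port}, hostPort: {port}, "
        f"protocol: UDP, name: port{i} }}\n"
    )

with open("nginx-deploy.yaml", "w") as f:
    f.writelines(lines)

Apply the result with kubectl apply -f nginx-deploy.yaml.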

Anything else we need to know?:

  • values.yaml used for the Helm chart parameters:
ebpf:
  # Enable eBPF support for Falco - This allows Falco to run on Google COS.
  enabled: true

  settings:
    # Needed to enable eBPF JIT at runtime for performance reasons.
    # Can be skipped if eBPF JIT is enabled from outside the container
    hostNetwork: true
    # Needed to correctly detect the kernel version for the eBPF program
    # Set to false if not running on Google COS
    mountEtcVolume: true

falco:
  # Output format
  jsonOutput: true
  logLevel: notice
  # Slack alerts
  programOutput:
    enabled: true
    keepAlive: false
    program: "\" jq '{text: .output}' | curl -d @- -X POST https://hooks.slack.com/services/XXXX\""

Environment:

  • Falco version (use falco --version): falco version 0.15.3
  • System info
{
  "machine": "x86_64",
  "nodename": "gke-test-default-pool-3d67c0cd-n8b4",
  "release": "4.14.119+",
  "sysname": "Linux",
  "version": "#1 SMP Tue May 14 21:04:23 PDT 2019"
}
  • Cloud provider or hardware configuration: GCP
  • OS (e.g: cat /etc/os-release):
BUILD_ID=10895.242.0
NAME="Container-Optimized OS"
  • Kernel (e.g. uname -a):
Linux gke-test-default-pool-3d67c0cd-dlng 4.14.119+ #1 SMP Tue May 14 21:04:23 PDT 2019 x86_64 Intel(R) Xeon(R) CPU @ 2.20GHz GenuineIntel GNU/Linux
  • Install tools (e.g. in kubernetes, rpm, deb, from source): Kubernetes (helm)
  • Others:

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 3
  • Comments: 43 (15 by maintainers)

Most upvoted comments

Hi!
It seems like the specific issue outlined by @dza89 with their Docker image was fixed in falco libs by commit https://github.com/falcosecurity/libs/tree/748485ac2e912cdb67e3a19bf6ff402a54d4f08a, which avoids storing LABEL lines longer than 100 bytes.

There is still a bug that is not covered by the above commit: what if lots (I mean lots) of labels, each shorter than 100 bytes, are added to a Docker image? I’ll tell you: Falco still crashes. I am currently testing a possible fix.

You can easily reproduce the crash with the attached Dockerfile (sorry for the silly label keys/values 😃): Dockerfile.txt
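
The attached file is not reproduced here, but a sketch along these lines should build an equivalent image (the base image, label count, and label keys/values below are made up for illustration):

#!/usr/bin/env python3
# Write a Dockerfile with thousands of small labels, each well under
# 100 bytes, so that the total label metadata grows very large.

with open("Dockerfile", "w") as f:
    f.write("FROM alpine\n")
    for i in range(5000):
        f.write(f'LABEL dummy.label.{i}="value-{i}"\n')

Build the image (e.g. docker build -t label-crash-test .) and run Falco on a host with a container started from it.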

This issue should be definitively fixed by https://github.com/falcosecurity/libs/pull/102, which is included in the latest development version of Falco (i.e., the source code in the master branch).

The fix will also be part of the upcoming release, so /milestone 0.31.0

Since it has been fixed, I’m closing this issue. Feel free to discuss further or ask to re-open it if the problem persists. Also, any feedback about the fix will be really appreciated. 🙏

/close

Thank you @dza89, I was able to reproduce the bug now. It seems the root cause resides in libsinsp. I can confirm the problem occurs when parsing container metadata. It can happen even outside a K8s context.

I still need to investigate further. Meanwhile, I have opened a new issue https://github.com/falcosecurity/libs/issues/51 to track the problem in libsinsp.

PS: In my opinion, https://github.com/falcosecurity/libs/issues/51 is not a duplicate of this issue, since a temporary Falco-only workaround might be to just report the error without exiting (not a definitive solution, of course).
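
To illustrate the shape of that workaround (a generic Python sketch; Falco and libsinsp are C++, and this is not their actual code):

import json
import logging

def parse_container_info(raw: str):
    # Turn a fatal parse error into a logged warning: on invalid JSON,
    # return None so the caller can fall back to empty metadata instead
    # of the whole process exiting.
    try:
        return json.loads(raw)
    except json.JSONDecodeError as err:
        logging.warning("invalid JSON in container info: %s", err)
        return None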

@leogr I’ve created a dummy image which makes Falco (0.28.1) crash: dza123/kotlin:latest

I think the issue is the total size of the labels, because I had to test a few times before generating enough labels. This is the default behaviour of buildpacks, btw, so please don’t blame me for the ridiculous number of labels.

On our container platform we also have some containers running that were built with some kind of buildpack, resulting in insanely huge labels on the Docker images, and Falco crashes when trying to parse them. These labels are ridiculous, but in my opinion Falco should still be able to handle them.

What is weird, though, is that this started happening when we upgraded from 0.26.2 to 0.27.0; it runs fine with 0.26.2. I couldn’t find a change in the changelog that could explain this.

@fntlnz @leodido Try the Docker image nebhale/spring-music; that’s a typical Java Spring Docker image created by Paketo buildpacks. There is a lot of JSON in the labels. I think this will cause Falco to crash.

I’m able to reproduce this: if you have a pod with 62K characters in its annotations, Falco will crash when it tries to parse the container info. The limit might be lower, but with 62K characters, at least, I’m able to reproduce it.
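
For reference, a minimal sketch of that reproduction (the pod name, annotation key, and image below are invented for illustration):

#!/usr/bin/env python3
# Generate a pod manifest carrying a single annotation value of
# ~62,000 characters, as described in the comment above.

manifest = (
    "apiVersion: v1\n"
    "kind: Pod\n"
    "metadata:\n"
    "  name: huge-annotation-pod\n"
    "  annotations:\n"
    f"    example.com/huge: \"{'x' * 62000}\"\n"
    "spec:\n"
    "  containers:\n"
    "    - name: pause\n"
    "      image: registry.k8s.io/pause:3.9\n"
)

with open("huge-annotation-pod.yaml", "w") as f:
    f.write(manifest)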