falco: Crash with Invalid JSON error while parsing container info
What happened:
Falco crashes with "Runtime error: Invalid JSON encountered while parsing container info", resulting in a CrashLoopBackOff pod state.
What you expected to happen:
- Parse the container info without error
- Or report the error and keep running without crashing (possible fallback?)
How to reproduce it (as minimally and precisely as possible):
- Create a Kubernetes deployment with a large number of ports (> 1000)
- Example nginx deployment (a contrived configuration just to recreate the issue; see the generator sketch after these reproduction steps):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deploy
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - { containerPort: 8080, name: server1 }
        - { containerPort: 8081, name: server2 }
        - { containerPort: 8082, name: server3 }
        - { containerPort: 8083, name: server4 }
        - { containerPort: 50000, hostPort: 50000, protocol: UDP, name: port1 }
        - { containerPort: 50001, hostPort: 50001, protocol: UDP, name: port2 }
        - { containerPort: 50002, hostPort: 50002, protocol: UDP, name: port3 }
        - { containerPort: 50003, hostPort: 50003, protocol: UDP, name: port4 }
        - { containerPort: 50004, hostPort: 50004, protocol: UDP, name: port5 }
        - { containerPort: 50005, hostPort: 50005, protocol: UDP, name: port6 }
        - { containerPort: 50006, hostPort: 50006, protocol: UDP, name: port7 }
        - { containerPort: 50007, hostPort: 50007, protocol: UDP, name: port8 }
        - { containerPort: 50008, hostPort: 50008, protocol: UDP, name: port9 }
        - { containerPort: 50009, hostPort: 50009, protocol: UDP, name: port10 }
        ...
        - { containerPort: 50998, hostPort: 50998, protocol: UDP, name: port999 }
- Deploy Falco on the same node and check the Falco logs
- FYI references:
  - We need to explicitly list all the ports, as mentioned at https://github.com/kubernetes/kubernetes/issues/23864
  - Example: https://kubernetes.io/docs/tasks/run-application/run-stateless-application-deployment/
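Since typing out roughly a thousand port entries by hand is impractical, here is a minimal sketch of a generator for the manifest above. It assumes Python as the tooling; the port count, port range, and port names are arbitrary choices of mine, not something prescribed by Falco or Kubernetes.

#!/usr/bin/env python3
# Hypothetical generator for the port-heavy Deployment shown above.
# NUM_PORTS, START_PORT, and the port names are arbitrary; they only need
# to push the port list past the ~1000-entry threshold from the report.

NUM_PORTS = 1000
START_PORT = 50000

def udp_port_entries():
    # Build one flow-style YAML list item per UDP port.
    lines = []
    for i in range(NUM_PORTS):
        port = START_PORT + i
        lines.append(
            f"        - {{ containerPort: {port}, hostPort: {port}, "
            f"protocol: UDP, name: port{i + 1} }}"
        )
    return "\n".join(lines)

manifest = f"""\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deploy
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
{udp_port_entries()}
"""

# Print to stdout so it can be piped to `kubectl apply -f -`.
print(manifest)

Usage would be something like `python3 gen_deploy.py | kubectl apply -f -` (script name hypothetical).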
Anything else we need to know?:
- values.yaml parameters:
ebpf:
  # Enable eBPF support for Falco - This allows Falco to run on Google COS.
  enabled: true
  settings:
    # Needed to enable eBPF JIT at runtime for performance reasons.
    # Can be skipped if eBPF JIT is enabled from outside the container
    hostNetwork: true
    # Needed to correctly detect the kernel version for the eBPF program
    # Set to false if not running on Google COS
    mountEtcVolume: true
falco:
  # Output format
  jsonOutput: true
  logLevel: notice
  # Slack alerts
  programOutput:
    enabled: true
    keepAlive: false
    program: "\" jq '{text: .output}' | curl -d @- -X POST https://hooks.slack.com/services/XXXX\""
Environment:
- Falco version (use falco --version): falco version 0.15.3
- System info:
  {
    "machine": "x86_64",
    "nodename": "gke-test-default-pool-3d67c0cd-n8b4",
    "release": "4.14.119+",
    "sysname": "Linux",
    "version": "#1 SMP Tue May 14 21:04:23 PDT 2019"
  }
- Cloud provider or hardware configuration: GCP
- OS (e.g. cat /etc/os-release):
  BUILD_ID=10895.242.0
  NAME="Container-Optimized OS"
- Kernel (e.g. uname -a):
  Linux gke-test-default-pool-3d67c0cd-dlng 4.14.119+ #1 SMP Tue May 14 21:04:23 PDT 2019 x86_64 Intel(R) Xeon(R) CPU @ 2.20GHz GenuineIntel GNU/Linux
- Install tools (e.g. in kubernetes, rpm, deb, from source): Kubernetes (helm)
- Others:
About this issue
- State: closed
- Created 5 years ago
- Reactions: 3
- Comments: 43 (15 by maintainers)
Hi!
It seems like the specific issue outlined by @dza89 with her/his Docker image was fixed in falco libs with commit https://github.com/falcosecurity/libs/tree/748485ac2e912cdb67e3a19bf6ff402a54d4f08a, which avoids storing LABEL lines longer than 100 bytes.
There is still a bug that is not covered by the above commit: what if lots (I mean lots) of labels with string lengths < 100 bytes are added to a Docker image? I'll tell you: Falco still crashes. I am currently testing a possible fix.
You can easily reproduce the crash with the attached Dockerfile (sorry for the stupid label keys/values 😃): Dockerfile.txt
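In case the attachment is not handy, a minimal sketch follows that generates a Dockerfile with many short LABEL lines (each value well under 100 bytes, so the per-label fix above does not apply, but the aggregate label data is large). Python, the label count, the base image, and the key/value pattern are all assumptions of mine, not taken from the attached Dockerfile.txt.

#!/usr/bin/env python3
# Hypothetical Dockerfile generator: lots of short labels, large total size.
# NUM_LABELS is an arbitrary guess; raise it if the crash does not reproduce.

NUM_LABELS = 5000

with open("Dockerfile.manylabels", "w") as f:
    f.write("FROM alpine:3\n")
    for i in range(NUM_LABELS):
        # Each individual label value stays short (< 100 bytes).
        f.write(f'LABEL test.label.{i}="short-value-{i}"\n')

# Then, manually:
#   docker build -f Dockerfile.manylabels -t many-labels .
#   docker run --rm many-labels sleep 60
# and watch the Falco logs on the same host.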
This issue should be definitively fixed by https://github.com/falcosecurity/libs/pull/102, which is included in the latest development version of Falco (i.e. the source code in the master branch). The fix will also be part of the next upcoming release, so /milestone 0.31.0
Since it has been fixed, I’m closing this issue. Feel free to discuss further or ask to re-open it if the problem persists. Also, any feedback about the fix will be really appreciated. 🙏
/close
Thank you @dza89, I was able to reproduce the bug now. It seems the root cause resides in libsinsp. I can confirm the problem occurs when parsing container metadata; it can happen even outside a K8s context. I still need to investigate further. Meanwhile, I have opened a new issue https://github.com/falcosecurity/libs/issues/51 to track the problem in libsinsp.
PS: In my opinion, https://github.com/falcosecurity/libs/issues/51 is not a dup of this issue, since a temporary workaround for Falco only might be just reporting the error without exiting (not a definitive solution, ofc).
@leogr I've created a dummy image which lets Falco (0.28.1) crash: dza123/kotlin:latest
I think the issue is the total size of the labels, because I had to test it a few times before generating enough labels. This is the default behaviour of buildpacks, btw, so please don't blame me for the ridiculous number of labels.
On our container platform we also have some containers running that were built with some kind of buildpack, resulting in insanely huge labels on the Docker images, and Falco crashes when trying to parse them. These labels are ridiculous, but in my opinion Falco should still be able to handle them.
What is weird, though, is that this started happening when we upgraded from 0.26.2 to 0.27.0; it runs fine with 0.26.2. I couldn't find a change in the changelog that could explain this.
@fntlnz @leodido Try the Docker image nebhale/spring-music; that's a typical Java Spring Docker image created by Paketo buildpacks. There is a lot of JSON in the labels. I think this will cause Falco to crash.
I'm able to reproduce it: if you have a pod with 62K characters in its annotations, Falco crashes when it tries to parse the container info. The limit might be lower, but with at least 62K characters I'm able to reproduce it.
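As an illustration only, a sketch that writes such a Pod manifest could look like the following; the annotation key, pod name, image, and the exact 62,000-character value are placeholders I chose, not details from the report above.

#!/usr/bin/env python3
# Hypothetical repro helper: a Pod whose single annotation value is about
# 62,000 characters long. Kubernetes accepts up to 256 KiB of total
# annotation data, so the API server will not reject this manifest.

big_value = "x" * 62000

manifest = f"""\
apiVersion: v1
kind: Pod
metadata:
  name: huge-annotation
  annotations:
    example.com/big: "{big_value}"
spec:
  containers:
  - name: pause
    image: registry.k8s.io/pause:3.9
"""

with open("huge-annotation-pod.yaml", "w") as f:
    f.write(manifest)

# Apply with `kubectl apply -f huge-annotation-pod.yaml`, then check the logs
# of the Falco pod running on the same node.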