fluent-bit: out_cloudwatch: Segmentation fault
Bug Report
Describe the bug
fluent-bit immediately crashes with a segmentation fault when the cloudwatch_logs output is used on ARM64.
To Reproduce
Configure fluent-bit to output to cloudwatch_logs.
[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    DB                /var/log/flb_kube.db
    Parser            docker
    Docker_Mode       On
    Mem_Buf_Limit     5MB
    Skip_Long_Lines   On
    Refresh_Interval  10

[FILTER]
    Name                 kubernetes
    Match                kube.*
    Kube_URL             https://kubernetes.default.svc.cluster.local:443
    Merge_Log            On
    Merge_Log_Key        data
    Keep_Log             On
    K8S-Logging.Parser   On
    K8S-Logging.Exclude  On
    Buffer_Size          32k

[OUTPUT]
    Name                    loki
    Match                   *
    host                    loki-gateway.monitoring.svc.cluster.local
    port                    80
    auto_kubernetes_labels  false
    labels                  job=fluent-bit
    label_map_path          /etc/labelmap.json
    remove_keys             kubernetes

[OUTPUT]
    Name                cloudwatch_logs
    Match               *
    region              eu-west-1
    log_group_name      /aws/eks/fluentbit-cloudwatch/logs
    log_stream_prefix   fluentbit-
    log_retention_days  30
    auto_create_group   true
The following error is logged on startup and fluent-bit crashes:
[2022/12/23 14:06:57] [ info] [fluent bit] version=2.0.8, commit=9444fdc5ee, pid=1
[2022/12/23 14:06:57] [ info] [storage] ver=1.4.0, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2022/12/23 14:06:57] [ info] [cmetrics] version=0.5.8
[2022/12/23 14:06:57] [ info] [ctraces ] version=0.2.7
[2022/12/23 14:06:57] [ info] [input:tail:tail.0] initializing
[2022/12/23 14:06:57] [ info] [input:tail:tail.0] storage_strategy='memory' (memory only)
[2022/12/23 14:06:57] [ info] [filter:kubernetes:kubernetes.0] https=1 host=kubernetes.default.svc.cluster.local port=443
[2022/12/23 14:06:57] [ info] [filter:kubernetes:kubernetes.0] token updated
[2022/12/23 14:06:57] [ info] [filter:kubernetes:kubernetes.0] local POD info OK
[2022/12/23 14:06:57] [ info] [filter:kubernetes:kubernetes.0] testing connectivity with API server...
[2022/12/23 14:06:57] [ info] [filter:kubernetes:kubernetes.0] connectivity OK
[2022/12/23 14:06:57] [ info] [output:loki:loki.0] configured, hostname=loki-gateway.monitoring.svc.cluster.local:80
[2022/12/23 14:06:57] [ info] [output:cloudwatch_logs:cloudwatch_logs.1] worker #0 started
[2022/12/23 14:06:57] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2022/12/23 14:06:57] [ info] [sp] stream processor started
[2022/12/23 14:06:57] [ info] [input:tail:tail.0] inotify_fs_add(): inode=528997 watch_fd=2 name=/var/log/containers/aws-load-balancer-controller-5bfbd747bd-59v4h_kube-system_aws-load-balanc
[2022/12/23 14:06:57] [ info] [input:tail:tail.0] inotify_fs_add(): inode=134898 watch_fd=3 name=/var/log/containers/aws-load-balancer-controller-5bfbd747bd-vwxlb_kube-system_aws-load-balanc
[2022/12/23 14:06:57] [ info] [input:tail:tail.0] inotify_fs_add(): inode=134905 watch_fd=4 name=/var/log/containers/aws-load-balancer-controller-5bfbd747bd-vwxlb_kube-system_aws-load-balanc
[2022/12/23 14:06:57] [ info] [input:tail:tail.0] inotify_fs_add(): inode=266650 watch_fd=5 name=/var/log/containers/aws-node-bndp7_kube-system_aws-node-25e1ce4793370ef1b2386200c27bc6c0f2007
[2022/12/23 14:06:57] [ info] [input:tail:tail.0] inotify_fs_add(): inode=266644 watch_fd=6 name=/var/log/containers/aws-node-bndp7_kube-system_aws-vpc-cni-init-1798466d7955c1f09fa5be21ef65c
[2022/12/23 14:06:57] [ info] [input:tail:tail.0] inotify_fs_add(): inode=134906 watch_fd=7 name=/var/log/containers/cert-manager-cainjector-86f7f4749-kfndw_cert-manager_cert-manager-cainjec
[2022/12/23 14:06:57] [ info] [input:tail:tail.0] inotify_fs_add(): inode=134907 watch_fd=8 name=/var/log/containers/cert-manager-cainjector-86f7f4749-kfndw_cert-manager_cert-manager-cainjec
[2022/12/23 14:06:57] [ info] [input:tail:tail.0] inotify_fs_add(): inode=269677 watch_fd=9 name=/var/log/containers/cert-manager-webhook-66c85f8577-lhbfz_cert-manager_cert-manager-webhook-9
[2022/12/23 14:06:57] [ info] [input:tail:tail.0] inotify_fs_add(): inode=266446 watch_fd=10 name=/var/log/containers/cluster-proportional-autoscaler-coredns-6ccfb4d9b5-8v892_kube-system_clu
[2022/12/23 14:06:57] [ info] [input:tail:tail.0] inotify_fs_add(): inode=266427 watch_fd=11 name=/var/log/containers/coredns-68b4d7cb47-sk5lc_kube-system_coredns-65f7eea6cd0dd159aa4bcba818a
[2022/12/23 14:06:57] [ info] [input:tail:tail.0] inotify_fs_add(): inode=529055 watch_fd=12 name=/var/log/containers/ebs-csi-controller-ddd4f6984-225ng_kube-system_csi-attacher-5c87844421e8
[2022/12/23 14:06:57] [ info] [input:tail:tail.0] inotify_fs_add(): inode=529031 watch_fd=13 name=/var/log/containers/ebs-csi-controller-ddd4f6984-225ng_kube-system_csi-provisioner-ee9da4c02
[2022/12/23 14:06:57] [ info] [input:tail:tail.0] inotify_fs_add(): inode=529104 watch_fd=14 name=/var/log/containers/ebs-csi-controller-ddd4f6984-225ng_kube-system_csi-resizer-b8d5c1b1e0569
[2022/12/23 14:06:57] [ info] [input:tail:tail.0] inotify_fs_add(): inode=529078 watch_fd=15 name=/var/log/containers/ebs-csi-controller-ddd4f6984-225ng_kube-system_csi-snapshotter-280f7d129
[2022/12/23 14:06:57] [ info] [input:tail:tail.0] inotify_fs_add(): inode=528990 watch_fd=16 name=/var/log/containers/ebs-csi-controller-ddd4f6984-225ng_kube-system_ebs-plugin-09adedebe40a79
[2022/12/23 14:06:57] [ info] [input:tail:tail.0] inotify_fs_add(): inode=530317 watch_fd=17 name=/var/log/containers/ebs-csi-controller-ddd4f6984-225ng_kube-system_liveness-probe-5631541006
[2022/12/23 14:06:57] [ info] [input:tail:tail.0] inotify_fs_add(): inode=528598 watch_fd=18 name=/var/log/containers/ebs-csi-node-bzvwg_kube-system_ebs-plugin-5be936e176dc41bee8d7cfb3d9a358
[2022/12/23 14:06:57] [ info] [input:tail:tail.0] inotify_fs_add(): inode=528843 watch_fd=19 name=/var/log/containers/ebs-csi-node-bzvwg_kube-system_liveness-probe-ba8c6f18f5865fcaf56b061e93
[2022/12/23 14:06:57] [ info] [input:tail:tail.0] inotify_fs_add(): inode=528725 watch_fd=20 name=/var/log/containers/ebs-csi-node-bzvwg_kube-system_node-driver-registrar-7ebb2d392e1c655e89e
[2022/12/23 14:06:57] [ info] [input:tail:tail.0] inotify_fs_add(): inode=528542 watch_fd=22 name=/var/log/containers/fluent-bit-gkclz_monitoring_fluent-bit-b40b1d6a925d1bbfa8aae336fa00b37c8
[2022/12/23 14:06:57] [ info] [input:tail:tail.0] inotify_fs_add(): inode=7765 watch_fd=32 name=/var/log/containers/kube-proxy-qwmsq_kube-system_kube-proxy-8e9f8234db195425e6db081745b8fe6d57
[2022/12/23 14:06:57] [ info] [input:tail:tail.0] inotify_fs_add(): inode=528530 watch_fd=33 name=/var/log/containers/fluent-bit-gkclz_monitoring_fluent-bit-5df98c2f5e66a525ba1cd91b5ddcb6e2c
[2022/12/23 14:06:57] [ info] [output:cloudwatch_logs:cloudwatch_logs.1] Creating log stream fluentbit-kube.var.log.containers.fluent-bit-gkclz_monitoring_fluent-bit-5df98c2f5e66a525ba1cd91b
[2022/12/23 14:06:57] [ info] [output:cloudwatch_logs:cloudwatch_logs.1] Created log stream fluentbit-kube.var.log.containers.fluent-bit-gkclz_monitoring_fluent-bit-5df98c2f5e66a525ba1cd91b5
[2022/12/23 14:06:57] [engine] caught signal (SIGSEGV)
#0 0xaaaab52df2b0 in template_execute() at lib/msgpack-c/include/msgpack/unpack_template.h:172
#1 0xaaaab52e0f6b in msgpack_unpack_next() at lib/msgpack-c/src/unpack.c:677
#2 0xaaaab4c7ab0b in process_and_send() at plugins/out_cloudwatch_logs/cloudwatch_api.c:859
#3 0xaaaab4c741bf in cb_cloudwatch_flush() at plugins/out_cloudwatch_logs/cloudwatch_logs.c:417
#4 0xaaaab48a860f in output_pre_cb_flush() at include/fluent-bit/flb_output.h:527
#5 0xaaaab52fc0ef in co_switch() at lib/monkey/deps/flb_libco/aarch64.c:133
#6 0xffffffffffffffff in ???() at ???:0
Stream closed EOF for monitoring/fluent-bit-gkclz (fluent-bit)
Expected behavior
The application should start successfully and begin logging to CloudWatch.
Your Environment
- Version used: 2.0.8
- Configuration: See above.
- Environment name and version (e.g. Kubernetes? What version?): Kubernetes v1.24.7-eks-fb459a0
- Server type and version: arm64
- Operating System and version: Bottlerocket OS 1.11.1 (aws-k8s-1.24)
- Filters and plugins: kubernetes, cloudwatch_logs
Additional context
Another issue has been logged with a similar error. I expected this to be resolved by version 2.0.8, which includes that fix, but it appears it is not. See https://github.com/fluent/fluent-bit/issues/6451
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 1
- Comments: 24 (12 by maintainers)
@cdancy actually, the issue seems to be caused simply by having any two outputs match the same tag. We’ve also discovered another issue on 1.9 (not sure if it applies to 2.0) which is much more likely to occur if you have net.keepalive On, the default. The stack trace in this issue matches the “two outputs match same tag” issue we are still investigating.
Sorry, that’s all I can say right now. Will post more when I have clearer info.
I want to note here that we think we’ve found a potential issue when you have multiple cloudwatch_logs outputs matching the same tag. Not sure which versions this impacts yet.
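For illustration, here is a minimal configuration that exhibits the overlap described in the two comments above: two outputs whose Match patterns cover the same tag. It is distilled from the reporter’s config at the top of the issue; the dummy input, the kube.test tag, and the commented net.keepalive line are assumptions added for this sketch, not something confirmed as a reproducer or a fix.

[INPUT]
    # the dummy plugin generates test records; it stands in for the tail input above
    Name   dummy
    Tag    kube.test

[OUTPUT]
    # first output matching the tag
    Name   loki
    Match  kube.*
    host   loki-gateway.monitoring.svc.cluster.local
    port   80

[OUTPUT]
    # second output matching the same tag -- the suspected trigger condition
    Name               cloudwatch_logs
    Match              kube.*
    region             eu-west-1
    log_group_name     /aws/eks/fluentbit-cloudwatch/logs
    log_stream_prefix  fluentbit-
    auto_create_group  true
    # per the 1.9 note above, the related issue is reportedly more likely with
    # net.keepalive On (the default); turning it Off is something to test, not a confirmed fix
    # net.keepalive    Off

The reporter’s configuration has the same shape, with Match * on both the loki and cloudwatch_logs outputs, which is consistent with this hypothesis.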
@cdancy the AWS distro is focused on AWS customers, and thus, yes, it is always the most up-to-date distro in terms of fixes for AWS customers.
We’re distributing the same code; the only thing we add is the old AWS Go plugins. What do you mean? https://github.com/aws/aws-for-fluent-bit/blob/mainline/troubleshooting/debugging.md#aws-go-plugins-vs-aws-core-c-plugins
@cdancy I have a pre-release build of the S3 concurrency fix here: https://github.com/aws/aws-for-fluent-bit/issues/495#issuecomment-1356478537
Keep checking the AWS distro; we will release it very soon.
If you can repro it with an AWS distro version and open an issue at our repo, that will also help with tracking.
@cdancy thanks for letting me know.
I see this:
Similar to the trace already in this issue. I am wondering if it’s related to the S3 concurrency issues we recently discovered: https://github.com/fluent/fluent-bit/pull/6573
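For context on frames #0–#2 of that trace: msgpack_unpack_next() walks a raw buffer via template_execute(), so a SIGSEGV there typically points at the buffer pointer or length being invalid by the time the unpack loop touches it (for example freed or modified concurrently, which would also fit the two-outputs-same-tag hypothesis above). Below is a generic sketch of the msgpack-c consumption pattern those frames correspond to; it is not the plugin’s actual code, and the function name and comments are illustrative assumptions.

#include <msgpack.h>

/* Generic msgpack-c consumption loop (sketch only). Frame #2 of the trace,
 * process_and_send(), iterates a chunk of serialized events with this same
 * API. `data` and `bytes` describe the chunk handed to the flush callback;
 * if that memory is freed or changed while the loop runs, template_execute()
 * dereferences a stale pointer and crashes as seen in frame #0. */
static void consume_chunk(const char *data, size_t bytes)
{
    msgpack_unpacked result;
    size_t off = 0;

    msgpack_unpacked_init(&result);
    while (msgpack_unpack_next(&result, data, bytes, &off) == MSGPACK_UNPACK_SUCCESS) {
        /* result.data holds one deserialized record; in fluent-bit this is
         * an array of [timestamp, record-map] */
        msgpack_object record = result.data;
        (void)record; /* a real plugin would convert this into a CloudWatch event here */
    }
    msgpack_unpacked_destroy(&result);
}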
This will be released in AWS4FB 2.31.0 very soon; not sure when it will make it into this distro: https://github.com/aws/aws-for-fluent-bit/releases