fluent-bit: Broken pipe error sending logs to datadog

Bug Report

Describe the bug

I’m running my services in AWS Fargate with a Datadog agent sidecar and a Fluent Bit sidecar that forwards logs to the Datadog agent, as prescribed by Datadog.

This used to work fine, but in the last few days I see occasional crashes of the fluent-bit container with exit code 139, which subsequently crashes my service as well.
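Exit code 139 fits the common convention in which a process killed by signal N is reported as 128 + N, and on Linux signal 11 is SIGSEGV. The following is a minimal sketch of that arithmetic; the helper name `describe_exit_code` is made up for illustration and is not part of Fluent Bit or ECS tooling.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Map a container exit code to a signal name, assuming the 128+N convention."""
    if code > 128:
        number = code - 128
        try:
            # signal.Signals is the stdlib enum of signals known on this platform.
            return signal.Signals(number).name
        except ValueError:
            return f"unknown signal {number}"
    return f"exit status {code} (not a signal)"


if __name__ == "__main__":
    # 139 - 128 = 11, the signal number to look up.
    print(describe_exit_code(139))
```

This only decodes the exit code. It says nothing about why Fluent Bit received the signal.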

I’m using the image 906394416424.dkr.ecr.eu-west-1.amazonaws.com/aws-for-fluent-bit:latest, so I assume the crashes come from a recently released version.

To Reproduce

This gist contains my task definition that had the issue. The crash happens about once a day, at random times.

Screenshots

Logs before the crash (newest entries first):

2020-07-23 09:51:07 [2020/07/23 06:51:07] [error] [src/flb_http_client.c:1077 errno=32] Broken pipe
2020-07-23 09:51:07 [2020/07/23 06:51:07] [error] [output:datadog:datadog.1] could not flush records to http-intake.logs.datadoghq.com:80 (http_do=-1)
2020-07-23 09:51:07 [engine] caught signal (SIGSEGV)
2020-07-23 09:50:37 [2020/07/23 06:50:37] [ info] [output:datadog:datadog.1] http://http-intake.logs.datadoghq.com, port=80, HTTP status=200 payload={}
2020-07-23 09:50:02 [2020/07/23 06:50:02] [ info] [output:datadog:datadog.1] http://http-intake.logs.datadoghq.com, port=80, HTTP status=200 payload={}
2020-07-23 09:46:37 [2020/07/23 06:46:37] [ info] [output:datadog:datadog.1] http://http-intake.logs.datadoghq.com, port=80, HTTP status=200 payload={}
2020-07-23 09:46:32 [2020/07/23 06:46:32] [ info] [output:datadog:datadog.1] http://http-intake.logs.datadoghq.com, port=80, HTTP status=200 payload={}
2020-07-23 09:46:27 [2020/07/23 06:46:27] [ info] [output:datadog:datadog.1] http://http-intake.logs.datadoghq.com, port=80, HTTP status=200 payload={}
2020-07-23 09:46:22 [2020/07/23 06:46:22] [ info] [output:datadog:datadog.1] http://http-intake.logs.datadoghq.com, port=80, HTTP status=200 payload={}
2020-07-23 09:46:14 [2020/07/23 06:46:14] [ info] [output:datadog:datadog.1] http://http-intake.logs.datadoghq.com, port=80, HTTP status=200 payload={}
2020-07-23 09:46:14 [2020/07/23 06:46:14] [ info] [engine] flush chunk '1-1595412363.546270069.flb' succeeded at retry 1: task_id=1, input=forward.0 > output=datadog.1
2020-07-23 09:46:12 [2020/07/23 06:46:12] [ info] [output:datadog:datadog.1] http://http-intake.logs.datadoghq.com, port=80, HTTP status=200 payload={}
2020-07-23 09:46:07 [2020/07/23 06:46:07] [error] [src/flb_http_client.c:1077 errno=32] Broken pipe
2020-07-23 09:46:07 [2020/07/23 06:46:07] [error] [output:datadog:datadog.1] could not flush records to http-intake.logs.datadoghq.com:80 (http_do=-1)
2020-07-23 09:46:07 [2020/07/23 06:46:07] [ warn] [engine] failed to flush chunk '1-1595412363.546270069.flb', retry in 7 seconds: task_id=0, input=forward.0 > output=datadog.1
2020-07-23 09:45:37 [2020/07/23 06:45:37] [ info] [output:datadog:datadog.1] http://http-intake.logs.datadoghq.com, port=80, HTTP status=200 payload={}
2020-07-23 09:45:02 [2020/07/23 06:45:02] [ info] [output:datadog:datadog.1] http://http-intake.logs.datadoghq.com, port=80, HTTP status=200 payload={}
2020-07-23 09:41:37 [2020/07/23 06:41:37] [ info] [output:datadog:datadog.1] http://http-intake.logs.datadoghq.com, port=80, HTTP status=200 payload={}
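For readers unfamiliar with `errno=32` in these logs: on Linux that value is EPIPE ("Broken pipe"), the error a writer gets when it sends on a connection whose other end has already closed. The sketch below reproduces that condition with a local socket pair. It is only an illustration of the error itself, not a reproduction of the Fluent Bit crash, and the function name `provoke_broken_pipe` is invented for the example.

```python
import errno
import os
import socket


def provoke_broken_pipe() -> int:
    """Send on a socket whose peer has closed and return the errno that results."""
    writer, reader = socket.socketpair()
    try:
        # Assumed scenario for illustration: the receiving side goes away first.
        reader.close()
        try:
            writer.send(b"log record")
        except OSError as exc:
            # Python ignores SIGPIPE by default, so the failure surfaces as an exception.
            return exc.errno
        return 0
    finally:
        writer.close()


if __name__ == "__main__":
    code = provoke_broken_pipe()
    print(code, errno.errorcode.get(code), os.strerror(code))
```

In the first occurrence in the logs (09:46:07) the same error is followed by a successful retry of the chunk. In the second (09:51:07) it is followed by the SIGSEGV.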

Your Environment

  • Version used: latest, 2.5.0
  • Configuration: see above
  • Environment name and version: AWS Fargate

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 15 (10 by maintainers)

Most upvoted comments

@dotanrs Ok, luckily DataDog gave me a test account for this integration… I will try my best to see if I can reproduce and diagnose this issue.

We think we have made some progress in debugging this issue. Please see the suggestion here: https://github.com/aws/aws-for-fluent-bit/issues/66#issuecomment-684371904