fluent-bit: Stackdriver stops working after one hour: Oauth2

Bug Report

Describe the bug We run GKE 1.16.15-gke.7800 with fluent-bit v1.7.2. The configured Google service account has the following roles:

  • Service Account Token Creator
  • Logs Bucket Writer
  • Logs Configuration Writer
  • Logs Writer
  • Monitoring Metric Writer

The configuration works and logs are forwarded to Stackdriver. After exactly one hour of log forwarding, however, the stackdriver plugin can no longer retrieve an OAuth2 token and stops pushing new logs:

To Reproduce

[2021/03/22 14:33:08] [debug] [output:stackdriver:stackdriver.0] JWT signature:
ey........
[2021/03/22 14:33:08] [debug] [http_client] not using http_proxy for header
[2021/03/22 14:33:08] [debug] [http_client] header=POST /oauth2/v4/token HTTP/1.1
Host: www.googleapis.com
Content-Length: 169553
Content-Type: application/x-www-form-urlencoded


[2021/03/22 14:33:08] [ info] [oauth2] HTTP Status=400
[2021/03/22 14:33:08] [ info] [oauth2] payload:
{
  "error": "unsupported_grant_type",
  "error_description": "Invalid grant_type: "
}
[2021/03/22 14:33:08] [error] [output:stackdriver:stackdriver.0] error retrieving oauth2 access token
[2021/03/22 14:33:08] [error] [output:stackdriver:stackdriver.0] cannot retrieve oauth2 token
  • Steps to reproduce the problem:
  1. Create a service account with the roles listed above.
  2. Use fluent-bit version 1.7.2.
  3. The cluster was created with Google-managed fluentbit logging; we scaled the managed fluentbit down so that it does not interfere with our fluentbit testing.
  4. Create the daemonset and check the Stackdriver logs.
  5. After one hour, the OAuth2 errors shown above appear.
  6. Restarting the pods helps: logs are forwarded for another hour, then the error returns.

Expected behavior fluent-bit should refresh the OAuth2 token correctly and keep forwarding logs after one hour.

Your Environment

  • Version used: 1.7.2
  • Configuration: see below
  • Environment name and version (e.g. Kubernetes? What version?): Kubernetes 1.16.15-gke.7800
  • Operating System and version: https://fluent.github.io/helm-charts 0.12.3 with image fluent/fluent-bit 1.7.2
  • Filters and plugins: see config
config:
      service: |
        [SERVICE]
            Flush         5
            Grace         120
            Log_Level     trace
            Daemon        off
            Parsers_File  custom_parsers.conf
            HTTP_Server   On
            HTTP_Listen   0.0.0.0
            HTTP_PORT     2020
    
      inputs: |
        [INPUT]
            Name             tail
            Alias            kube_containers_kube-system
            Tag              kube.<namespace_name>.<pod_name>.<container_name>
            Tag_Regex        (?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-
            Path             /var/log/containers/*_kube-system_*.log
            DB               /var/run/google-fluentbit/pos-files/flb_kube_kube-system.db
            Buffer_Max_Size  1MB
            Mem_Buf_Limit    5MB
            Skip_Long_Lines  On
            Refresh_Interval 5
            Read_from_Head   True


        [INPUT]
            Name             tail
            Alias            kube_containers_gke-system
            Tag              kube.<namespace_name>.<pod_name>.<container_name>
            Tag_Regex        (?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-
            Path             /var/log/containers/*_gke-system_*.log
            DB               /var/run/google-fluentbit/pos-files/flb_kube_gke-system.db
            Buffer_Max_Size  1MB
            Mem_Buf_Limit    5MB
            Skip_Long_Lines  On
            Refresh_Interval 5
            Read_from_Head   True

        [INPUT]
            Name             tail
            Alias            kube_containers
            Tag              kube.<namespace_name>.<pod_name>.<container_name>
            Tag_Regex        (?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-
            Path             /var/log/containers/*.log
            Exclude_Path     /var/log/containers/*_kube-system_*.log,/var/log/containers/*_istio-system_*.log,/var/log/containers/*_knative-serving_*.log,/var/log/containers/*_gke-system_*.log,/var/log/containers/*_config-management-system_*.log
            DB               /var/run/google-fluentbit/pos-files/flb_kube.db
            Buffer_Max_Size  1MB
            Mem_Buf_Limit    5MB
            Skip_Long_Lines  On
            Refresh_Interval 5
            Read_from_Head   True

        # Example:
        # Dec 21 23:17:22 gke-foo-1-1-4b5cbd14-node-4eoj startupscript: Finished running startup script /var/run/google.startup.script
        [INPUT]
            Name             tail
            Parser           syslog
            Path             /var/log/startupscript.log
            DB               /var/run/google-fluentbit/pos-files/startupscript.db
            Alias            startupscript
            Tag              startupscript
            Read_from_Head   True

        # Logs from anetd for policy action
        [INPUT]
            Name             tail
            Parser           network-log
            Alias            policy-action
            Tag              policy-action
            Path             /var/log/network/policy_action.log
            DB               /var/run/google-fluentbit/pos-files/policy-action.db
            Skip_Long_Lines  On
            Refresh_Interval 5
            Read_from_Head   True

        # Example:
        # I1118 21:26:53.975789       6 proxier.go:1096] Port "nodePort for kube-system/default-http-backend:http" (:31429/tcp) was open before and is still needed
        [INPUT]
            Name            tail
            Alias           kube-proxy
            Tag             kube-proxy
            Path            /var/log/kube-proxy.log
            DB              /var/run/google-fluentbit/pos-files/kube-proxy.db
            Buffer_Max_Size 1MB
            Mem_Buf_Limit   1MB
            Parser          glog
            Read_from_Head  True

        # Logs from systemd-journal for interesting services.
        [INPUT]
            Name            systemd
            Alias           docker
            Tag             docker
            Systemd_Filter  _SYSTEMD_UNIT=docker.service
            Path            /var/log/journal
            DB              /var/run/google-fluentbit/pos-files/docker.db
            Buffer_Max_Size 1MB
            Mem_Buf_Limit   1MB

        [INPUT]
            Name            systemd
            Alias           container-runtime
            Tag             container-runtime
            Systemd_Filter  _SYSTEMD_UNIT=containerd.service
            Path            /var/log/journal
            DB              /var/run/google-fluentbit/pos-files/container-runtime.db
            Buffer_Max_Size 1MB
            Mem_Buf_Limit   1MB

        [INPUT]
            Name            systemd
            Alias           kubelet
            Tag             kubelet
            Systemd_Filter  _SYSTEMD_UNIT=kubelet.service
            Path            /var/log/journal
            DB              /var/run/google-fluentbit/pos-files/kubelet.db
            Buffer_Max_Size 1MB
            Mem_Buf_Limit   1MB

        # kube-node-installation, kube-node-configuration, and kube-logrotate are
        # oneshots, but it's extremely valuable to have their logs on Stackdriver
        # as they can diagnose critical issues with node startup.
        [INPUT]
            Name            systemd
            Alias           kube-node-installation
            Tag             kube-node-installation
            Systemd_Filter  _SYSTEMD_UNIT=kube-node-installation.service
            Path            /var/log/journal
            DB              /var/run/google-fluentbit/pos-files/kube-node-installation.db
            Buffer_Max_Size 1MB
            Mem_Buf_Limit   1MB

        [INPUT]
            Name            systemd
            Alias           kube-node-configuration
            Tag             kube-node-configuration
            Systemd_Filter  _SYSTEMD_UNIT=kube-node-configuration.service
            Path            /var/log/journal
            DB              /var/run/google-fluentbit/pos-files/kube-node-configuration.db
            Buffer_Max_Size 1MB
            Mem_Buf_Limit   1MB

        [INPUT]
            Name            systemd
            Alias           kube-logrotate
            Tag             kube-logrotate
            Systemd_Filter  _SYSTEMD_UNIT=kube-logrotate.service
            Path            /var/log/journal
            DB              /var/run/google-fluentbit/pos-files/kube-logrotate.db
            Buffer_Max_Size 1MB
            Mem_Buf_Limit   1MB

        [INPUT]
            Name            systemd
            Alias           node-problem-detector
            Tag             node-problem-detector
            Systemd_Filter  _SYSTEMD_UNIT=node-problem-detector.service
            Path            /var/log/journal
            DB              /var/run/google-fluentbit/pos-files/node-problem-detector.db
            Buffer_Max_Size 1MB
            Mem_Buf_Limit   1MB

        [INPUT]
            Name            systemd
            Alias           kube-container-runtime-monitor
            Tag             kube-container-runtime-monitor
            Systemd_Filter  _SYSTEMD_UNIT=kube-container-runtime-monitor.service
            Path            /var/log/journal
            DB              /var/run/google-fluentbit/pos-files/kube-container-runtime-monitor.db
            Buffer_Max_Size 1MB
            Mem_Buf_Limit   1MB

        [INPUT]
            Name            systemd
            Alias           kubelet-monitor
            Tag             kubelet-monitor
            Systemd_Filter  _SYSTEMD_UNIT=kubelet-monitor.service
            Path            /var/log/journal
            DB              /var/run/google-fluentbit/pos-files/kubelet-monitor.db
            Buffer_Max_Size 1MB
            Mem_Buf_Limit   1MB

        [INPUT]
            Name            systemd
            Alias           gcfsd
            Tag             gcfsd
            Systemd_Filter  _SYSTEMD_UNIT=gcfsd.service
            Path            /var/log/journal
            DB              /var/run/google-fluentbit/pos-files/gcfsd.db
            Buffer_Max_Size 1MB
            Mem_Buf_Limit   1MB

        [INPUT]
            Name            systemd
            Alias           gcfs-snapshotter
            Tag             gcfs-snapshotter
            Systemd_Filter  _SYSTEMD_UNIT=gcfs-snapshotter.service
            Path            /var/log/journal
            DB              /var/run/google-fluentbit/pos-files/gcfs-snapshotter.db
            Buffer_Max_Size 1MB
            Mem_Buf_Limit   1MB
    
      filters: |
        [FILTER]
            Name         parser
            Match        kube.*
            Key_Name     log
            Reserve_Data True
            Parser       docker
            Parser       containerd

        [FILTER]
            Name        modify
            Match       *
            Hard_rename log message

        [FILTER]
            Name         parser
            Match        kube.*
            Key_Name     message
            Reserve_Data True
            Parser       glog
            Parser       json
            Parser       logfmt

        [FILTER]
            Name    modify
            Match   *
            Copy    level severity

        [FILTER]
            Name                kubernetes
            Match               kube.*
            Kube_Tag_Prefix     kube.
            Regex_Parser        pod-tag-parser
            Kube_URL            https://kubernetes.default.svc:443
            Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
            Merge_Log           On
            K8S-Logging.Parser  On
            K8S-Logging.Exclude On

      outputs: |
        [OUTPUT]
            Name                  stackdriver
            Match                 kube.*
            Resource              k8s_container
            k8s_cluster_name      sre-playground-cluster
            k8s_cluster_location  europe-west4
            tag_prefix            kube.
            severity_key          severity

        [OUTPUT]
            Name                  stackdriver
            Match_Regex           ^(?!kube).*
            Resource              global
            k8s_cluster_name      sre-playground-cluster
            k8s_cluster_location  europe-west4
            
      customParsers: |
        [PARSER]
            Name        docker
            Format      json
            Time_Key    time
            Time_Format %Y-%m-%dT%H:%M:%S.%L%z

        [PARSER]
            Name        containerd
            Format      regex
            Regex       ^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$
            Time_Key    time
            Time_Format %Y-%m-%dT%H:%M:%S.%L%z

        [PARSER]
            Name        json
            Format      json

        [PARSER]
            Name        logfmt
            Format      logfmt        

        [PARSER]
            Name        syslog
            Format      regex
            Regex       ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
            Time_Key    time
            Time_Format %b %d %H:%M:%S

        [PARSER]
            Name        glog
            Format      regex
            Regex       ^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source_file>[^ \]]+)\:(?<source_line>\d+)\]\s(?<message>.*)$
            Time_Key    time
            Time_Format %m%d %H:%M:%S.%L

        [PARSER]
            Name        network-log
            Format      json
            Time_Key    timestamp
            Time_Format %Y-%m-%dT%H:%M:%S.%L%z

        [PARSER]
            Name    pod-tag-parser
            Format  regex
            Regex   (?<namespace_name>[^\.]+)\.(?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)\.(?<container_name>.+)(?<docker_id>)
    

The service account key file is mounted into the pods and referenced through an environment variable:

env:
    - name: GOOGLE_SERVICE_CREDENTIALS
      value: "/secret/fluentbit/stackdriver/service-account.json"

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 19 (7 by maintainers)

Most upvoted comments

Thanks for the investigation! I think the issue is the following: https://github.com/fluent/fluent-bit/blob/491889b5601f7b353df31a46e0b867ee5464b376/src/flb_oauth2.c#L245 Since payload is an sds_t string, it has a header containing the string length, and that header is not updated when the payload is cleared. After only the first byte is cleared, new strings are therefore appended at the end of the old buffer contents rather than at the start. It looks like we can fix this by calling flb_sds_len_set(ctx->payload, 0). I opened the draft PR https://github.com/fluent/fluent-bit/pull/3291 to attempt this fix, but have not tested it yet.
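
To illustrate the mechanism described in this comment, here is a minimal, self-contained sketch in C. It is not fluent-bit code: struct toy_sds and the toy_* functions are hypothetical stand-ins for a length-prefixed string, written under the assumption that appends always start at the offset stored in the header. It shows how zeroing only the first byte leaves the stored length stale, so the next append lands after the old contents, and how resetting the length (the role flb_sds_len_set(ctx->payload, 0) plays in the proposed fix) avoids that.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy length-prefixed string. Hypothetical: this only mimics the idea of
 * an sds-style header, it is NOT fluent-bit's flb_sds_t. */
struct toy_sds {
    size_t len;    /* bytes in use, as recorded in the "header" */
    size_t alloc;  /* capacity of buf, excluding the trailing NUL */
    char  *buf;
};

static int toy_init(struct toy_sds *s, size_t alloc)
{
    s->buf = malloc(alloc + 1);
    if (s->buf == NULL) {
        return -1;
    }
    s->len = 0;
    s->alloc = alloc;
    s->buf[0] = '\0';
    return 0;
}

/* Append str at offset s->len, growing the buffer when needed. */
static int toy_cat(struct toy_sds *s, const char *str)
{
    size_t n = strlen(str);

    if (s->len + n > s->alloc) {
        size_t new_alloc = (s->len + n) * 2;
        char *tmp = realloc(s->buf, new_alloc + 1);
        if (tmp == NULL) {
            return -1;
        }
        s->buf = tmp;
        s->alloc = new_alloc;
    }
    memcpy(s->buf + s->len, str, n);
    s->len += n;
    s->buf[s->len] = '\0';
    return 0;
}

/* Buggy reset: clears the first byte but leaves the stored length stale. */
static void toy_clear_buggy(struct toy_sds *s)
{
    s->buf[0] = '\0';
}

/* Fixed reset: also sets the stored length back to zero. */
static void toy_clear_fixed(struct toy_sds *s)
{
    s->buf[0] = '\0';
    s->len = 0;
}

static void toy_free(struct toy_sds *s)
{
    free(s->buf);
    s->buf = NULL;
    s->len = 0;
    s->alloc = 0;
}

/* Walk through both reset variants and print the resulting lengths. */
static void toy_demo(void)
{
    struct toy_sds s;

    assert(toy_init(&s, 8) == 0);
    assert(toy_cat(&s, "grant_type=x") == 0);

    toy_clear_buggy(&s);
    assert(toy_cat(&s, "grant_type=x") == 0);
    printf("buggy reset: header len=%zu, C-string len=%zu\n",
           s.len, strlen(s.buf));

    toy_clear_fixed(&s);
    assert(toy_cat(&s, "grant_type=x") == 0);
    printf("fixed reset: header len=%zu, C-string len=%zu\n",
           s.len, strlen(s.buf));

    toy_free(&s);
}
```

Calling toy_demo() from a main function runs both variants. In this toy model the buggy reset leaves a buffer whose header length keeps growing while the content seen as a C string is empty. That would be consistent with the large Content-Length and the empty grant_type in the log above, although tying the two together is an assumption of this sketch and not something confirmed in the thread.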