consul-k8s: When running filebeat as a headless service, all Consul pods break, leading to the failure of every service in the mesh.

Community Note

  • Please vote on this issue by adding a πŸ‘ reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
  • Please do not leave β€œ+1” or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Overview of the Issue

Filebeat does not run behind a Kubernetes Service, so in order to add filebeat to the service mesh it is necessary to create a headless service (a minimal example is sketched below). As soon as this headless service is added to the mesh, all Consul pods stop working and, consequently, every service in the mesh also fails, leaving the entire environment broken.
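
For reference, the headless service is along these lines (an illustrative sketch, not the exact manifest). The name matches the consul.hashicorp.com/kubernetes-service annotation and the selector matches the app: filebeat label used in the DaemonSet pod template further down; the port is purely nominal, since filebeat exposes no real listening port.

apiVersion: v1
kind: Service
metadata:
  name: filebeat
spec:
  clusterIP: None        # headless
  selector:
    app: filebeat        # matches the Beat podTemplate label below
  ports:
    - name: http
      port: 5066         # nominal port only (filebeat's HTTP monitoring port)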

The only way I have found to recover is to completely destroy the Kubernetes clusters and start again from scratch. Obviously this is highly undesirable.

Reproduction Steps

  1. Install Consul on a Kubernetes cluster (a sample set of Helm values is sketched after this list).
  2. Install the ECK operator using its Helm chart (https://github.com/elastic/cloud-on-k8s/tree/main/deploy/eck-operator), with the following podAnnotations values so that the operator pod is connect-injected:
  podAnnotations: {
    consul.hashicorp.com/connect-inject: "true",
    consul.hashicorp.com/connect-service: "elastic-operator"
  }
  3. Install Filebeat using the following filebeat.yaml file:
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: filebeat
spec:
  type: filebeat
  version: 8.2.3
  config:
    output.elasticsearch:
      hosts: https://127.0.0.1:8443
      username: "elastic"
      password: "password"

    filebeat.inputs:
      - type: container
        paths:
          - /var/log/containers/*.log
        multiline.pattern: '^{'
        multiline.negate: true
        multiline.match: after

    processors:
      - add_kubernetes_metadata:
          in_cluster: true
  daemonSet:
    podTemplate:
      metadata:
        labels:
          app: filebeat
        annotations:
          consul.hashicorp.com/connect-service: "filebeat"
          consul.hashicorp.com/kubernetes-service: "filebeat"
          consul.hashicorp.com/connect-inject: "true"
          consul.hashicorp.com/connect-service-upstreams: "elastic-search:9200"

      spec:
        automountServiceAccountToken: true
        serviceAccount: filebeat
        dnsPolicy: ClusterFirstWithHostNet
        hostNetwork: true
        securityContext:
          runAsUser: 0
        containers:
          - name: filebeat
            volumeMounts:
              - name: varlogcontainers
                mountPath: /var/log/containers
              - name: varlogpods
                mountPath: /var/log/pods
              - name: varlibdockercontainers
                mountPath: /var/lib/docker/containers
        volumes:
          - name: varlogcontainers
            hostPath:
              path: /var/log/containers
          - name: varlogpods
            hostPath:
              path: /var/log/pods
          - name: varlibdockercontainers
            hostPath:
              path: /var/lib/docker/containers

  4. Consul now destroys itself and the cluster.
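
For step 1, a minimal, illustrative values sketch for the hashicorp/consul Helm chart with connect injection enabled (the exact values used in my environment may differ):

global:
  name: consul
server:
  replicas: 3
connectInject:
  enabled: true
  transparentProxy:
    defaultEnabled: true   # chart default; redirects pod traffic through Envoy via iptables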

Logs

I was unable to capture any relevant log details; however, I was able to replicate the issue several times.

Expected behavior

Filebeat installs as expected inside the service mesh and does not destroy the Kubernetes cluster.

Environment details

  • Kubernetes version: v1.23.4 (also replicated on earlier versions)
  • Consul: installed using Helm chart 0.48.0 (behaviour also seen on earlier chart versions)
  • Cloud provider: OKE

Additional Context

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 15 (1 by maintainers)

Most upvoted comments

@codex70 this is the part I’m talking about, in the filebeat DaemonSet YAML:

      spec:
        automountServiceAccountToken: true
        serviceAccount: filebeat
        dnsPolicy: ClusterFirstWithHostNet
        hostNetwork: true

You’re running a filebeat pod on every node, and it is using hostNetwork, so I think injecting the Consul Envoy sidecar (and the associated CNI/iptables setup) is probably causing all traffic on the node (maybe including traffic from the other Envoy proxies) to be redirected through the filebeat Envoy. You could inspect the iptables rules after deployment to confirm.
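
One illustrative way to do that inspection without SSH access to a node is a throwaway privileged pod on the host network (the image, pod name and chain names here are assumptions, not something from the original report):

apiVersion: v1
kind: Pod
metadata:
  name: iptables-debug
spec:
  hostNetwork: true            # same network namespace the hostNetwork filebeat pod was injected into
  restartPolicy: Never
  containers:
    - name: iptables
      image: nicolaka/netshoot # any image that ships iptables will do
      command: ["sh", "-c", "iptables-save -t nat"]
      securityContext:
        privileged: true       # needed to read the netfilter tables

Once the pod completes, kubectl logs iptables-debug shows the node's nat table; if the transparent-proxy chains (typically named CONSUL_PROXY_*) appear there, that would confirm that traffic in the host network namespace is being redirected through the filebeat sidecar.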