logging-operator: Elasticsearch Flow failing

Describe the bug: Following the official quickstart guide, I tried to deploy the manifests below.

Logging

apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: default-logging-sample
spec:
  enableRecreateWorkloadOnImmutableFieldChange: true
  controlNamespace: logging
  fluentd:
    metrics:
      serviceMonitor: true
    tolerations:
      - key: dedicated
        operator: Equal
        value: logging
        effect: NoSchedule
  fluentbit:
    metrics:
      serviceMonitor: true
    tolerations:
      - effect: NoSchedule
        operator: Exists
        key: node-role.kubernetes.io/master
      - effect: NoExecute
        operator: Exists
      - effect: NoSchedule
        operator: Exists

ClusterOutput

apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterOutput
metadata:
  name: es-output-sample
  namespace: logging
spec:
  elasticsearch:
    host: logging-es-http.logging.svc.cluster.local
    port: 9200
    scheme: https
    ssl_verify: false
    ssl_version: TLSv1_2
    user: elastic
    password:
      valueFrom:
        secretKeyRef:
          name: logging-es-elastic-user
          key: elastic
    buffer:
      timekey: 1m
      timekey_wait: 30s
      timekey_use_utc: true

ClusterFlow

apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterFlow
metadata:
  name: flow-sample
  namespace: logging
spec:
  match:
    - select:
        namespaces:
          - group-one
          - group-two
          - group-three
  filters:
    - tag_normaliser: {}
    - parser:
        key_name: message
        reserve_time: true
        reserve_data: true
        remove_key_name_field: true
        inject_key_prefix: log_
        parse:
          type: json
  outputRefs:
    - es-output-sample

But in Kibana I don’t see any logs from my custom Go applications deployed in the listed namespaces (group-one, group-two, group-three).

Before that I deployed Elastic Cloud on Kubernetes (ECK), and it works perfectly: Elasticsearch is reachable and the secret containing the password is properly in place.

I also tried another ClusterFlow with a different ClusterOutput configuration, and it worked partially:

apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterOutput
metadata:
  name: gcs-output-sample
  namespace: logging
spec:
  gcs:
    credentials_json:
      valueFrom:
        secretKeyRef:
          name: gcs-secret-sample
          key: credentials.json
    client_retries: 10
    client_timeout: 5
    project: my-gcp-project
    bucket: logging_tests
    auto_create_bucket: true
    path: ${tag}/%Y-%m-%d_%H/
    store_as: text
    buffer:
      timekey: 1m
      timekey_wait: 10s
      timekey_use_utc: true
      chunk_limit_size: 5MB
      flush_at_shutdown: true
      retry_randomize: true
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterFlow
metadata:
  name: multi-flow-sample
  namespace: logging
spec:
  match:
    - select:
        namespaces:
          - group-one
  filters:
    - tag_normaliser:
        format: ${namespace_name}_${labels.app}
    - parser:
        key_name: message
        reserve_time: true
        reserve_data: true
        remove_key_name_field: true
        parse:
          type: json
  outputRefs:
    - gcs-output-sample
    - es-output-sample

Here I see logs in Google Cloud Storage, but still nothing in Kibana.

Expected behaviour: I have already tested other Flow and ClusterFlow configurations, including the following one:

apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterFlow
metadata:
  name: gcs-flow-sample
  namespace: logging
spec:
  match:
    - select:
        namespaces:
          - group-one
  filters:
    - tag_normaliser:
        format: ${namespace_name}_${labels.app}
    - parser:
        key_name: message
        reserve_time: true
        reserve_data: true
        remove_key_name_field: true
        parse:
          type: json
  outputRefs:
    - gcs-output-sample

With that configuration everything worked perfectly (I see logs properly stored and organised in Google Cloud Storage), so I expect to see logs from my custom applications (depending on the Flow configuration) in both GCS and Kibana, or at least in Kibana alone.

Steps to reproduce the bug:

  1. deploy the Elastic Cloud on Kubernetes (ECK) operator following the official quickstart guide
  2. deploy Elasticsearch and Kibana following the official quickstart guide
  3. deploy the logging-operator following the official quickstart guide
  4. deploy the custom Logging, ClusterOutput and ClusterFlow resources provided in this bug report

Additional context: Nothing additional

Environment details:

  • Kubernetes version: 1.15.9-gke.24
  • Cloud-provider/provisioner: GKE
  • logging-operator version: 3.0.1 with banzaicloud/logging-operator:3.1.0
  • Install method: static manifests
  • Logs from the misbehaving component: logging-operator logs
    {"level":"info","ts":1587638394.8105874,"logger":"controllers.Logging","msg":"still waiting for the configcheck result..."}
    {"level":"info","ts":1587638394.8141158,"logger":"controllers.Logging","msg":"there are pending configcheck pods, need to back off"}
    {"level":"info","ts":1587638394.8327749,"logger":"controllers.Logging","msg":"there are pending configcheck pods, need to back off"}
    {"level":"info","ts":1587638394.847659,"logger":"controllers.Logging","msg":"there are pending configcheck pods, need to back off"}
    {"level":"info","ts":1587638396.8047817,"logger":"controllers.Logging","msg":"there are running configcheck pods, need to back off"}
    {"level":"error","ts":1587638397.8755774,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"logging","request":"/default-logging-sample","error":"failed to update status: Operation cannot be fulfilled on loggings.logging.banzaicloud.io \"default-logging-sample\": the object has been modified; please apply your changes to the latest version and try again","errorVerbose":"Operation cannot be fulfilled on loggings.logging.banzaicloud.io \"default-logging-sample\": the object has been modified; please apply your changes to the latest version and try again\nfailed to update status\ngithub.com/banzaicloud/logging-operator/pkg/resources/fluentd.(*Reconciler).Reconcile\n\t/workspace/pkg/resources/fluentd/fluentd.go:119\ngithub.com/banzaicloud/logging-operator/controllers.(*LoggingReconciler).Reconcile\n\t/workspace/controllers/logging_controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.0/pkg/internal/controller/controller.go:256\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.0/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.0/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.17.4/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.17.4/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/apimachinery@v0.17.4/pkg/util/wait/wait.go:88\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.0/pkg/internal/controller/controller.go:258\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.0/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.0/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.17.4/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.17.4/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/apimachinery@v0.17.4/pkg/util/wait/wait.go:88"}
    
  • Resource definition (possibly in YAML format) that caused the issue, without sensitive data: see the bug description above

/kind bug

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 16 (7 by maintainers)

Most upvoted comments

@oxr463 it was a mix of different things:

  • a bug in the operator that seems to be fixed in the newest versions
  • a misconfiguration in the Flow’s spec.match.select; at the time this was not well explained in the documentation, so check the current docs again… anyway, in my opinion the matching selector works in a weird, not really intuitive way
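To illustrate the selector behaviour (a hedged sketch, not from the original report; the resource name here is hypothetical): in the logging-operator, match statements are evaluated in order, so an exclude placed before a broad select carves namespaces out of the flow:

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterFlow
metadata:
  name: match-example        # hypothetical name
  namespace: logging
spec:
  match:
    # Statements are evaluated top to bottom: the exclude fires first,
    # so logs from "noisy" never reach this flow even though the
    # empty select below would otherwise match everything.
    - exclude:
        namespaces:
          - noisy            # hypothetical namespace to drop
    - select: {}             # an empty select matches all remaining logs
  outputRefs:
    - es-output-sample
```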

After several attempts and some patience I was able to deploy:

  • es + kib using operator
  • fluent-bit + fluentd using operator
  • loki using manifests
  • routing all logs to es
  • routing some specific logs to cloud storage
  • routing some other specific logs to loki
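The routing described above can be sketched with one ClusterFlow per destination, since each flow matches and copies logs independently (a hedged sketch: `loki-output` and the flow names are hypothetical; the two other outputs come from this report):

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterFlow
metadata:
  name: all-to-es            # hypothetical name
  namespace: logging
spec:
  match:
    - select: {}             # route every namespace to Elasticsearch
  outputRefs:
    - es-output-sample
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterFlow
metadata:
  name: group-one-to-gcs     # hypothetical name
  namespace: logging
spec:
  match:
    - select:
        namespaces:
          - group-one        # only this namespace goes to cloud storage
  outputRefs:
    - gcs-output-sample
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterFlow
metadata:
  name: group-two-to-loki    # hypothetical name
  namespace: logging
spec:
  match:
    - select:
        namespaces:
          - group-two        # only this namespace goes to Loki
  outputRefs:
    - loki-output            # hypothetical Loki ClusterOutput
```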