strimzi-kafka-operator: Strimzi fails to create a cluster inside a service mesh (Istio)


Describe the bug

Attempts to create a Kafka cluster using any of the examples in examples/kafka/kafka-*.yaml fail if the target namespace has automatic sidecar injection enabled for the Istio service mesh.

The cause of the failure is non-obvious because the ZooKeeper pods come up just fine.

To Reproduce

Steps to reproduce the behavior:

  1. Install Istio in a Kubernetes cluster.
  2. Enable sidecar injection in a target namespace (e.g. target).
  3. Attempt to create a cluster (e.g. kubectl apply -n target -f examples/kafka/kafka-ephemeral.yaml).
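The reproduction steps can be sketched as a small shell function. This is a sketch under assumptions, not the reporter's exact commands: it assumes Istio and the Strimzi Cluster Operator are already installed, that injection is controlled by the standard istio-injection namespace label (revision-based installs use the istio.io/rev label instead), and that it runs from a checkout of the Strimzi repository. The function name reproduce_in_mesh is hypothetical.

```shell
# Hypothetical helper: reproduce the failure in the given namespace.
# Assumes kubectl already points at a cluster with Istio and Strimzi installed.
reproduce_in_mesh() {
  ns="$1"
  # Step 2: turn on automatic sidecar injection for the namespace.
  kubectl label namespace "$ns" istio-injection=enabled --overwrite
  # Step 3: create the example cluster from the Strimzi examples directory.
  kubectl apply -n "$ns" -f examples/kafka/kafka-ephemeral.yaml
  # List the pods to see which components have been created.
  kubectl get pods -n "$ns"
}
```

Calling reproduce_in_mesh target runs steps 2 and 3 against the target namespace and then lists its pods.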

Only the ZooKeeper pods are created and the cluster never becomes available. Disabling injection unblocks the deployment:

  1. Edit the namespace to disable sidecar injection.
  2. The rest of the components start to deploy once the ZooKeeper instances have been restarted without sidecar injection.
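A minimal sketch of that workaround as shell commands, assuming the namespace was enabled through the istio-injection label and that the ZooKeeper pods carry Strimzi's strimzi.io/name=<cluster>-zookeeper label. The function name is hypothetical, and deleting the pods by hand is only one way to get them restarted.

```shell
# Hypothetical helper: take the namespace out of automatic injection and
# restart the ZooKeeper pods so they come back without a sidecar.
disable_injection_and_restart() {
  ns="$1"
  cluster="$2"
  # A trailing "-" removes the label from the namespace.
  kubectl label namespace "$ns" istio-injection-
  # Delete the ZooKeeper pods; they are recreated without the sidecar.
  kubectl delete pod -n "$ns" -l "strimzi.io/name=${cluster}-zookeeper"
}
```

For the example in this report, the call would be disable_injection_and_restart target my-cluster.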

Expected behavior

A company that runs a service mesh should be able to operate Kafka clusters in namespaces that have the service mesh enabled.

Environment (please complete the following information):

  • Strimzi version: [e.g. main, 0.25.0]
  • Installation method: OperatorHub.io
  • Kubernetes cluster: Kubernetes 1.21
  • Infrastructure: on-prem bare metal

YAML files and logs

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    version: 2.8.0
    replicas: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      log.message.format.version: "2.8"
      inter.broker.protocol.version: "2.8"
    storage:
      type: ephemeral
  zookeeper:
    replicas: 3
    storage:
      type: ephemeral
  entityOperator:
    topicOperator: {}
    userOperator: {}


To get a full set of logs, I installed the cluster without a sidecar, then enabled sidecar injection in the namespace and manually terminated all of the pods in the namespace so that they restarted inside the service mesh. This allows logs to be collected from the cluster instances as well as the Entity Operator, which would otherwise never start.

Your “report.sh” script failed on macOS:

$ ./report.sh --namespace kafka --cluster my-cluster
./report.sh: line 7: syntax error near unexpected token `newline'
./report.sh: line 7: `<!DOCTYPE html>'

my-cluster-kafka-2.log my-cluster-entity-operator-655dd879c-vqctd.log

Additional context

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 22 (5 by maintainers)

Most upvoted comments

Environment: Google GKE with ASM (Anthos Service Mesh), which is Istio underneath.

This issue has been discussed in the Istio project: https://github.com/istio/istio/issues/19280

The root cause is that ZooKeeper listens only on the pod IP.

Possible workarounds:

Option 1: Disable sidecar injection for the ZooKeeper pods.

...
  zookeeper:
    replicas: 3
    template:
      pod:
        metadata:
          labels:
            sidecar.istio.io/inject: "false"
...

Option 2: Exclude the ZooKeeper ports 2888 and 3888 from the sidecar, e.g.

      annotations:
        traffic.sidecar.istio.io/excludeInboundPorts: "2888,3888"
        traffic.sidecar.istio.io/excludeOutboundPorts: "2888,3888"
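Assuming these annotations belong under spec.zookeeper.template.pod.metadata, in the same place as the label in Option 1, they could be applied to an existing Kafka resource with a merge patch. This is a sketch, not a tested recipe from the thread; the function name is hypothetical.

```shell
# Hypothetical helper: add the port-exclusion annotations to the ZooKeeper
# pod template of an existing Kafka custom resource.
exclude_quorum_ports() {
  ns="$1"
  cluster="$2"
  # JSON merge patch; the nesting mirrors spec.zookeeper.template.pod.metadata.
  patch='{"spec":{"zookeeper":{"template":{"pod":{"metadata":{"annotations":{
    "traffic.sidecar.istio.io/excludeInboundPorts":"2888,3888",
    "traffic.sidecar.istio.io/excludeOutboundPorts":"2888,3888"
  }}}}}}}'
  kubectl patch kafka "$cluster" -n "$ns" --type merge -p "$patch"
}
```

For the cluster in this report, the call would be exclude_quorum_ports target my-cluster.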

Both Option 1 and Option 2 exclude ZooKeeper communication from mTLS.

Option 3: Change the ZooKeeper configuration to set quorumListenOnAllIPs=true. Judging from the thread, however, this does not seem to work.

spec.zookeeper.config.quorumListenOnAllIPs: true

This is not a bug. We do not support any integration with Istio.

Your “report.sh” script failed on macOS:

It seems to work fine for me on macOS. Did you download the actual report script? The error suggests it contains HTML instead of Bash.
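One quick way to check this is to look for an HTML doctype near the top of the downloaded file. The sketch below is only an illustration; the function name looks_like_html is hypothetical and the 20-line window is an arbitrary choice.

```shell
# Hypothetical helper: succeed if the file looks like a saved web page
# rather than a shell script (an HTML doctype within the first 20 lines).
looks_like_html() {
  head -n 20 "$1" | grep -qi '<!DOCTYPE html'
}

# Example use (prints a hint only when the check matches):
# looks_like_html report.sh && echo "report.sh is an HTML page; download the raw file instead"
```

If the check matches, the file was most likely saved from the web page that displays the script rather than from the raw file.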