strimzi-kafka-operator: [Bug]: Deployment of Kafka resource via Flux bypasses health checking

Bug Description

When deploying Kafka through Flux, using the CRD backed by the operator, I observe that Flux's healthCheck capability, which can be instructed through kstatus to wait for a resource to become “Ready”, marks the Flux Kustomization as applied within a few seconds.

Previously, when I deployed Kafka through a CD pipeline, I had a secondary step that ran kubectl wait -n kafka --timeout=15m --for=condition=Ready=True Kafka kafka. This ensured that, even though my helm upgrade had been applied, the pipeline would wait for Kafka to actually be ready. This worked fine, but with Flux, kstatus passes instantly.
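
The kubectl wait step above amounts to polling the Kafka resource until its Ready condition reports True. A minimal Python sketch of that check follows. The names is_ready and wait_for_ready and the injected fetch callable are hypothetical, chosen only for illustration, and the resource is assumed to be a plain dict shaped like the JSON of the Kafka custom resource.

```python
import time
from typing import Callable


def is_ready(resource: dict) -> bool:
    """Return True only if a condition of type Ready has status "True"."""
    # A freshly applied resource may have no status at all.
    status = resource.get("status") or {}
    conditions = status.get("conditions") or []
    return any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in conditions
    )


def wait_for_ready(fetch: Callable[[], dict], timeout_s: float, poll_s: float = 5.0) -> bool:
    """Poll fetch() until the resource is Ready or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ready(fetch()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

Note that this check treats a resource with no status as not ready, which is the behaviour I was relying on in the pipeline.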

Steps to reproduce

  1. Deploy strimzi-kafka-operator
  2. Deploy a Kafka resource via Flux:
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: kafka
  namespace: flux-system
spec:
  dependsOn:
    - name: strimzi-kafka-operator
  interval: 1h
  retryInterval: 10m
  timeout: 20m
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./infrastructure/overlays/dev/kafka
  prune: true
  wait: true
  # This healthChecks entry is the part that does not behave as expected
  healthChecks:
    - apiVersion: kafka.strimzi.io/v1beta2
      kind: Kafka
      name: kafka
      namespace: kafka
  3. Observe that the Flux deployment moves straight past the kafka Kustomization, even though I expect it to take at least 5 minutes to spin up

Expected behavior

I expected Flux to wait for the Kafka resource to be Ready, but because of the way kstatus handles the readiness of a resource, the healthCheck passed instantly due to a missing observedGeneration.

An example of a CRD that applies a default of -1 is here: https://github.com/fluxcd/source-controller/blob/f2a1814aea9f96262e3897c71ff0d97ee29603ab/config/crd/bases/source.toolkit.fluxcd.io_buckets.yaml#L132
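
To illustrate why an empty status passes instantly, here is a minimal Python sketch of the kind of rule set involved. It is a simplified model of the behaviour described in this issue, not kstatus's actual implementation, which checks more than this. The function name simplified_health is hypothetical, and the resource is assumed to be a plain dict.

```python
def simplified_health(resource: dict) -> str:
    """Return "InProgress" or "Current" using a simplified, kstatus-like rule set."""
    generation = (resource.get("metadata") or {}).get("generation")
    status = resource.get("status") or {}

    # Rule 1: a reported observedGeneration that differs from
    # metadata.generation means the latest spec has not been processed yet.
    observed = status.get("observedGeneration")
    if generation is not None and observed is not None and observed != generation:
        return "InProgress"

    # Rule 2: an explicit Ready condition decides the outcome.
    for cond in status.get("conditions") or []:
        if cond.get("type") == "Ready":
            return "Current" if cond.get("status") == "True" else "InProgress"

    # Rule 3: nothing says otherwise, so the resource is treated as healthy.
    return "Current"


# A freshly applied resource with no status falls through to rule 3.
print(simplified_health({"metadata": {"generation": 1}}))  # Current
# A defaulted observedGeneration of -1 never matches a real generation.
print(simplified_health({"metadata": {"generation": 1},
                         "status": {"observedGeneration": -1}}))  # InProgress
```

Under these assumed rules, a default of -1 gives the health check something to compare against before the operator has written any status.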

Strimzi version

main

Kubernetes version

1.27

Installation method

helm chart

Infrastructure

No response

Configuration files and logs

No response

Additional context

No response

About this issue

  • State: closed
  • Created 8 months ago
  • Comments: 18 (9 by maintainers)

Most upvoted comments

OK. But what about the Creating status with observedGeneration: 0? Is that fine for Flux, or does it need to be -1?

In any case, if something changes and needs some change in Strimzi, feel free to reopen this. We want to make things work when possible. We just need to be careful not to fix it for one user and break it for another.

I don’t know. I’m still not sure I fully understand what it means, what changes it will require, how it will work, and what the risk of breaking something else will be. It’s not like you can just edit the CRD YAML.

Yes, you are correct: no status exists by default, and this is very easy to see if you scale the operator down to 0. kstatus observes the resource as ready instantly, which is less than ideal because it has the knock-on effect of other things assuming Kafka is actually ready.

But we do not create the resource, so we cannot set its status. When you do kubectl apply you do not set it either. It is IMHO naturally empty.

Sorry, my mistake … it is kubectl edit kafka <cluster-name> --subresource=status

I think there is never any status by default.

I double-checked it … when the Kafka cluster is deployed, we publish this status update:

status:
  conditions:
  - lastTransitionTime: "2023-10-20T15:07:29.146016099Z"
    message: Kafka cluster is being deployed
    reason: Creating
    status: "True"
    type: NotReady
  observedGeneration: 0

This would be the place where the -1 might show up, I think. If it helps, it might be feasible to change it and use -1 here. But it would be great if you could double-check it first. One possible way to do that would be to change things manually:

  1. Deploy Strimzi
  2. Scale down the Strimzi Cluster Operator to 0 replicas
  3. Create the Kafka resource
  4. Set the status manually using kubectl edit/status kafka ... to
  status:
    conditions:
    - lastTransitionTime: "2023-10-20T15:07:29.146016099Z"
      message: Kafka cluster is being deployed
      reason: Creating
      status: "True"
      type: NotReady
    observedGeneration: -1
  5. See if Flux does what you expect
  6. Set the status manually using kubectl edit/status kafka ... to
  status:
    conditions:
    - lastTransitionTime: "2023-10-20T15:09:39.088480404Z"
      status: "True"
      type: Ready
    observedGeneration: 1
  7. Check that Flux does what you expect

I think you have to provide more details about what (if anything) Strimzi can do about this. Also, keep in mind that we cannot break things for other users just to make Flux users happier.