strimzi-kafka-operator: Topic Operator failing to start with io.vertx.core.VertxException: Thread blocked

Describe the bug
When deploying a very simple cluster with the topicOperator enabled, the topicOperator container fails to start. The logs for the container report a blocked thread. The k8s liveness check eventually kills the container.

2021-12-16 00:16:50,79115 WARN  [vertx-blocked-thread-checker] BlockedThreadChecker: - Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 2542 ms, time limit is 2000 ms
2021-12-16 00:16:51,79090 WARN  [vertx-blocked-thread-checker] BlockedThreadChecker: - Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 3542 ms, time limit is 2000 ms
2021-12-16 00:16:52,79034 WARN  [vertx-blocked-thread-checker] BlockedThreadChecker: - Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 4541 ms, time limit is 2000 ms
2021-12-16 00:16:53,79105 WARN  [vertx-blocked-thread-checker] BlockedThreadChecker: - Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 5542 ms, time limit is 2000 ms
io.vertx.core.VertxException: Thread blocked
	at jdk.internal.misc.Unsafe.park(Native Method) ~[?:?]
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:194) ~[?:?]
	at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1796) ~[?:?]
	at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3128) ~[?:?]
	at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1823) ~[?:?]
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1998) ~[?:?]
	at io.apicurio.registry.utils.ConcurrentUtil.get(ConcurrentUtil.java:35) ~[io.apicurio.apicurio-registry-common-1.3.2.Final.jar:?]
	at io.apicurio.registry.utils.ConcurrentUtil.get(ConcurrentUtil.java:27) ~[io.apicurio.apicurio-registry-common-1.3.2.Final.jar:?]
	at io.apicurio.registry.utils.ConcurrentUtil.result(ConcurrentUtil.java:54) ~[io.apicurio.apicurio-registry-common-1.3.2.Final.jar:?]
	at io.strimzi.operator.topic.Session.lambda$start$9(Session.java:198) ~[io.strimzi.topic-operator-0.26.0.jar:0.26.0]
	at io.strimzi.operator.topic.Session$$Lambda$278/0x0000000840319840.handle(Unknown Source) ~[?:?]
	at io.vertx.core.impl.future.FutureImpl$3.onSuccess(FutureImpl.java:141) ~[io.vertx.vertx-core-4.1.5.jar:4.1.5]
	at io.vertx.core.impl.future.FutureBase.lambda$emitSuccess$0(FutureBase.java:54) ~[io.vertx.vertx-core-4.1.5.jar:4.1.5]
	at io.vertx.core.impl.future.FutureBase$$Lambda$293/0x000000084031e040.run(Unknown Source) ~[?:?]
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[io.netty.netty-common-4.1.68.Final.jar:4.1.68.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) ~[io.netty.netty-common-4.1.68.Final.jar:4.1.68.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) ~[io.netty.netty-transport-4.1.68.Final.jar:4.1.68.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[io.netty.netty-common-4.1.68.Final.jar:4.1.68.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[io.netty.netty-common-4.1.68.Final.jar:4.1.68.Final]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty.netty-common-4.1.68.Final.jar:4.1.68.Final]
	at java.lang.Thread.run(Thread.java:829) ~[?:?]

To Reproduce
Steps to reproduce the behavior:

  1. Install Strimzi Operator using the 0.26.0 helm chart
  2. Create a Cluster manifest:
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: kafka-basic
spec:
  kafka:
    version: 3.0.0
    replicas: 1
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    storage:
      type: ephemeral
  zookeeper:   
    replicas: 1
    storage:
      type: ephemeral
  entityOperator:
    topicOperator: {}
    userOperator: {}
  3. Apply the manifest with kubectl apply -f kafka-basic.yaml
  4. Watch the topic operator logs with kubectl logs deploy/kafka-basic-entity-operator -c topic-operator
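
To confirm that the container restarts are driven by the liveness probe, commands along these lines help (the label and pod name follow Strimzi's naming conventions; the pod suffix is illustrative):

  # Check the restart count of the entity-operator pod
  kubectl get pods -l strimzi.io/name=kafka-basic-entity-operator

  # Look for liveness probe failures in the pod events
  kubectl describe pod kafka-basic-entity-operator-<hash> | grep -i -A2 liveness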

Expected behavior
The topic operator starts correctly.

Environment:

  • Strimzi version: 0.26.0
  • Installation method: Helm chart
  • Kubernetes cluster: Kubernetes 1.20.7
  • Infrastructure: Amazon EKS

YAML files and logs
Thanks for the handy script! report-16-12-2021_11-26-59.zip

Additional context
Similar errors show up in these issues:
https://github.com/strimzi/strimzi-kafka-operator/issues/383
https://github.com/strimzi/strimzi-kafka-operator/issues/1050
https://github.com/strimzi/strimzi-kafka-operator/issues/4964

Increasing the resource claims for the topic operator didn’t change the behaviour.
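
For reference, the resource bump was expressed roughly like this in the Kafka custom resource (the values are illustrative, not a recommendation):

entityOperator:
  topicOperator:
    # Illustrative requests/limits; raising these did not stop the blocked-thread crash
    resources:
      requests:
        cpu: 500m
        memory: 512Mi
      limits:
        cpu: "1"
        memory: 1Gi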

Zookeeper doesn’t show any errors or timeouts.

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 25 (9 by maintainers)

Most upvoted comments

Also running into this.

For the time being, I am defaulting back to the ZooKeeper store instead of the Kafka Streams store by adding the following:

  entityOperator:
    template:
      topicOperatorContainer:
        env:
        - name: STRIMZI_USE_ZOOKEEPER_TOPIC_STORE
          value: "true"

FWIW this still seems to be an issue in my case, and I’ve been grateful for the hack above. Currently deploying 0.36.0 using the quickstart.

@LiamClarkeNZ I did not keep my logs unfortunately, but looks like you reproduced it.

Separately, did anyone revert the STRIMZI_USE_ZOOKEEPER_TOPIC_STORE=true setting successfully?

Using ZK for now works, but as you note, ZK will eventually disappear, so I guess overriding is only fine in the short term.

@danlenar’s solution worked for me when I was migrating an existing cluster to a new namespace and ran into an issue where the strimzi-store-topic would not come ready due to InvalidStateStoreException. Posting here in case anyone else embarks on the unenviable task of moving a cluster to a new namespace…
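
If you need to inspect those internal store topics yourself, something along these lines works from a broker pod (the pod name matches the repro cluster above; topic names and paths follow Strimzi's conventions, so adjust for your cluster):

# Describe the Topic Operator's internal store topic and its Kafka Streams changelog
kubectl exec -it kafka-basic-kafka-0 -- \
  /opt/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --describe --topic __strimzi_store_topic

kubectl exec -it kafka-basic-kafka-0 -- \
  /opt/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --describe --topic __strimzi-topic-operator-kstreams-topic-store-changelog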

@Cave-Johnson in the Kafka custom resource spec.

Example:

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  entityOperator:
    template:
      topicOperatorContainer:
        env:
        - name: STRIMZI_USE_ZOOKEEPER_TOPIC_STORE
          value: "true"
  # ...
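
One gotcha with the snippet above: the env value must be the quoted string "true". Kubernetes container env values are strings, so an unquoted YAML boolean will be rejected when the manifest is applied.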

I’m also having this issue, and with the template workaround it’s working fine.

I am also having this issue.

The temporary fix from @danlenar is what’s working for me at the moment.