quarkus-operator-sdk: LeaderElection - error while releasing lock makes integration tests fail

I use failsafe to run my integration tests. They worked well before the 5.x update. Now the tests still pass, but failsafe itself crashes with:

[INFO] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 22.271 s - in com.sicpa.ptf.extdboperator.ExternalDatabaseReconcilerIT
[INFO] Running com.sicpa.ptf.extdboperator.database.oracle.OracleDbActionIT
[WARNING] Tests run: 2, Failures: 0, Errors: 0, Skipped: 2, Time elapsed: 0 s - in com.sicpa.ptf.extdboperator.database.oracle.OracleDbActionIT
[INFO] Running com.sicpa.ptf.extdboperator.database.postgres.PostgresDbActionIT
[WARNING] Tests run: 4, Failures: 0, Errors: 0, Skipped: 4, Time elapsed: 0 s - in com.sicpa.ptf.extdboperator.database.postgres.PostgresDbActionIT
2023-02-07 08:09:24,538 INFO  [io.qua.ope.run.AppEventListener] (main) Quarkus Java Operator SDK extension is shutting down.
2023-02-07 08:09:24,538 INFO  [io.jav.ope.Operator] (main) Operator SDK 4.2.4 is shutting down...
2023-02-07 08:09:24,556 ERROR [io.fab.kub.cli.ext.lea.LeaderElector] (main) Exception occurred while releasing lock 'LeaseLock: default - external-db-operator (1e8ebe06-49eb-4c84-92ee-ee8609e942a1)' [Error Occurred After Shutdown]: io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.LockException: Unable to update LeaseLock
	at io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.LeaseLock.update(LeaseLock.java:102)
	at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.release(LeaderElector.java:139)
	at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.stopLeading(LeaderElector.java:120)
	at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.lambda$null$1(LeaderElector.java:94)
...
2023-02-07 08:09:24,557 INFO  [io.jav.ope.LeaderElectionManager] (main) Stopped leading for identity: 1e8ebe06-49eb-4c84-92ee-ee8609e942a1. Exiting.
[DEBUG] Closing the fork 1 after not saying Good Bye.
...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
...
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-failsafe-plugin:3.0.0-M8:verify (default) on project 
[ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
...
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 1
[ERROR] 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:714)
[ERROR] 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:311)
[ERROR] 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:268)
[ERROR] 	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1311)
[ERROR] 	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1144)
[ERROR] 	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:910)
[ERROR] 	at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:137)
[ERROR] 	at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2(MojoExecutor.java:370)
[ERROR] 	at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute(MojoExecutor.java:351)
[ERROR] 	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:215)
[ERROR] 	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:171)
[ERROR] 	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:163)
[ERROR] 	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
[ERROR] 	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
[ERROR] 	at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:56)
[ERROR] 	at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
[ERROR] 	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:294)
[ERROR] 	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:192)
[ERROR] 	at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:105)
[ERROR] 	at org.apache.maven.cli.MavenCli.execute(MavenCli.java:960)
[ERROR] 	at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:293)
[ERROR] 	at org.apache.maven.cli.MavenCli.main(MavenCli.java:196)
[ERROR] 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR] 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[ERROR] 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[ERROR] 	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
[ERROR] 	at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:282)
[ERROR] 	at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:225)
[ERROR] 	at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:406)
[ERROR] 	at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:347)

I guess it is somewhat related to https://github.com/quarkiverse/quarkus-operator-sdk/issues/450, but I cannot use the same solution (disable when in dev mode), since I am not running tests in dev mode. There is no way to turn off leader election either, except by removing the feature altogether.

For info, my integration tests are annotated with @QuarkusTest. I set the quarkus.operator-sdk.start-operator=false property for tests only and start the operator manually with the following code:

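import javax.inject.Inject;

import io.fabric8.kubernetes.client.KubernetesClient;
import io.javaoperatorsdk.operator.Operator;
import io.quarkus.test.junit.QuarkusTest;
import org.junit.jupiter.api.BeforeEach;

@QuarkusTest
class SomeTestIT {

  @Inject
  KubernetesClient client;

  @Inject
  Operator operator;

  // Static because JUnit creates a new test instance for each test method by default.
  private static boolean isOperatorStarted = false;

  @BeforeEach
  public void startOperator() throws ClassNotFoundException {
    // Start the operator only once for the whole test class.
    if (!isOperatorStarted) {
      operator.start();
      isOperatorStarted = true;
    }
    // ...
  }
}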

Versions:

  • quarkus-operator-sdk: 5.0.4
  • quarkus: 2.15.3.Final
  • failsafe / surefire: 3.0.0-M8

About this issue

  • State: closed
  • Created a year ago
  • Comments: 18 (10 by maintainers)

Most upvoted comments

failsafe launches a forked JVM to run the tests and waits for it to exit properly. The whole process is explained here: https://maven.apache.org/surefire/maven-failsafe-plugin/examples/shutdown.html. From what I understood (and tested), if the forked JVM stops by itself (rather than being stopped by failsafe), it is interpreted as a failure:

SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?

Since the operator stops with a System.exit(1), failsafe will always fail.

Concerning:

I guess that would be a problem regardless of leader election turned on or not 😃

Not exactly: what I want to avoid is the operator connecting to a cluster where a released version of the operator is already running. In that case, keeping leader election enabled in tests avoids problems: the operator would hang because it cannot get the lease, the test would fail, and the developer would understand the mistake. This is why I am not fond of turning leader election off in tests. That being said, I could also check manually in a beforeAll hook whether a lease exists, instead of relying on leader election. This is a possible workaround: not ideal, but it could work.
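A minimal sketch of what such a manual check could decide, written as plain Java with no cluster access. The names LeaseSnapshot and LeaseGuard are hypothetical and not part of any SDK. The sketch assumes the test has already read the holder identity, renew time and lease duration from the Lease object (named external-db-operator in the default namespace, according to the log above); how that Lease is fetched is left out on purpose.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical holder for the few Lease fields the check needs. */
final class LeaseSnapshot {
  final String holderIdentity;  // may be null or empty once the lease is released
  final Instant renewTime;      // last renewal, may be null
  final Duration leaseDuration; // validity after the last renewal, may be null

  LeaseSnapshot(String holderIdentity, Instant renewTime, Duration leaseDuration) {
    this.holderIdentity = holderIdentity;
    this.renewTime = renewTime;
    this.leaseDuration = leaseDuration;
  }
}

/** Hypothetical helper: decides whether another instance currently holds the lease. */
final class LeaseGuard {
  private LeaseGuard() {}

  /**
   * Returns true when the lease is held by an identity other than ownIdentity
   * and has not expired yet. A null snapshot means that no Lease object exists.
   */
  static boolean heldByAnother(LeaseSnapshot lease, String ownIdentity, Instant now) {
    if (lease == null) {
      return false; // no lease at all: nobody else is leading
    }
    String holder = lease.holderIdentity;
    if (holder == null || holder.isEmpty() || holder.equals(ownIdentity)) {
      return false; // released, or held by this very instance
    }
    if (lease.renewTime == null || lease.leaseDuration == null) {
      return true; // holder set but no timing data: conservatively treat as held
    }
    Instant expiry = lease.renewTime.plus(lease.leaseDuration);
    return now.isBefore(expiry); // still valid only before it expires
  }
}
```

In a beforeAll hook, the test could build a LeaseSnapshot from the cluster's Lease and abort with a clear message when heldByAnother returns true, instead of letting the operator hang while waiting for the lease.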

A property to disable leader election (that can be set to true in the test profile) would indeed solve this, and potentially other problems. I wouldn’t try to detect automatically if we are in tests though.