keycloak: Keycloak fails to start due to infinispan state transfer exception

Before reporting an issue

  • I have searched existing issues
  • I have reproduced the issue with the latest release

Area

infinispan

Describe the bug

Running into following exception when starting new keycloak container on k8s.

2023-06-19 21:42:16,446 TRACE [org.infinispan.factories.ComponentRegistry] (keycloak-cache-init) Invoking org.infinispan.counter.impl.CounterModuleLifecycle@7e76b944.cacheStopped()
2023-06-19 21:42:16,446 TRACE [org.infinispan.factories.ComponentRegistry] (keycloak-cache-init) Invoking org.infinispan.tasks.impl.LifecycleCallbacks@60d67fbd.cacheStopped()
2023-06-19 21:42:16,446 TRACE [org.infinispan.factories.ComponentRegistry] (keycloak-cache-init) Invoking org.infinispan.jboss.marshalling.JbossMarshallingModule@2a2df0de.cacheStopped()
2023-06-19 21:42:16,446 TRACE [org.infinispan.factories.ComponentRegistry] (keycloak-cache-init) Invoking org.infinispan.server.hotrod.LifecycleCallbacks@5fc79f76.cacheStopped()
2023-06-19 21:42:16,446 TRACE [org.infinispan.factories.ComponentRegistry] (keycloak-cache-init) Invoking org.infinispan.multimap.impl.MultimapModuleLifecycle@60ae908e.cacheStopped()
2023-06-19 21:42:16,446 TRACE [org.infinispan.factories.ComponentRegistry] (keycloak-cache-init) Invoking org.infinispan.persistence.remote.LifecycleCallbacks@adf8238.cacheStopped()
2023-06-19 21:42:16,446 TRACE [org.infinispan.factories.ComponentRegistry] (keycloak-cache-init) Invoking org.infinispan.server.core.LifecycleCallbacks@7c48f27b.cacheStopped()
2023-06-19 21:42:16,446 TRACE [org.infinispan.notifications.cachemanagerlistener.CacheManagerNotifierImpl] (keycloak-cache-init) Invoking listener: org.infinispan.remoting.inboundhandler.GlobalInboundInvocationHandler@3bbe1c15 passing event EventImpl{type=CACHE_STOPPED, newMembers=null, oldMembers=null, localAddress=null, viewId=0, subgroupsMerged=null, mergeVie
w=false}
2023-06-19 21:42:16,447 TRACE [org.infinispan.factories.impl.BasicComponentRegistryImpl] (keycloak-cache-init) Changed status of org.infinispan.globalstate.GlobalConfigurationManager to FAILED
2023-06-19 21:42:16,447 ERROR [org.infinispan.CONFIG] (keycloak-cache-init) ISPN000660: DefaultCacheManager start failed, stopping any running components: org.infinispan.commons.CacheException: java.lang.InterruptedException
        at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:243)
        at org.infinispan.cache.impl.CacheImpl.start(CacheImpl.java:1013)
        at org.infinispan.cache.impl.AbstractDelegatingCache.start(AbstractDelegatingCache.java:504)
        at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:723)
        at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:669)
        at org.infinispan.manager.DefaultCacheManager.internalGetCache(DefaultCacheManager.java:558)
        at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:521)
        at org.infinispan.security.actions.GetCacheAction.run(GetCacheAction.java:26)
        at org.infinispan.security.actions.GetCacheAction.run(GetCacheAction.java:14)
        at org.infinispan.security.Security.doPrivileged(Security.java:56)
        at org.infinispan.globalstate.impl.SecurityActions.doPrivileged(SecurityActions.java:30)
        at org.infinispan.globalstate.impl.SecurityActions.getCache(SecurityActions.java:39)
        at org.infinispan.globalstate.impl.GlobalConfigurationManagerImpl.start(GlobalConfigurationManagerImpl.java:111)
        at org.infinispan.globalstate.impl.CorePackageImpl$2.start(CorePackageImpl.java:60)
        at org.infinispan.globalstate.impl.CorePackageImpl$2.start(CorePackageImpl.java:48)
        at org.infinispan.factories.impl.BasicComponentRegistryImpl.invokeStart(BasicComponentRegistryImpl.java:617)
        at org.infinispan.factories.impl.BasicComponentRegistryImpl.doStartWrapper(BasicComponentRegistryImpl.java:608)
        at org.infinispan.factories.impl.BasicComponentRegistryImpl.startWrapper(BasicComponentRegistryImpl.java:577)
        at org.infinispan.factories.impl.BasicComponentRegistryImpl$ComponentWrapper.running(BasicComponentRegistryImpl.java:808)
        at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:357)
        at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:250)
        at org.infinispan.manager.DefaultCacheManager.internalStart(DefaultCacheManager.java:775)
        at org.infinispan.manager.DefaultCacheManager.start(DefaultCacheManager.java:743)
        at org.infinispan.manager.DefaultCacheManager.<init>(DefaultCacheManager.java:407)
        at org.keycloak.quarkus.runtime.storage.legacy.infinispan.CacheManagerFactory.startCacheManager(CacheManagerFactory.java:96)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.InterruptedException
        at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:385)
        at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2022)
        at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:236)
        ... 28 more

Subsequent startup failure.

2023-06-19 21:42:16,491 DEBUG [org.infinispan.quarkus.hibernate.cache.QuarkusInfinispanRegionFactory] (main) Stop region factory
2023-06-19 21:42:16,492 DEBUG [org.infinispan.quarkus.hibernate.cache.QuarkusInfinispanRegionFactory] (main) Clear region references
2023-06-19 21:42:16,533 ERROR [org.keycloak.quarkus.runtime.cli.ExecutionExceptionHandler] (main) ERROR: Failed to start server in (production) mode
2023-06-19 21:42:16,533 ERROR [org.keycloak.quarkus.runtime.cli.ExecutionExceptionHandler] (main) Error details:: java.lang.RuntimeException: Failed to start caches
        at org.keycloak.quarkus.runtime.storage.legacy.infinispan.CacheManagerFactory.getOrCreate(CacheManagerFactory.java:61)
        at org.keycloak.quarkus.runtime.storage.legacy.infinispan.CacheManagerFactory_3e2e78b5a5eee8303325d41faca0a80d7da888f7_Synthetic_ClientProxy.getOrCreate(Unknown Source)
        at org.keycloak.quarkus.runtime.storage.legacy.infinispan.QuarkusCacheManagerProvider.getCacheManager(QuarkusCacheManagerProvider.java:32)
        at org.keycloak.connections.infinispan.DefaultInfinispanConnectionProviderFactory.lazyInit(DefaultInfinispanConnectionProviderFactory.java:143)
        at org.keycloak.connections.infinispan.DefaultInfinispanConnectionProviderFactory.create(DefaultInfinispanConnectionProviderFactory.java:83)
        at org.keycloak.connections.infinispan.DefaultInfinispanConnectionProviderFactory.create(DefaultInfinispanConnectionProviderFactory.java:67)
        at org.keycloak.services.DefaultKeycloakSession.getProvider(DefaultKeycloakSession.java:271)
        at org.keycloak.models.sessions.infinispan.InfinispanSingleUseObjectProviderFactory.getSingleUseObjectCache(InfinispanSingleUseObjectProviderFactory.java:53)
        at org.keycloak.models.sessions.infinispan.InfinispanSingleUseObjectProviderFactory.postInit(InfinispanSingleUseObjectProviderFactory.java:77)
        at org.keycloak.quarkus.runtime.integration.QuarkusKeycloakSessionFactory.init(QuarkusKeycloakSessionFactory.java:105)
        at org.keycloak.quarkus.runtime.integration.jaxrs.QuarkusKeycloakApplication.createSessionFactory(QuarkusKeycloakApplication.java:41)
        at org.keycloak.services.resources.KeycloakApplication.startup(KeycloakApplication.java:125)
        at org.keycloak.quarkus.runtime.integration.QuarkusLifecycleObserver.onStartupEvent(QuarkusLifecycleObserver.java:37)
        at org.keycloak.quarkus.runtime.integration.QuarkusLifecycleObserver_Observer_onStartupEvent_b0e82415b143738dc1f986a5fa4668e83d0a5dea.notify(Unknown Source)
        at io.quarkus.arc.impl.EventImpl$Notifier.notifyObservers(EventImpl.java:326)
        at io.quarkus.arc.impl.EventImpl$Notifier.notify(EventImpl.java:308)
        at io.quarkus.arc.impl.EventImpl.fire(EventImpl.java:76)
        at io.quarkus.arc.runtime.ArcRecorder.fireLifecycleEvent(ArcRecorder.java:131)
        at io.quarkus.arc.runtime.ArcRecorder.handleLifecycleEvents(ArcRecorder.java:100)
        at io.quarkus.deployment.steps.LifecycleEventsBuildStep$startupEvent1144526294.deploy_0(Unknown Source)
        at io.quarkus.deployment.steps.LifecycleEventsBuildStep$startupEvent1144526294.deploy(Unknown Source)
        at io.quarkus.runner.ApplicationImpl.doStart(Unknown Source)
        at io.quarkus.runtime.Application.start(Application.java:101)
        at io.quarkus.runtime.ApplicationLifecycleManager.run(ApplicationLifecycleManager.java:110)
        at io.quarkus.runtime.Quarkus.run(Quarkus.java:70)
        at org.keycloak.quarkus.runtime.KeycloakMain.start(KeycloakMain.java:98)
        at org.keycloak.quarkus.runtime.cli.command.AbstractStartCommand.run(AbstractStartCommand.java:37)
        at picocli.CommandLine.executeUserObject(CommandLine.java:1939)
        at picocli.CommandLine.access$1300(CommandLine.java:145)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2358)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2352)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2314)
        at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
        at picocli.CommandLine$RunLast.execute(CommandLine.java:2316)
        at picocli.CommandLine.execute(CommandLine.java:2078)
        at org.keycloak.quarkus.runtime.cli.Picocli.parseAndRun(Picocli.java:94)
        at org.keycloak.quarkus.runtime.KeycloakMain.main(KeycloakMain.java:88)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at io.quarkus.bootstrap.runner.QuarkusEntryPoint.doRun(QuarkusEntryPoint.java:61)
        at io.quarkus.bootstrap.runner.QuarkusEntryPoint.main(QuarkusEntryPoint.java:32)
Caused by: java.util.concurrent.TimeoutException
        at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:204)
        at org.keycloak.quarkus.runtime.storage.legacy.infinispan.CacheManagerFactory.getOrCreate(CacheManagerFactory.java:59)
        ... 42 more

Configuration:

We use the default cache-ispn.yaml and the configuration works fine on 20.0.5

Version

21.1.1

Expected behavior

Keycloak should start without any infinispan error.

Actual behavior

Keycloak container fails to start due to infinispan error.

How to Reproduce?

Able to reproduce on version 21.1.1 with following config

KC_CACHE_STACK - kubernetes KC_HEALTH_ENABLED - true KC_METRICS_ENABLED - true KC_PROXY - reencrypt

Anything else?

No response

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 6
  • Comments: 26 (17 by maintainers)

Commits related to this issue

Most upvoted comments

@ahus1 Thanks for improving and verifying the workaround.

To summarize what’s the future of this issue:

We’ve been discussing it with the maintainers of JGroups and Infinispan, and we will be waiting for JGroups 5.2.17 and Infinispan 14.0.x releases. These releases will contain a fix for this issue. It might be included in the next planned micro-release 22.0.2 of Keycloak, or potentially in the next one.

JGroups identifies nodes using UUID and it has a cache that maps UUID to logical names (keycloak-*) and physical addresses (IP and port). This cache is populated on demand and in your case, it seems the UUID is not present in the cache. The message should go away after a while after the message is retried and the cache is populated with that destination.

If it does not go away, it may be some misconfiguration or some network issue.

@souravs17031999 This is is not supported as the Infinispan versions in different Keycloak versions are not compatible. Furthermore, the new Keycloak version might contain database migrations. To do a Keycloak version upgrade, you have to scale to zero pods first and then update.

The first one should work but you need to configure the cache-stack.

     # Workaround for https://github.com/keycloak/keycloak/issues/21092
    - name: 'cache-stack'
      value: kubernetes-with-fdsock

Example: https://github.com/keycloak/keycloak-benchmark/commit/880e4ea05a2262f0dc70861f6c608f17028325ad

@sschu would you mind pointing to any documentation that talk about the upgrade process and the need of scaling down to 0 (especially when using infinispan) ? I tried to find informations on this but was not able to find any. Thanks a lot for the help 😃

@ahus1, For me I am using StatefulSets and the scenario is like we have deployed Keycloak in k8s and two pods are running on v21, now we start to upgrade to v22, so first pod which comes up with v22 (keeping one of the older one at v21) starts to throw NPE exceptions and goes to crashLoop continuosly failing the deployment.

@pruivo from my limited knowledge the changes proposed here #21064 can solve this issue. Is that correct?

It can workaround the “problem” (although I don’t believe the NPE causes any issue and the cluster is able to recover).

The user can replace the protocol FD_SOCK2 with FD_SOCK (or remove it). With the change in #21064 the following custom stack can be used:

<jgroups>
   <stack name="my-stack" extends="kubernetes">
      <FD_SOCK stack.combine="REPLACE" stack.position="FD_SOCK2"/>
   </stack>
</jgroups>

And set KC_CACHE_STACK="my-stack" (assuming it is running in OpenShift)