AxonFramework: Make the shutdown timeout configurable
Hi,
I am not entirely sure whether I would classify this as a bug or as a missing feature. In short: it should be possible to configure the shutdown timeout in DefaultConfigurer (currently five seconds), because some components may take longer to stop.
Please consider the following demo application which uses Axon’s Spring Boot Starter.
@SpringBootApplication
public class DemoApplication {

    public static void main( final String[] args ) {
        final var applicationContext = new SpringApplicationBuilder( DemoApplication.class )
                .properties( "axon.axonserver.enabled=false" )
                .run( args );
        final var eventGateway = applicationContext.getBean( EventGateway.class );
        eventGateway.publish( 0 );
        await( ).until( ( ) -> MyEventHandler.elementProcessed.get( ) );
        applicationContext.close( );
    }

    @Autowired
    private void configure( final EventProcessingConfigurer configurer ) {
        configurer.registerTrackingEventProcessorConfiguration( MyEventHandler.PROCESSING_GROUP, c -> TrackingEventProcessorConfiguration
                .forParallelProcessing( 4 )
                .andEventAvailabilityTimeout( 20, TimeUnit.SECONDS ) );
    }

    @Component
    @ProcessingGroup( MyEventHandler.PROCESSING_GROUP )
    static class MyEventHandler {
        public static final String PROCESSING_GROUP = "MyEventHandler";
        private static AtomicBoolean elementProcessed = new AtomicBoolean( );

        @EventHandler
        public void on( final Integer i ) {
            elementProcessed.set( true );
        }
    }
}
In this demo application, we have a tracking event processor with four segments and an event availability timeout of 20 seconds. We publish a single “event” for this handler (the zero) and wait until the event has been processed (we use Awaitility to wait for this to happen). Then we close the application context and, by doing so, shut down the Axon infrastructure. What we get is the following:
2021-10-19 13:14:32.802 INFO 22484 --- [ main] o.a.e.TrackingEventProcessor : Shutdown state set for Processor 'MyEventHandler'.
2021-10-19 13:14:32.803 INFO 22484 --- [nPool-worker-19] o.a.e.TrackingEventProcessor : Processor 'MyEventHandler' awaiting termination...
2021-10-19 13:14:37.812 WARN 22484 --- [ main] o.a.config.DefaultConfigurer : Timed out during shutdown phase [1073741823] after 5 seconds. Proceeding to following phase
Notice that the DefaultConfigurer gives up after five seconds and simply continues. However, the threads of the TrackingEventProcessor are still running. This leads to a lot of exceptions after approximately 20 seconds (which is exactly the configured event availability timeout). We start and stop application contexts very often in our integration tests. This behaviour is problematic for multiple reasons:
- Some processors are still active once the framework executes the following tests. I am not entirely sure whether this can lead to weird and flaky test behaviour or not.
- The log is full of error messages which are not related to the actual test. This makes it very difficult for us to see whether the test has actual issues.
- Stopping and starting the event processor during tests (e.g. to test the reset behaviour) takes a long time. This might rather be our own fault, because the API clearly states: “Note that some storage engines for the EmbeddedEventStore do not support streaming. They may […] wait for the timeout to occur.”
I have two ideas how this might be solved:
- As originally stated: Make the shutdown timeout configurable. We could set it to a safe 25 seconds and be done with it.
- Interrupt the tracking event processor during the shutdown and force it out of the processing loop.
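To make the second idea more concrete, here is a minimal sketch in plain Java. It is not Axon's implementation: `PollingWorker`, its queue, and the `shutdown( boolean )` method are hypothetical stand-ins for one processor thread that blocks while waiting for events. It only shows that interrupting a thread blocked in a timed wait ends the wait immediately, instead of after the full availability timeout.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a single processor thread (assumption, not Axon code).
final class PollingWorker implements Runnable {

    private final BlockingQueue<Integer> events = new LinkedBlockingQueue<>();
    private final AtomicInteger handled = new AtomicInteger();
    private final long availabilityTimeoutMillis;
    private final Thread thread = new Thread(this, "polling-worker");
    private volatile boolean running = true;

    PollingWorker(long availabilityTimeoutMillis) {
        this.availabilityTimeoutMillis = availabilityTimeoutMillis;
    }

    void start() {
        thread.start();
    }

    void publish(int event) {
        events.add(event);
    }

    int handledCount() {
        return handled.get();
    }

    @Override
    public void run() {
        while (running) {
            try {
                // Blocks for up to the availability timeout when no event is present.
                Integer event = events.poll(availabilityTimeoutMillis, TimeUnit.MILLISECONDS);
                if (event != null) {
                    handled.incrementAndGet();
                }
            } catch (InterruptedException e) {
                // Interrupted during the wait: leave the processing loop right away.
                return;
            }
        }
    }

    /** Stops the worker and returns how many milliseconds the stop took. */
    long shutdown(boolean interrupt) {
        long start = System.nanoTime();
        running = false;
        if (interrupt) {
            thread.interrupt(); // wakes the thread even if it is blocked in poll(...)
        }
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }
}
```

Without the interrupt, a worker that has just entered the wait only notices the `running` flag once the poll returns, which in the worst case is the whole availability timeout later. With the interrupt, the stop time no longer depends on that timeout.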
Current Behaviour
The DefaultConfigurer waits exactly five seconds for all components to handle the current lifecycle phase.
Wanted Behaviour
Provide a possibility to configure the timeout duration for each lifecycle phase.
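As a rough illustration of what I mean, the following plain-Java sketch waits for all handlers of one phase with a timeout that is passed in rather than hard-coded. `PhaseRunner` and `runPhase` are hypothetical names chosen for this example; the sketch assumes each handler returns a `CompletableFuture` and does not reflect how DefaultConfigurer is actually written.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper for illustration only (assumption, not part of Axon Framework).
final class PhaseRunner {

    private final long timeout;
    private final TimeUnit unit;

    // The timeout is a constructor argument instead of a fixed five seconds.
    PhaseRunner(long timeout, TimeUnit unit) {
        this.timeout = timeout;
        this.unit = unit;
    }

    /**
     * Invokes every handler of one lifecycle phase and waits for all of them together.
     * Returns false if the phase timed out or the wait was interrupted, true otherwise
     * (including the case where a handler finished with an exception).
     */
    boolean runPhase(List<Supplier<CompletableFuture<Void>>> handlers) {
        CompletableFuture<?>[] futures = handlers.stream()
                .map(Supplier::get)
                .toArray(CompletableFuture[]::new);
        try {
            CompletableFuture.allOf(futures).get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            // The caller would log a warning here and proceed to the following phase.
            return false;
        } catch (ExecutionException e) {
            // A handler failed, but the phase itself is finished.
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With something like this, a test setup could use `new PhaseRunner( 25, TimeUnit.SECONDS )` while the default stays at five seconds.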
Workarounds
One could use a custom SmartLifecycle which runs before Axon and stops all event handlers asynchronously with a longer timeout.
@Bean
SmartLifecycle eventProcessorStoppingSmartLifecycle( @Autowired final EventProcessingConfiguration eventProcessingConfiguration ) {
    return new SmartLifecycle( ) {
        private boolean running;

        @Override
        public void stop( ) {
            try {
                eventProcessingConfiguration.eventProcessors( )
                        .values( )
                        .stream( )
                        .map( EventProcessor::shutdownAsync )
                        .reduce( ( cf1, cf2 ) -> CompletableFuture.allOf( cf1, cf2 ) )
                        .orElse( CompletableFuture.completedFuture( null ) )
                        .get( 25, TimeUnit.SECONDS );
            } catch ( InterruptedException | ExecutionException | TimeoutException e ) {
                // Do something smart with the exception
            }
            running = false;
        }

        @Override
        public void start( ) {
            running = true;
        }

        @Override
        public boolean isRunning( ) {
            return running;
        }
    };
}
Thank you and best regards,
Nils
About this issue
- State: closed
- Created 3 years ago
- Comments: 20 (20 by maintainers)
Commits related to this issue
- Allow lifecycle phase timeout configuration Introduce a means to adjust the lifecycle phase timeout. This is useful for users that know their start-up and/or shutdown process takes longer than the d... — committed to AxonFramework/AxonFramework by smcvb 3 years ago
Just notifying you here @OLibutzki because I’ve released 4.5.13 about two hours ago. 😃
Hi @OLibutzki!
I have found some time to move #2037 and #2041 to axon-4.5.x. This means they’ll be part of Axon Framework release 4.5.13. Furthermore, I hope to release 4.5.13 this week. Hence you’ll be able to use these changes (hopefully) very soon. 😃
Exactly, both.
Yeah, that’s fine and your decision. I can understand if you don’t want to put it into a patch release.
Anyway, as it was implemented 7 months ago and we are waiting for this to speed up our tests (and production environments), I decided to ask. Please don’t feel pressured!
I’m happy to share some numbers.
The execution time for our test suite decreased by 36% (from 308 seconds to 198 seconds), just by using the 4.6.0-SNAPSHOT instead of 4.5.5.
We applied this to another project and had similar results: Decrease from 142 seconds to 90 seconds.
We highly appreciate your efforts and resolving this issue was a nice pre-christmas present 😃
One minor note: The tests still pass which might be relevant as well 😉
Hi @smcvb ,
Oliver and I discussed your suggestions. We would prefer the unblocking of the threads (because it would also slightly increase the performance of our tests). That said, I still think it would be a good idea to make the timeout configurable. It is still not guaranteed that all registered components can be stopped within those five seconds. So: unblocking should be the priority, configuration for the timeout would be nice to have.
Best regards
Nils
@smcvb, thank you so much!
You helped us a lot by moving this to the 4.5.x stream.
Keep up this great support / community work. It often is more important than just focussing on frequent releases and new features. Highly appreciated.
Thanks for sharing that with us, @OLibutzki! Happy to hear it has the desired effect.
I also wanted to share that I spotted some issues within our GitHub Actions due to (I assume) merge conflicts. I’ve just resolved the issues, so a recent SNAPSHOT release should be available right now.
Haha, definitely not unimportant! 😃
Thanks for the feedback there, @nils-christian. I’ll start a discussion internally to figure out the best way to unblock the processor threads. If either you or Oliver has ideas, those are, of course, appreciated (as always).
This request is related to my question in the forum: Reducing the number of token updates
I asked for negative consequences of increasing the EventAvailabilityTimeout.
As far as we understood so far there are two negative effects:
So besides a configurable shutdown timeout, it would be great to get out of the EventAvailabilityTimeout waiting time earlier on shutdown, as it’s a bit cumbersome to wait up to 20 seconds for the application to stop.