quartz: Triggers are getting blocked permanently

Dear Quartz Team,

We are using Quartz 2.2.1 in clustered mode with the JDBC job store to schedule jobs marked as @DisallowConcurrentExecution.

We have observed that occasionally triggers are getting stuck in trigger state BLOCKED without ever recovering automatically. Looking into the job store DB tables, the pattern is always the same:

  • The TRIGGER_STATE on <PREFIX>_TRIGGERS is in state BLOCKED

  • There is no corresponding record in <PREFIX>_FIRED_TRIGGERS

org.quartz.impl.jdbcjobstore.JobStoreSupport.clusterRecover(Connection, List<SchedulerStateRecord>) works from the fired-trigger records of failed scheduler instances, so it will not recover such triggers. The only way to get out of this inconsistent state is to manually set the TRIGGER_STATE back to WAITING.
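The pattern described above (a BLOCKED trigger with no matching fired-trigger record) can be checked with a query along the following lines. This is only a sketch: BlockedTriggerSql and its methods are hypothetical helpers, not part of Quartz, and the column names (SCHED_NAME, JOB_NAME, JOB_GROUP, and so on) are assumed to match the standard Quartz 2.x schema scripts, so verify them against your own tables first. The match is on the job rather than the trigger, on the assumption that a trigger of a @DisallowConcurrentExecution job is BLOCKED because some trigger of the same job is recorded as firing.

```java
/** Hypothetical helper (not part of Quartz): builds SQL for finding and resetting orphaned BLOCKED triggers. */
final class BlockedTriggerSql {

    private BlockedTriggerSql() {}

    /** SELECT for BLOCKED triggers whose job has no row in the FIRED_TRIGGERS table. */
    static String findOrphanedBlocked(String tablePrefix) {
        requireSafePrefix(tablePrefix);
        return "SELECT t.SCHED_NAME, t.TRIGGER_NAME, t.TRIGGER_GROUP"
             + " FROM " + tablePrefix + "TRIGGERS t"
             + " WHERE t.TRIGGER_STATE = 'BLOCKED'"
             + " AND NOT EXISTS (SELECT 1 FROM " + tablePrefix + "FIRED_TRIGGERS f"
             + " WHERE f.SCHED_NAME = t.SCHED_NAME"
             + " AND f.JOB_NAME = t.JOB_NAME"
             + " AND f.JOB_GROUP = t.JOB_GROUP)";
    }

    /** Parameterized UPDATE that moves one trigger back to WAITING, but only if it is still BLOCKED. */
    static String resetToWaiting(String tablePrefix) {
        requireSafePrefix(tablePrefix);
        return "UPDATE " + tablePrefix + "TRIGGERS SET TRIGGER_STATE = 'WAITING'"
             + " WHERE SCHED_NAME = ? AND TRIGGER_NAME = ? AND TRIGGER_GROUP = ?"
             + " AND TRIGGER_STATE = 'BLOCKED'";
    }

    /** The prefix is concatenated into SQL, so only plain identifier characters are accepted. */
    private static void requireSafePrefix(String prefix) {
        if (prefix == null || !prefix.matches("[A-Za-z0-9_]+")) {
            throw new IllegalArgumentException("Unexpected table prefix: " + prefix);
        }
    }
}
```

With the default prefix, BlockedTriggerSql.findOrphanedBlocked("QRTZ_") yields a statement that can be run through a plain JDBC Statement, and each returned key can then be bound to the three parameters of resetToWaiting("QRTZ_"). The extra TRIGGER_STATE = 'BLOCKED' condition in the UPDATE is there so that a trigger which has recovered in the meantime is left alone.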

It is not yet clear under which circumstances this error occurs. However, our log files indicate that triggers getting stuck coincide with temporary database problems.

Below you can find an example of a NullPointerException in org.quartz.impl.jdbcjobstore.JobStoreSupport.triggersFired(List<OperableTrigger>). The exception itself was caused somewhere in the JDBC driver (Sybase jConnect) when trying to invoke rollback() on a JDBC connection. The log entry’s timestamp correlates exactly with the time the trigger got stuck.

2017 05 01 20:20:02#+00#ERROR#org.quartz.core.QuartzSchedulerThread##anonymous#ItOpScheduler_Clustered_QuartzSchedulerThread#Runtime error occurred in main trigger firing loop.
java.lang.NullPointerException: while trying to invoke the method com.sybase.jdbc4.tds.TdsCursor.setRowNum(int) of a null object loaded from field com.sybase.jdbc4.tds.CurInfo3Token._cursor of an object loaded from local variable 'this'
	at com.sybase.jdbc4.tds.CurInfo3Token.getMetaInformation(CurInfo3Token.java:85)
	at com.sybase.jdbc4.tds.CurInfoToken.<init>(CurInfoToken.java:130)
	at com.sybase.jdbc4.tds.CurInfo3Token.<init>(CurInfo3Token.java:45)
	at com.sybase.jdbc4.tds.Tds.nextResult(Tds.java:3239)
	at com.sybase.jdbc4.tds.Tds.readCommandResults(Tds.java:4459)
	at com.sybase.jdbc4.tds.Tds.doCommand(Tds.java:4444)
	at com.sybase.jdbc4.tds.Tds.endTransaction(Tds.java:2602)
	at com.sybase.jdbc4.jdbc.SybConnection.rollback(SybConnection.java:1953)
	at sun.reflect.GeneratedMethodAccessor492.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at com.sap.core.persistence.jdbc.trace.TraceableBase$1.invoke(TraceableBase.java:44)
	at com.sun.proxy.$Proxy17.rollback(Unknown Source)
	at com.sap.core.persistence.jdbc.trace.TraceableConnection.rollback(TraceableConnection.java:239)
	at org.apache.commons.dbcp.DelegatingConnection.rollback(DelegatingConnection.java:368)
	at org.apache.commons.dbcp.DelegatingConnection.rollback(DelegatingConnection.java:368)
	at org.apache.commons.dbcp.PoolingDataSource$PoolGuardConnectionWrapper.rollback(PoolingDataSource.java:323)
	at sun.reflect.GeneratedMethodAccessor492.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.quartz.impl.jdbcjobstore.AttributeRestoringConnectionInvocationHandler.invoke(AttributeRestoringConnectionInvocationHandler.java:73)
	at com.sun.proxy.$Proxy143.rollback(Unknown Source)
	at org.quartz.impl.jdbcjobstore.JobStoreSupport.rollbackConnection(JobStoreSupport.java:3658)
	at org.quartz.impl.jdbcjobstore.JobStoreSupport.executeInNonManagedTXLock(JobStoreSupport.java:3817)
	at org.quartz.impl.jdbcjobstore.JobStoreSupport.triggersFired(JobStoreSupport.java:2908)
	at org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:336)

Please let me know if you need additional details.

Thanks for your support, Sebastian

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 1
  • Comments: 22 (2 by maintainers)

Most upvoted comments

I was able to reproduce the issue in the debugger.

If org.quartz.impl.jdbcjobstore.JobStoreSupport.triggerFired(Connection, OperableTrigger) throws a RuntimeException after the trigger state has been set to BLOCKED, the trigger will get stuck. The reason is that QuartzSchedulerThread.run() calls org.quartz.spi.JobStore.releaseAcquiredTrigger(OperableTrigger) when a RuntimeException occurs. That call deletes the record from <PREFIX>_FIRED_TRIGGERS but does not set the trigger state back from BLOCKED to WAITING.

Probably org.quartz.impl.jdbcjobstore.JobStoreSupport.releaseAcquiredTrigger(Connection, OperableTrigger) should set back the trigger state to WAITING from both ACQUIRED (which it already does) and from BLOCKED.
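To make the difference concrete, here is a toy model of the two behaviours. It is not Quartz code: ReleaseModel, its State enum, and both release methods are hypothetical stand-ins for the <PREFIX>_TRIGGERS and <PREFIX>_FIRED_TRIGGERS tables and for releaseAcquiredTrigger, and it only illustrates the state transitions described above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy in-memory model of the two tables; hypothetical, for illustration only. */
final class ReleaseModel {

    enum State { WAITING, ACQUIRED, BLOCKED }

    /** Stands in for <PREFIX>_TRIGGERS: trigger key -> TRIGGER_STATE. */
    final Map<String, State> triggers = new HashMap<>();

    /** Stands in for <PREFIX>_FIRED_TRIGGERS: keys that have a fired-trigger record. */
    final Set<String> firedTriggers = new HashSet<>();

    /** Behaviour as described in the report: only ACQUIRED is reset to WAITING. */
    void releaseAsReported(String key) {
        if (triggers.get(key) == State.ACQUIRED) {
            triggers.put(key, State.WAITING);
        }
        firedTriggers.remove(key);
    }

    /** Suggested behaviour: reset to WAITING from ACQUIRED and from BLOCKED. */
    void releaseAsSuggested(String key) {
        State current = triggers.get(key);
        if (current == State.ACQUIRED || current == State.BLOCKED) {
            triggers.put(key, State.WAITING);
        }
        firedTriggers.remove(key);
    }
}
```

Starting from a trigger that is BLOCKED and has a fired-trigger record, releaseAsReported leaves it BLOCKED with no record (the stuck pattern from the original report), while releaseAsSuggested returns it to WAITING. A real fix would presumably also need to check that no other trigger of the same job is still executing before unblocking, which this model leaves out.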

Best Regards, Sebastian

Our team deployed the latest Quartz JAR (version 2.3.2) to our production server last Friday, but we are still seeing BLOCKED triggers in the qrtz_triggers table for our email notifications immediately after a Catalina restart. Is there any follow-up or advice for this behavior? Would enabling TRACE logging on the “org.quartz” package help? We have clustering enabled in quartz.properties, and the Quartz tables are in the same DB schema as our app tables.

Has anyone enabled JMX remote access to the MBeans as a potential workaround, to reset the trigger state to WAITING and then immediately fire the trigger? https://dzone.com/articles/how-manage-quartz-remotely

For those who are still seeing this issue: if you implemented the JobListener interface, make sure you handle exceptions yourself within the jobWasExecuted method. Quartz does not handle exceptions thrown in that method, and this can leave your job stuck in the BLOCKED state without ever recovering. We experienced this with Quartz version 2.3.0.
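One way to follow this advice is to route the listener body through a small guard that never lets a RuntimeException escape. The sketch below is plain Java with no Quartz dependency; ListenerGuard and runSafely are hypothetical names, and the usage shown in the trailing comment assumes a listener with its own audit method and logger.

```java
import java.util.function.Consumer;

/** Hypothetical helper: runs listener code and reports failures instead of rethrowing them. */
final class ListenerGuard {

    private ListenerGuard() {}

    /**
     * Runs the body and returns true if it completed normally.
     * Any RuntimeException is passed to onError and swallowed, so it cannot
     * propagate out of the calling listener method.
     */
    static boolean runSafely(Runnable body, Consumer<RuntimeException> onError) {
        try {
            body.run();
            return true;
        } catch (RuntimeException e) {
            onError.accept(e);
            return false;
        }
    }
}

// Possible use inside a JobListener implementation (audit and log are assumed to exist):
//
//   public void jobWasExecuted(JobExecutionContext context, JobExecutionException jobException) {
//       ListenerGuard.runSafely(
//           () -> audit(context, jobException),
//           e -> log.error("jobWasExecuted failed", e));
//   }
```

A plain try/catch directly inside jobWasExecuted achieves the same thing; the helper only keeps the pattern in one place if several listener methods need it.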

Since adding the JobListener and upgrading Quartz from 2.3.0 to the latest version, 2.3.2, most of these blocked triggers now change from BLOCKED to WAITING automatically. However, one or two BLOCKED triggers still remain in my case.

For those who are still seeing this issue: if you implemented the JobListener interface, make sure you handle exceptions yourself within the jobWasExecuted method. Quartz does not handle exceptions thrown in that method, and this can leave your job stuck in the BLOCKED state without ever recovering. We experienced this with Quartz version 2.3.0.

After 6 hours of trying to find a solution, I bumped into your answer, and it was exactly what was happening in my code. Thank you.
