quartz: In clustered mode, an ABA problem occurs if the acquireTriggersWithinLock property is not set

In clustered mode, if the acquireTriggersWithinLock property is not set, an ABA problem can occur.

Here is the relevant Quartz source code:

public List<OperableTrigger> acquireNextTriggers(final long noLaterThan, final int maxCount, final long timeWindow)
        throws JobPersistenceException {
        if (isAcquireTriggersWithinLock() || maxCount > 1) {
            lockName = LOCK_TRIGGER_ACCESS;
        } else {
            lockName = null;
        }
...

protected List<OperableTrigger> acquireNextTrigger(Connection conn, long noLaterThan, int maxCount, long timeWindow)
        throws JobPersistenceException {
 
        ... 

        do {
            currentLoopCount ++;
            try {
                List<TriggerKey> keys = getDelegate().selectTriggerToAcquire(conn, noLaterThan + timeWindow, getMisfireTime(), maxCount);
                
                // No trigger is ready to fire yet.
                if (keys == null || keys.size() == 0)
                    return acquiredTriggers;

                long batchEnd = noLaterThan;

                for(TriggerKey triggerKey: keys) {
                    
                    ...

                    // We now have a acquired trigger, let's add to return list.
                    // If our trigger was no longer in the expected state, try a new one.
                    int rowsUpdated = getDelegate().updateTriggerStateFromOtherState(conn, triggerKey, STATE_ACQUIRED, STATE_WAITING);
                    if (rowsUpdated <= 0) {
                        continue; // next trigger
                    }
                    nextTrigger.setFireInstanceId(getFiredTriggerRecordId());
                    getDelegate().insertFiredTrigger(conn, nextTrigger, STATE_ACQUIRED, null);

                    if(acquiredTriggers.isEmpty()) {
                        batchEnd = Math.max(nextTrigger.getNextFireTime().getTime(), System.currentTimeMillis()) + timeWindow;
                    }
                    acquiredTriggers.add(nextTrigger);
                }

                ...

            } catch (Exception e) {
                throw new JobPersistenceException(
                          "Couldn't acquire next trigger: " + e.getMessage(), e);
            }
        } while (true);
        
        // Return the acquired trigger list
        return acquiredTriggers;
    }

In my configuration I do not set the batchTriggerAcquisitionMaxCount property, so it defaults to 1 and maxCount is always 1. As a result, acquireNextTriggers() does not take the TRIGGER_ACCESS lock and relies only on row locks on the TRIGGERS table. This is where the ABA problem appears: Server_A gets trigger_a, and Server_B can also get trigger_a through selectTriggerToAcquire() inside acquireNextTrigger(). Server_A then moves trigger_a through WAITING -> ACQUIRED -> EXECUTING -> WAITING. When Server_B subsequently calls updateTriggerStateFromOtherState(), the trigger is back in the WAITING state, so the update succeeds and trigger_a ends up being fired twice.
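
The interleaving described above can be replayed with a minimal in-memory sketch. This is not Quartz code: the class AbaSketch, the shared triggerState field, and the replay() helper are hypothetical names used only for illustration, and the compare-and-set merely stands in for the state-only check that updateTriggerStateFromOtherState() performs.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical in-memory model of one row in the TRIGGERS table (not Quartz code).
class AbaSketch {
    enum State { WAITING, ACQUIRED, EXECUTING }

    static final AtomicReference<State> triggerState =
            new AtomicReference<>(State.WAITING);

    // Stands in for "UPDATE ... SET state = newState WHERE state = oldState".
    // Returns the number of rows updated (1 or 0).
    static int updateStateFromOtherState(State newState, State oldState) {
        return triggerState.compareAndSet(oldState, newState) ? 1 : 0;
    }

    // Replays the interleaving and returns how many times the trigger fired.
    static int replay() {
        triggerState.set(State.WAITING);
        int fired = 0;

        // Both servers select trigger_a while it is still WAITING.
        State seenByA = triggerState.get();
        State seenByB = triggerState.get();

        // Server_A acquires, fires, and puts the trigger back to WAITING.
        if (updateStateFromOtherState(State.ACQUIRED, seenByA) == 1) {
            updateStateFromOtherState(State.EXECUTING, State.ACQUIRED);
            fired++;
            updateStateFromOtherState(State.WAITING, State.EXECUTING);
        }

        // Server_B resumes: the state is WAITING again, so its update also succeeds.
        if (updateStateFromOtherState(State.ACQUIRED, seenByB) == 1) {
            fired++;
        }
        return fired;
    }

    public static void main(String[] args) {
        System.out.println("fired " + replay() + " time(s)");
    }
}
```

In this model replay() returns 2, because a check on the state value alone cannot tell that the row went through a full cycle between Server_B's read and its update.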

I think this is important information, but the clustering deployment documentation does not mention it. Theoretically, setting the acquireTriggersWithinLock property is not enough; the TRIGGERS table may need a version_number field to avoid this problem.
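
A rough sketch of the version_number idea, modelled in memory rather than against the real schema: the state check is paired with a version that increases on every update, so a stale reader fails even if the state has cycled back to WAITING. The class VersionedTriggerSketch and its methods are hypothetical; java.util.concurrent.atomic.AtomicStampedReference is used here only as a stand-in for a row with a version column.

```java
import java.util.concurrent.atomic.AtomicStampedReference;

// Hypothetical model of a TRIGGERS row that carries a version number (not Quartz code).
class VersionedTriggerSketch {
    enum State { WAITING, ACQUIRED, EXECUTING }

    // The stamp plays the role of the proposed version_number column.
    static final AtomicStampedReference<State> row =
            new AtomicStampedReference<>(State.WAITING, 0);

    // Stands in for:
    // UPDATE ... SET state = ?, version = version + 1 WHERE state = ? AND version = ?
    static int updateStateFromOtherState(State newState, State oldState, int expectedVersion) {
        return row.compareAndSet(oldState, newState, expectedVersion, expectedVersion + 1) ? 1 : 0;
    }

    // Replays the same interleaving and returns how many times the trigger fired.
    static int replay() {
        row.set(State.WAITING, 0);
        int fired = 0;

        // Both servers read the row (state and version) while it is WAITING.
        int[] versionSeenByA = new int[1];
        State stateSeenByA = row.get(versionSeenByA);
        int[] versionSeenByB = new int[1];
        State stateSeenByB = row.get(versionSeenByB);

        // Server_A runs the full cycle; every update bumps the version.
        int v = versionSeenByA[0];
        if (updateStateFromOtherState(State.ACQUIRED, stateSeenByA, v) == 1) {
            updateStateFromOtherState(State.EXECUTING, State.ACQUIRED, v + 1);
            fired++;
            updateStateFromOtherState(State.WAITING, State.EXECUTING, v + 2);
        }

        // Server_B still holds the old version, so its update matches no row.
        if (updateStateFromOtherState(State.ACQUIRED, stateSeenByB, versionSeenByB[0]) == 1) {
            fired++;
        }
        return fired;
    }

    public static void main(String[] args) {
        System.out.println("fired " + replay() + " time(s)");
    }
}
```

Here replay() returns 1: the state is WAITING again when Server_B updates, but the version has moved from 0 to 3, so the stale update is rejected.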

Here are the debug logs from my two servers. Server_A:

2017-02-17 20:58:12.974 [DEBUG] [org.quartz.core.QuartzSchedulerThread:276] batch acquisition of 0 triggers
2017-02-17 20:58:39.996 [DEBUG] [org.quartz.core.QuartzSchedulerThread:276] batch acquisition of 0 triggers
2017-02-17 20:59:07.272 [DEBUG] [org.quartz.core.QuartzSchedulerThread:276] batch acquisition of 0 triggers
2017-02-17 20:59:31.635 [DEBUG] [org.quartz.core.QuartzSchedulerThread:276] batch acquisition of 1 triggers
2017-02-17 21:00:00.002 [DEBUG] [org.quartz.impl.jdbcjobstore.StdRowLockSemaphore:107] Lock 'TRIGGER_ACCESS' is desired by: scheduler_QuartzSchedulerThread
2017-02-17 21:00:00.002 [DEBUG] [org.quartz.impl.jdbcjobstore.StdRowLockSemaphore:92] Lock 'TRIGGER_ACCESS' is being obtained: scheduler_QuartzSchedulerThread
2017-02-17 21:00:00.003 [DEBUG] [org.quartz.impl.jdbcjobstore.StdRowLockSemaphore:116] Lock 'TRIGGER_ACCESS' given to: scheduler_QuartzSchedulerThread
2017-02-17 21:00:00.008 [DEBUG] [org.quartz.impl.jdbcjobstore.StdRowLockSemaphore:141] Lock 'TRIGGER_ACCESS' returned by: scheduler_QuartzSchedulerThread
2017-02-17 21:00:00.008 [DEBUG] [org.quartz.core.JobRunShell:201] Calling execute on job DEFAULT.jobDetail
2017-02-17 21:00:00.013 [DEBUG] [org.apache.http.client.protocol.RequestAddCookies:122] CookieSpec selected: best-match
2017-02-17 21:00:00.013 [DEBUG] [org.apache.http.client.protocol.RequestAuthCache:75] Auth cache not set in the context
2017-02-17 21:00:00.013 [DEBUG] [org.apache.http.impl.conn.PoolingHttpClientConnectionManager:215] Connection request: [route: {}->http://ip:80][total kept alive: 0; route allocated: 0 of 2; total allocated: 0 of 20]

Server_B:

2017-02-17 20:58:11.974 [DEBUG] [org.quartz.core.QuartzSchedulerThread:276] batch acquisition of 0 triggers
2017-02-17 20:58:38.915 [DEBUG] [org.quartz.core.QuartzSchedulerThread:276] batch acquisition of 0 triggers
2017-02-17 20:59:08.128 [DEBUG] [org.quartz.core.QuartzSchedulerThread:276] batch acquisition of 0 triggers
2017-02-17 20:59:33.916 [DEBUG] [org.quartz.core.QuartzSchedulerThread:276] batch acquisition of 1 triggers
2017-02-17 21:00:00.001 [DEBUG] [org.quartz.impl.jdbcjobstore.StdRowLockSemaphore:107] Lock 'TRIGGER_ACCESS' is desired by: scheduler_QuartzSchedulerThread
2017-02-17 21:00:00.001 [DEBUG] [org.quartz.impl.jdbcjobstore.StdRowLockSemaphore:92] Lock 'TRIGGER_ACCESS' is being obtained: scheduler_QuartzSchedulerThread
2017-02-17 21:00:00.002 [DEBUG] [org.quartz.impl.jdbcjobstore.StdRowLockSemaphore:116] Lock 'TRIGGER_ACCESS' given to: scheduler_QuartzSchedulerThread
2017-02-17 21:00:00.008 [DEBUG] [org.quartz.impl.jdbcjobstore.StdRowLockSemaphore:141] Lock 'TRIGGER_ACCESS' returned by: scheduler_QuartzSchedulerThread
2017-02-17 21:00:00.009 [DEBUG] [org.quartz.core.JobRunShell:201] Calling execute on job DEFAULT.jobDetail
2017-02-17 21:00:00.013 [DEBUG] [org.quartz.core.QuartzSchedulerThread:276] batch acquisition of 0 triggers
2017-02-17 21:00:00.014 [DEBUG] [org.apache.http.client.protocol.RequestAddCookies:122] CookieSpec selected: best-match
2017-02-17 21:00:00.014 [DEBUG] [org.apache.http.client.protocol.RequestAuthCache:75] Auth cache not set in the context
2017-02-17 21:00:00.014 [DEBUG] [org.apache.http.impl.conn.PoolingHttpClientConnectionManager:215] Connection request: [route: {}->http://ip:80][total kept alive: 0; route allocated: 0 of 2; total allocated: 0 of 20]

My Cron Expression: 00 00 21 17 02 ? 2017

My property configuration:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.2.xsd">

    <bean id="jobDetail" class="org.springframework.scheduling.quartz.JobDetailFactoryBean">
        <property name="jobClass" value="com.netease.mail.yanxuan.scheduler.task.JobInvokeService" />
        <property name="durability" value="true"/>
    </bean>

    <bean id="scheduler" lazy-init="false" autowire="no"
          class="org.springframework.scheduling.quartz.SchedulerFactoryBean">

        <property name="jobDetails">
            <list>
                <ref bean="jobDetail"/>
            </list>
        </property>

        <property name="dataSource" ref="dataSource" />

        <property name="overwriteExistingJobs" value="true"/>


        <property name="quartzProperties">
            <props>
                <prop key="org.quartz.scheduler.instanceName">EventScheduler</prop>
                <prop key="org.quartz.scheduler.instanceId">AUTO</prop>

                <!-- Configure ThreadPool -->
                <prop key="org.quartz.threadPool.class">org.quartz.simpl.SimpleThreadPool</prop>
                <prop key="org.quartz.threadPool.threadCount">50</prop>
                <prop key="org.quartz.threadPool.threadPriority">5</prop>
                <prop key="org.quartz.threadPool.threadsInheritContextClassLoaderOfInitializingThread">true</prop>


                <!-- Configure JobStore -->
                <prop key="org.quartz.jobStore.class">org.quartz.impl.jdbcjobstore.JobStoreCMT</prop>
                <prop key="org.quartz.jobStore.misfireThreshold">60000</prop>
                <prop key="org.quartz.jobStore.driverDelegateClass">org.quartz.impl.jdbcjobstore.StdJDBCDelegate</prop>
                <prop key="org.quartz.jobStore.tablePrefix">QRTZ_</prop>
                <prop key="org.quartz.jobStore.maxMisfiresToHandleAtATime">10</prop>
                <prop key="org.quartz.jobStore.isClustered">true</prop>
                <prop key="org.quartz.jobStore.clusterCheckinInterval">20000</prop>
                <prop key="org.quartz.jobStore.dontSetAutoCommitFalse">true</prop>
                <prop key="org.quartz.jobStore.txIsolationLevelSerializable">false</prop>
                <prop key="org.quartz.jobStore.useProperties">false</prop>

            </props>
        </property>
        <property name="applicationContextSchedulerContextKey" value="applicationContext" />
    </bean>

</beans>

<dependency>
    <groupId>org.quartz-scheduler</groupId>
    <artifactId>quartz</artifactId>
    <version>2.2.1</version>
</dependency>

About this issue

State: closed
Created 7 years ago
Comments: 22 (3 by maintainers)

Commits related to this issue

if acquireTriggersWithinLock not configured, double check to resolve the ABA problem of https://github.com/quartz-scheduler/quartz/issues/107 — committed to wangfan9002/quartz by wangfan9002 6 years ago

Most upvoted comments

There are two key steps here, based on my investigation. First, the scheduler gets the triggers in the WAITING state, changes their state to ACQUIRED, generates the firedTriggers, and then changes the triggers' state to WAITING or BLOCKED. Then the firedTriggers are fired, and the trigger is moved to completion status if this was its last firing. If there is only ever one firedTrigger for the same trigger at a time, the issue never reproduces. So it seems that locking around trigger acquisition was considered unnecessary because of the optimistic locking strategy used when setting trigger_state to ACQUIRED. But ABA can still occur, because there can be two or more firedTriggers for the same trigger.

Dorae132 on Jul 17, 2018

We also encountered this problem running v2.2.3. From our initial research, it seems that setting acquireTriggersWithinLock=true for clustered environments should reliably prevent the race condition. @wenniuwuren, could you explain why you think “Theoretically, set acquireTriggersWithinLock properties is not enough”?

giilby on Apr 3, 2017
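
For reference, a minimal sketch of the setting discussed in this thread, expressed as plain java.util.Properties. The property key org.quartz.jobStore.acquireTriggersWithinLock is the Quartz JDBC JobStore setting named above; the class QuartzLockConfigSketch and the method clusteredProps() are hypothetical, and how the properties reach the scheduler (for example through the quartzProperties block in the Spring configuration shown earlier) depends on the deployment.

```java
import java.util.Properties;

// Hypothetical helper that builds the relevant subset of Quartz properties.
class QuartzLockConfigSketch {
    static Properties clusteredProps() {
        Properties props = new Properties();
        // Already present in the configuration shown in this issue.
        props.setProperty("org.quartz.jobStore.isClustered", "true");
        // Take the TRIGGER_ACCESS lock during trigger acquisition even when maxCount is 1.
        props.setProperty("org.quartz.jobStore.acquireTriggersWithinLock", "true");
        return props;
    }

    public static void main(String[] args) {
        clusteredProps().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

With this flag set, the isAcquireTriggersWithinLock() branch in acquireNextTriggers() selects LOCK_TRIGGER_ACCESS instead of null, so the select and the state update run under the same lock.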