gpdb: Some problems about holdTillEndXact

Greenplum version or build

    tmp=# select version();
    PostgreSQL 12beta2 (Greenplum Database 7.0.0-alpha.0+dev.14581.g07fe0dc763 build dev) on x86_64-pc-linux-gnu, compiled by gcc (GCC) 6.5.1 20190307 (Alibaba 6.5.1-1 2.17), 64-bit compiled on May 6 2021 17:32:14 (with assert checking)

Expected behavior

  1. holdTillEndXact of LOCKTAG_RELATION_EXTEND should be false.
  2. holdTillEndXact of LOCKTAG_TRANSACTION should be true.

Actual behavior

  1. holdTillEndXact of LOCKTAG_RELATION_EXTEND is true.

    (gdb) finish
    Run till exit from #0  GetLockStatusData () at lock.c:3906
    0x00000000017acfe6 in pg_lock_status (fcinfo=0x62900007a268) at lockfuncs.c:166
    166			mystatus->lockData = GetLockStatusData();
    Value returned is $8 = (LockData *) 0x61900002e680
    (gdb) p ((LockData *) 0x61900002e680)->locks[4]
    $14 = {locktag = {locktag_field1 = 16384, locktag_field2 = 16385, locktag_field3 = 0, locktag_field4 = 0, locktag_type = 1 '\001', locktag_lockmethodid = 1 '\001'}, holdMask = 128, waitLockMode = 0, backend = 13, lxid = 3, pid = 113272, leaderPid = 113272, fastpath = false, databaseId = 16384, mppSessionId = 46, mppIsWriter = true, distribXid = 8206, holdTillEndXact = true}
    
  2. holdTillEndXact of LOCKTAG_TRANSACTION is false.

    (gdb) bt
    #0  hash_search_with_hash_value (hashp=0x62500000d7b0, keyPtr=0x7f0dd85016b0, hashvalue=2617248362, action=HASH_REMOVE, foundPtr=0x0) at dynahash.c:925
    #1  0x00000000015b2519 in CleanUpLock (lock=0x7f0dd85016b0, proclock=0x7f0dd940c608, lockMethodTable=0x2648ea0 <default_lockmethod>, hashcode=2617248362, wakeupNeeded=false) at lock.c:1796
    #2  0x00000000015b9fcb in LockRefindAndRelease (lockMethodTable=0x2648ea0 <default_lockmethod>, proc=0x7f0dda8f11f8, locktag=0x6250000a68a0, lockmode=7, decrement_strong_lock_count=true) at lock.c:3385
    #3  0x00000000015c1611 in lock_twophase_postcommit (xid=562, info=0, recdata=0x6250000a68a0, len=20) at lock.c:4708
    #4  0x0000000000a28b00 in ProcessRecords (bufptr=0x6250000a68a0 "2\002", xid=562, callbacks=0x234e6e0 <twophase_postcommit_callbacks>) at twophase.c:1757
    #5  0x0000000000a28607 in FinishPreparedTransaction (gid=0x625000007a78 "8205", isCommit=true, raiseErrorIfNotFound=true) at twophase.c:1704
    #6  0x0000000001c39427 in performDtxProtocolCommitPrepared (gid=0x625000007a78 "8205", raiseErrorIfNotFound=true) at cdbtm.c:2106
    #7  0x0000000001c39f4d in performDtxProtocolCommand (dtxProtocolCommand=DTX_PROTOCOL_COMMAND_COMMIT_PREPARED, gid=0x625000007a78 "8205", contextInfo=0x3ad3900 <TempDtxContextInfo>) at cdbtm.c:2267
    #8  0x0000000001601b69 in exec_mpp_dtx_protocol_command (dtxProtocolCommand=DTX_PROTOCOL_COMMAND_COMMIT_PREPARED, loggingStr=0x625000007a58 "Distributed Commit Prepared", gid=0x625000007a78 "8205", contextInfo=0x3ad3900 <TempDtxContextInfo>) at postgres.c:1557
    #9  0x000000000160dbb8 in PostgresMain (argc=1, argv=0x6290000494b8, dbname=0x6290000483a0 "tmp", username=0x629000048380 "zhanyi") at postgres.c:5467
    #10 0x000000000143b298 in BackendRun (port=0x61400000e040) at postmaster.c:4922
    #11 0x000000000143a08d in BackendStartup (port=0x61400000e040) at postmaster.c:4607
    #12 0x0000000001432944 in ServerLoop () at postmaster.c:1963
    #13 0x00000000014316cc in PostmasterMain (argc=7, argv=0x60600000ec60) at postmaster.c:1589
    #14 0x00000000010babd1 in main (argc=7, argv=0x60600000ec60) at main.c:240
    (gdb) f 1
    #1  0x00000000015b2519 in CleanUpLock (lock=0x7f0dd85016b0, proclock=0x7f0dd940c608, lockMethodTable=0x2648ea0 <default_lockmethod>, hashcode=2617248362, wakeupNeeded=false) at lock.c:1796
    1796			if (!hash_search_with_hash_value(LockMethodLockHash,
    (gdb) p *lock
    $6 = {tag = {locktag_field1 = 562, locktag_field2 = 0, locktag_field3 = 0, locktag_field4 = 0, locktag_type = 4 '\004', locktag_lockmethodid = 1 '\001'}, grantMask = 0, waitMask = 0, procLocks = {prev = 0x7f0dd85016c8, next = 0x7f0dd85016c8}, waitProcs = {links = {prev = 0x7f0dd85016d8, next = 0x7f0dd85016d8}, size = 0}, requested = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, nRequested = 0, granted = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, nGranted = 0, holdTillEndXact = false}    
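
For readers decoding the two dumps above by hand, here is a minimal sketch in C. It assumes the first five LockTagType values follow upstream PostgreSQL 12 ordering (the dumps confirm only 1 = LOCKTAG_RELATION_EXTEND and 4 = LOCKTAG_TRANSACTION), and that a hold mask sets bit (1 << lockmode) per held mode. The helper names locktag_type_name and highest_lockmode are made up for illustration and do not exist in the server source.

```c
#include <assert.h>
#include <stddef.h>

/* First five lock tag types, assuming upstream PostgreSQL 12 ordering.
 * Later values are omitted because this report does not show them. */
static const char *const tag_names[] = {
    "LOCKTAG_RELATION",
    "LOCKTAG_RELATION_EXTEND",
    "LOCKTAG_PAGE",
    "LOCKTAG_TUPLE",
    "LOCKTAG_TRANSACTION",
};

/* Hypothetical helper: map a locktag_type number to a name. */
const char *locktag_type_name(unsigned type)
{
    size_t n = sizeof(tag_names) / sizeof(tag_names[0]);
    return type < n ? tag_names[type] : "unknown";
}

/* Hypothetical helper: return the highest lock mode whose bit is set
 * in holdMask, or 0 if no bit is set. Modes run from 1 to 9. */
int highest_lockmode(unsigned holdMask)
{
    int mode = 0;

    assert(holdMask < (1u << 10));
    for (int m = 1; m < 10; m++)
        if (holdMask & (1u << m))
            mode = m;
    return mode;
}
```

With these helpers, locktag_type = 1 maps to LOCKTAG_RELATION_EXTEND and locktag_type = 4 to LOCKTAG_TRANSACTION, and holdMask = 128 decodes to lock mode 7, the same mode number that appears as lockmode=7 in frame #2 of the backtrace.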
    

It looks like holdTillEndXact is never explicitly set, so its value depends on whatever was previously stored in the entry taken from HASHHDR::freeList.

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 21 (21 by maintainers)

Most upvoted comments

I will send out a PR soon.

I confirmed that holdTillEndXact for LOCKTAG_RELATION_EXTEND is effectively random.

This is the call stack related to LOCKTAG_RELATION_EXTEND (innermost frame first):

hash_search_with_hash_value
--SetupLockInTable
----LockAcquireExtended
------LockAcquire
--------LockRelationForExtension
----------fsm_extend

In short, when a table needs to be extended, fsm_extend() tries to acquire the LOCKTAG_RELATION_EXTEND lock. SetupLockInTable() is then called to find or create the lock with that tag:

    /*   
     * Find or create a lock with this tag.
     */
    lock = (LOCK *) hash_search_with_hash_value(LockMethodLockHash,
                                                (const void *) locktag,
                                                hashcode,
                                                HASH_ENTER_NULL,
                                                &found);
    if (!lock)
        return NULL;

In this case the lock does not exist yet, so a new lock entry is returned and initialized:

    /*   
     * if it's a new lock object, initialize it
     */
    if (!found)
    {    
        lock->grantMask = 0; 
        lock->waitMask = 0; 
        SHMQueueInit(&(lock->procLocks));
        ProcQueueInit(&(lock->waitProcs));
        lock->nRequested = 0; 
        lock->nGranted = 0; 
        MemSet(lock->requested, 0, sizeof(int) * MAX_LOCKMODES);
        MemSet(lock->granted, 0, sizeof(int) * MAX_LOCKMODES);
        LOCK_PRINT("LockAcquire: new", lock, lockmode);
    }    

Note that this block never initializes holdTillEndXact, so the flag keeps whatever value the entry had when hash_search_with_hash_value() returned it, and that function does not set the flag either. So yes, holdTillEndXact is effectively random at this point.
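To make the mechanism concrete, here is a small self-contained C sketch of a free-list pool that recycles entries without clearing them. It is a toy model, not the server code: DemoLock, DemoPool and every function below are hypothetical names, and the pool only imitates the behavior described above, where a recycled entry comes back with its old payload and the initialization block resets some fields but not the flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for LOCK; only two payload fields are modeled. */
typedef struct DemoLock
{
    struct DemoLock *next_free;     /* free-list link */
    int              nRequested;
    bool             holdTillEndXact;
} DemoLock;

typedef struct DemoPool
{
    DemoLock  slots[4];
    DemoLock *free_list;
} DemoPool;

/* Chain all slots onto the free list with a clean payload. */
void pool_init(DemoPool *p)
{
    for (int i = 0; i < 4; i++)
    {
        p->slots[i].nRequested = 0;
        p->slots[i].holdTillEndXact = false;
        p->slots[i].next_free = (i + 1 < 4) ? &p->slots[i + 1] : NULL;
    }
    p->free_list = &p->slots[0];
}

/* Pop an entry; the payload is deliberately left untouched. */
DemoLock *pool_alloc(DemoPool *p)
{
    DemoLock *e = p->free_list;

    if (e != NULL)
        p->free_list = e->next_free;
    return e;
}

/* Push an entry back; again the payload is not cleared. */
void pool_free(DemoPool *p, DemoLock *e)
{
    e->next_free = p->free_list;
    p->free_list = e;
}

/* Imitates the initialization block: resets a counter, skips the flag. */
void init_without_flag(DemoLock *l)
{
    l->nRequested = 0;
}

/* Returns true if a recycled entry still carries the previous flag value. */
bool demo_stale_flag(void)
{
    DemoPool  pool;
    DemoLock *a, *b;

    pool_init(&pool);
    a = pool_alloc(&pool);
    assert(a != NULL);
    init_without_flag(a);
    a->holdTillEndXact = true;      /* set by an earlier user of the slot */
    pool_free(&pool, a);

    b = pool_alloc(&pool);          /* LIFO free list hands back the same slot */
    init_without_flag(b);
    return b == a && b->holdTillEndXact;
}
```

Calling demo_stale_flag() returns true in this model: the second user of the slot sees the flag left behind by the first, which is the same kind of leftover value reported in this issue.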

On the other hand, once the table has been extended, this is the call stack that releases the lock:

LockRelease
--UnlockRelationForExtension
----RelationGetBufferForTuple
------heap_insert
--------heapam_tuple_insert
----------table_tuple_insert
------------ExecInsert

LockSetHoldTillEndXact() is never called along this path. In addition, we scanned the code and found no place where holdTillEndXact could be set before or after extending the table.

A reasonable fix is to add one line to the initialization block above that sets holdTillEndXact to true. I will send out a PR later.
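
The shape of that one-line fix can be sketched in isolation. This is only an illustration under stated assumptions: DemoLock, demo_lock_init and flag_after_reuse are hypothetical names, the struct models three fields of LOCK, and the initial value is passed in as a parameter so the sketch does not decide which default the real patch should use.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for LOCK with just the fields needed here. */
typedef struct DemoLock
{
    int  nRequested;
    int  nGranted;
    bool holdTillEndXact;
} DemoLock;

/* Initialization with the flag set explicitly. */
void demo_lock_init(DemoLock *lock, bool hold_default)
{
    assert(lock != NULL);
    lock->nRequested = 0;
    lock->nGranted = 0;
    lock->holdTillEndXact = hold_default;   /* the added line */
}

/* Simulate reusing a slot whose previous owner left stale_value behind,
 * then report the flag after initialization. */
bool flag_after_reuse(bool stale_value, bool hold_default)
{
    DemoLock slot;

    memset(&slot, 0, sizeof(slot));
    slot.nRequested = 3;                    /* leftover from a previous user */
    slot.holdTillEndXact = stale_value;

    demo_lock_init(&slot, hold_default);
    return slot.holdTillEndXact;
}
```

Whatever the slot held before, flag_after_reuse(stale, d) returns d, so the flag no longer depends on the entry's history.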