presto: TPC-DS q04: VeloxRuntimeError: Bad memory usage track state

Only found when running TPC-DS SF-10K:

2023-06-14T20:46:03.577394406Z I0614 20:46:03.577050     8 Announcer.cpp:153] Announcement succeeded: 202
2023-06-14T20:46:08.552160664Z E0614 20:46:08.551824     9 Exceptions.h:68] Line: ../../velox/velox/common/memory/MemoryPool.cpp:420, Function:~MemoryPoolImpl, Expression: (usedReservationBytes_ == 0) && (reservationBytes_ == 0) && (minReservationBytes_ == 0) Bad memory usage track state: Memory Pool[op.39.0.2.TableScan LEAF MMAP track-usage thread-safe]<unlimited capacity used 256B available 1023.75KB reservation [used 256B, reserved 1.00MB, min 0B] counters [allocs 3602, frees 3600, reserves 0, releases 1, collisions 0])>, Source: RUNTIME, ErrorCode: INVALID_STATE
2023-06-14T20:46:08.552219481Z terminate called after throwing an instance of 'facebook::velox::VeloxRuntimeError'
2023-06-14T20:46:08.637251757Z   what():  Exception: VeloxRuntimeError
2023-06-14T20:46:08.637293482Z Error Source: RUNTIME
2023-06-14T20:46:08.637310354Z Error Code: INVALID_STATE
2023-06-14T20:46:08.637325993Z Reason: Bad memory usage track state: Memory Pool[op.39.0.2.TableScan LEAF MMAP track-usage thread-safe]<unlimited capacity used 256B available 1023.75KB reservation [used 256B, reserved 1.00MB, min 0B] counters [allocs 3602, frees 3600, reserves 0, releases 1, collisions 0])>
2023-06-14T20:46:08.637342993Z Retriable: False
2023-06-14T20:46:08.637357860Z Expression: (usedReservationBytes_ == 0) && (reservationBytes_ == 0) && (minReservationBytes_ == 0)
2023-06-14T20:46:08.637374096Z Function: ~MemoryPoolImpl
2023-06-14T20:46:08.637388901Z File: ../../velox/velox/common/memory/MemoryPool.cpp
2023-06-14T20:46:08.637404410Z Line: 420
2023-06-14T20:46:08.637419179Z Stack trace:
2023-06-14T20:46:08.637433844Z # 0  _ZN8facebook5velox7process10StackTraceC1Ei
2023-06-14T20:46:08.637448867Z # 1  _ZN8facebook5velox14VeloxExceptionC2EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
2023-06-14T20:46:08.637464126Z # 2  _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRKNS1_18VeloxCheckFailArgsET0_
2023-06-14T20:46:08.637479821Z # 3  0x0000000000000000
2023-06-14T20:46:08.637494523Z # 4  _ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE10_M_releaseEv
2023-06-14T20:46:08.637509479Z # 5  _ZN8facebook5velox4exec4TaskD1Ev
2023-06-14T20:46:08.637524316Z # 6  _ZNSt15_Sp_counted_ptrIPN8facebook5velox4exec4TaskELN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv
2023-06-14T20:46:08.637556183Z # 7  _ZNSt23_Sp_counted_ptr_inplaceIN8facebook6presto10PrestoTaskESaIS2_ELN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv
2023-06-14T20:46:08.637571225Z # 8  _ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE10_M_releaseEv
2023-06-14T20:46:08.637585038Z # 9  _ZN8facebook6presto11TaskManager13cleanOldTasksEv
2023-06-14T20:46:08.637598822Z # 10 _ZN5folly6detail8function14FunctionTraitsIFvvEEclEv
2023-06-14T20:46:08.637612627Z # 11 _ZN5folly17FunctionScheduler14runOneFunctionERSt11unique_lockISt5mutexENSt6chrono10time_pointINS5_3_V212steady_clockENS5_8durationIlSt5ratioILl1ELl1000000000EEEEEE
2023-06-14T20:46:08.637627262Z # 12 _ZN5folly17FunctionScheduler3runEv
2023-06-14T20:46:08.637640864Z # 13 _ZZN5folly17FunctionScheduler5startEvENKUlvE_clEv
2023-06-14T20:46:08.637655013Z # 14 _ZSt13__invoke_implIvZN5folly17FunctionScheduler5startEvEUlvE_JEET_St14__invoke_otherOT0_DpOT1_
2023-06-14T20:46:08.637668964Z # 15 _ZSt8__invokeIZN5folly17FunctionScheduler5startEvEUlvE_JEENSt15__invoke_resultIT_JDpT0_EE4typeEOS4_DpOS5_
2023-06-14T20:46:08.637684170Z # 16 _ZNSt6thread8_InvokerISt5tupleIJZN5folly17FunctionScheduler5startEvEUlvE_EEE9_M_invokeIJLm0EEEEvSt12_Index_tupleIJXspT_EEE
2023-06-14T20:46:08.637698661Z # 17 _ZNSt6thread8_InvokerISt5tupleIJZN5folly17FunctionScheduler5startEvEUlvE_EEEclEv
2023-06-14T20:46:08.637712734Z # 18 _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN5folly17FunctionScheduler5startEvEUlvE_EEEEE6_M_runEv
2023-06-14T20:46:08.637727006Z # 19 0x0000000000000000
2023-06-14T20:46:08.637740574Z # 20 start_thread
2023-06-14T20:46:08.637754190Z # 21 clone
2023-06-14T20:46:08.637768023Z
2023-06-14T20:46:08.637792910Z *** Aborted at 1686775568 (Unix time, try 'date -d @1686775568') ***
2023-06-14T20:46:08.637807596Z *** Signal 6 (SIGABRT) (0x7) received by PID 7 (pthread TID 0x7b76bb70e700) (linux TID 9) (maybe from PID 7, UID 0) (code: -6), stack trace: ***
2023-06-14T20:46:08.637821910Z (error retrieving stack trace)

About this issue

  • Original URL
  • State: open
  • Created a year ago
  • Comments: 20 (9 by maintainers)

Most upvoted comments

With the flag enabled, TPCDS q1 fails and the worker node crashes with the following stack trace:

| ======== Leaked memory allocation of 8388608 bytes ========
| # 0  _ZN8facebook5velox7process10StackTraceC1Ei
| # 1  _ZN8facebook5velox6memory14MemoryPoolImpl14recordAllocDbgEPKvm
| # 2  _ZN8facebook5velox6memory14MemoryPoolImpl14recordAllocDbgERKNS1_20ContiguousAllocationE
| # 3  _ZN8facebook5velox6memory14MemoryPoolImpl18allocateContiguousEmRNS1_20ContiguousAllocationEm
| # 4  _ZN8facebook5velox6memory14AllocationPool10newRunImplEm
| # 5  _ZN8facebook5velox6memory14AllocationPool13allocateFixedEmi
| # 6  _ZN8facebook5velox4dwio6common13BufferedInput8allocateERKNS0_6common6RegionE
| # 7  _ZN8facebook5velox4dwio6common13BufferedInput4loadENS2_10MetricsLog11MetricsTypeE
| # 8  _ZN8facebook5velox7parquet18StructColumnReader12loadRowGroupEjRKSt10shared_ptrINS0_4dwio6common13BufferedInputEE
| # 9  _ZN8facebook5velox7parquet10ReaderBase17scheduleRowGroupsERKSt6vectorIjSaIjEEiRNS1_18StructColumnReaderE
| # 10 _ZN8facebook5velox7parquet16ParquetRowReader21advanceToNextRowGroupEv
| # 11 _ZN8facebook5velox7parquet16ParquetRowReaderC2ERKSt10shared_ptrINS1_10ReaderBaseEERKNS0_4dwio6common16RowReaderOptionsE
| # 12 _ZSt11make_uniqueIN8facebook5velox7parquet16ParquetRowReaderEJRKSt10shared_ptrINS2_10ReaderBaseEERKNS1_4dwio6common16RowReaderOptionsEEENSt9_MakeUniqIT_E15__single_objectEDpOT0_
| # 13 _ZNK8facebook5velox7parquet13ParquetReader15createRowReaderERKNS0_4dwio6common16RowReaderOptionsE
| # 14 _ZN8facebook5velox9connector4hive11SplitReader12prepareSplitERKSt10shared_ptrINS2_15HiveTableHandleEERKNS0_4dwio6common13ReaderOptionsESt10unique_ptrINSA_13BufferedInputESt14default_deleteISF_EES4_INS0_6common14MetadataFilterEERNSA_17RuntimeStatisticsE
| # 15 _ZN8facebook5velox9connector4hive14HiveDataSource8addSplitESt10shared_ptrINS1_14ConnectorSplitEE
| # 16 _ZZN8facebook5velox4exec9TableScan7preloadESt10shared_ptrINS0_9connector14ConnectorSplitEEENKUlvE_clEv

I looked at the code and the allocation happens when a new row group is enqueued. A new BufferedInput would be created and data would be loaded inside, and the new BufferedInput would be pushed to the inputs_ map in ParquetReader::ReaderBase. The inputs_ map and corresponding BufferedInput and their buffers are freed when ParquetReader::ReaderBase runs out of life. So it seems the report in the above stack is false positive. @xiaoxmeng is https://github.com/facebookincubator/velox/pull/5639 supposed to fix this? If yes then @pramodsatya can you rerun this query with this fix and --velox_memory_pool_debug_enabled=true?

Nonetheless, there are several places we can improve “Avoid duplicate reading for small parquet files” (https://github.com/facebookincubator/velox/pull/5942) I created https://github.com/facebookincubator/velox/issues/7335 and https://github.com/facebookincubator/velox/issues/7336. We can also make ReaderBase::scheduleRowGroups() a bit simpler. I will work on these issues later this week.

| # 16 _ZZN8facebook5velox4exec9TableScan7preloadESt

This is very likely due to an IO thread that performs a split prefetch outliving the task. When the task is destroyed, it reports that the memory allocated by the split prefetch has not been released. This is a false positive (reporting bug) and there is no real memory leak.

Let’s disable prefetch (pass the flag --split_preload_per_driver=0 to Prestissimo) for now. I will investigate this further. CC: @xiaoxmeng

@pramodsatya has the image and he will try to run and reproduce this.