hudi: [HUDI-1615] [SUPPORT] ERROR HoodieTimelineArchiveLog: Failed to archive commits

Hello,

Hudi version: 0.7, EMR version: 6.2, Spark version: 3.0.1

Hudi Options:

Map(
  hoodie.datasource.hive_sync.database -> raw_courier_api_hudi,
  hoodie.parquet.small.file.limit -> 67108864,
  hoodie.copyonwrite.record.size.estimate -> 1024,
  hoodie.datasource.write.precombine.field -> LineCreatedTimestamp,
  hoodie.datasource.hive_sync.partition_extractor_class -> org.apache.hudi.hive.NonPartitionedExtractor,
  hoodie.parquet.max.file.size -> 134217728,
  hoodie.parquet.block.size -> 67108864,
  hoodie.datasource.hive_sync.table -> customer_address,
  hoodie.datasource.write.operation -> upsert,
  hoodie.datasource.hive_sync.enable -> true,
  hoodie.datasource.write.recordkey.field -> id,
  hoodie.table.name -> customer_address,
  hoodie.datasource.hive_sync.jdbcurl -> jdbc:hive2://emr:10000,
  hoodie.datasource.write.hive_style_partitioning -> false,
  hoodie.datasource.write.table.name -> customer_address,
  hoodie.datasource.write.keygenerator.class -> org.apache.hudi.keygen.NonpartitionedKeyGenerator,
  hoodie.upsert.shuffle.parallelism -> 50
)
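
For reference, an option map like this is handed straight to Spark's DataFrame writer. A minimal sketch of that call in Scala (the object, method, and parameter names here are placeholders, not taken from the original job):

import org.apache.spark.sql.{DataFrame, SaveMode}

// Hypothetical wrapper showing how the hoodie.* map above reaches Hudi.
object HudiUpsert {
  def upsert(df: DataFrame, hudiOptions: Map[String, String], basePath: String): Unit = {
    df.write
      .format("hudi")          // Hudi 0.7 registers the "hudi" data source
      .options(hudiOptions)    // the option map printed above
      .mode(SaveMode.Append)   // append mode + operation=upsert performs the upsert
      .save(basePath)
  }
}

The failure occurs not in the write itself but afterwards, while timeline archival tries to serialize an old commit's metadata: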
21/02/01 08:12:22 ERROR HoodieTimelineArchiveLog: Failed to archive commits, .commit file: 20210201021259.commit.requested
java.lang.NullPointerException: null of string of map of union in field extraMetadata of org.apache.hudi.avro.model.HoodieCommitMetadata of union in field hoodieCommitMetadata of org.apache.hudi.avro.model.HoodieArchivedMetaEntry
	at org.apache.avro.generic.GenericDatumWriter.npe(GenericDatumWriter.java:145)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:139)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
	at org.apache.hudi.common.table.log.block.HoodieAvroDataBlock.serializeRecords(HoodieAvroDataBlock.java:106)
	at org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:97)
	at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:164)
	at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlock(HoodieLogFormatWriter.java:142)
	at org.apache.hudi.table.HoodieTimelineArchiveLog.writeToFile(HoodieTimelineArchiveLog.java:361)
	at org.apache.hudi.table.HoodieTimelineArchiveLog.archive(HoodieTimelineArchiveLog.java:311)
	at org.apache.hudi.table.HoodieTimelineArchiveLog.archiveIfRequired(HoodieTimelineArchiveLog.java:138)
	at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:426)
	at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:188)
	at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:110)
	at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:442)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:218)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:134)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:124)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:123)
	at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:963)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:104)
	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:227)
	at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:107)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:132)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:104)
	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:227)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:132)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:248)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:131)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:963)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:415)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:399)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:288)
	at hudiwriter.HudiWriter.merge(HudiWriter.scala:72)
	at hudiwriter.HudiContext.writeToHudi(HudiContext.scala:35)
	at jobs.TableProcessor.start(TableProcessor.scala:84)
	at TableProcessorWrapper$.$anonfun$main$2(TableProcessorWrapper.scala:23)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
	at scala.util.Success.$anonfun$map$1(Try.scala:255)
	at scala.util.Success.map(Try.scala:213)
	at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
	at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
	at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
	at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
Caused by: java.lang.NullPointerException
	at org.apache.avro.io.Encoder.writeString(Encoder.java:121)
	at org.apache.avro.generic.GenericDatumWriter.writeString(GenericDatumWriter.java:267)
	at org.apache.avro.generic.GenericDatumWriter.writeString(GenericDatumWriter.java:262)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:128)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
	at org.apache.avro.generic.GenericDatumWriter.writeMap(GenericDatumWriter.java:234)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:121)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:125)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
	at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:125)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
	at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
	... 59 more

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 54 (37 by maintainers)

Most upvoted comments

Folks, I got sidetracked by other work last week. I'm back on Hudi this week, and we will get moving on this.

For now, if you apply the one-line fix below in CommitUtils, we will be out of the woods. I have raised a sev:critical JIRA for the fix: https://issues.apache.org/jira/browse/HUDI-1615

public static HoodieCommitMetadata buildMetadata(List<HoodieWriteStat> writeStats,
                                                 Map<String, List<String>> partitionToReplaceFileIds,
                                                 Option<Map<String, String>> extraMetadata,
                                                 WriteOperationType operationType,
                                                 String schemaToStoreInCommit,
                                                 String commitActionType) {

  HoodieCommitMetadata commitMetadata = buildMetadataFromStats(writeStats, partitionToReplaceFileIds, commitActionType, operationType);

  // add in extra metadata
  if (extraMetadata.isPresent()) {
    extraMetadata.get().forEach(commitMetadata::addMetadata);
  }

  // this is the band-aid: store an empty string rather than a null schema,
  // so the Avro writer does not NPE when the commit is later archived
  commitMetadata.addMetadata(HoodieCommitMetadata.SCHEMA_KEY, schemaToStoreInCommit == null ? "" : schemaToStoreInCommit);
  commitMetadata.setOperationType(operationType);
  return commitMetadata;
}
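
For background, the NullPointerException comes from Avro itself: extraMetadata is a map of string values, and Avro's GenericDatumWriter refuses to write a null string. When a commit stores no schema, the SCHEMA_KEY entry is null and archival blows up. A minimal sketch that reproduces the same failure against a simplified schema (this is not Hudi's actual Avro model, just an illustration):

import java.io.ByteArrayOutputStream
import java.util.Collections

import org.apache.avro.{Schema, SchemaBuilder}
import org.apache.avro.generic.{GenericData, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.EncoderFactory

// Simplified stand-in for the archived commit record: one map<string>
// field named like the failing one.
object NullSchemaRepro extends App {
  val schema: Schema = SchemaBuilder.record("Meta").fields()
    .name("extraMetadata").`type`().map().values().stringType().noDefault()
    .endRecord()

  val record: GenericRecord = new GenericData.Record(schema)
  // A commit with no stored schema yields a null map value
  record.put("extraMetadata", Collections.singletonMap[String, String]("schema", null))

  val writer = new GenericDatumWriter[GenericRecord](schema)
  val encoder = EncoderFactory.get().binaryEncoder(new ByteArrayOutputStream(), null)
  // Throws java.lang.NullPointerException: "null of string ... in field extraMetadata"
  writer.write(record, encoder)
}

With the guard above, an empty string is stored instead of null, so the archived record serializes cleanly.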

@nsivabalan - which commit fixes the issue, and what version of Hudi contains the fix? Is it included in 0.8.0?