hudi: Demo : Unexpected result in some queries

I have two problems on the master branch (commit: ae3c02fb3). My steps:

  1. Use HDFSParquetImporter to import from Hive to Hudi.

  2. Use HoodieDeltaStreamer to import new data from Kafka. (I added an option to allow a missing checkpointStr.) The config is the same as in #779, with --disable-compaction. Afterwards, select distinct _hoodie_commit_time from rt_table/ro_table returns only the first commit time (I also used max() to confirm that no newer commits are returned), even though there are newer .deltacommit files in the .hoodie folder.

  3. Restart the Spark job. The Spark UI shows the job hanging at collect at HoodieMergeOnReadTable.java:318 (it hangs every time):

org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
com.uber.hoodie.table.HoodieMergeOnReadTable.rollback(HoodieMergeOnReadTable.java:318)
com.uber.hoodie.HoodieWriteClient.doRollbackAndGetStats(HoodieWriteClient.java:884)
com.uber.hoodie.HoodieWriteClient.rollbackInternal(HoodieWriteClient.java:962)
com.uber.hoodie.HoodieWriteClient.rollback(HoodieWriteClient.java:773)
com.uber.hoodie.HoodieWriteClient.rollbackInflightCommits(HoodieWriteClient.java:1182)
com.uber.hoodie.HoodieWriteClient.startCommitWithTime(HoodieWriteClient.java:1050)
com.uber.hoodie.HoodieWriteClient.startCommit(HoodieWriteClient.java:1043)
com.uber.hoodie.utilities.deltastreamer.DeltaSync.startCommit(DeltaSync.java:406)
com.uber.hoodie.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:332)
com.uber.hoodie.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:227)
com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:382)
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
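
To narrow down step 2, one option is to compare the instants present in the .hoodie folder with the commit times the query returns. The following is only a rough sketch, not part of Hudi: the function names are hypothetical, the file names in the example are made up, and it assumes timeline files are named `<instant>.<action>` with pending ones ending in `inflight`. The file list would come from listing the .hoodie folder (for example with hdfs dfs -ls), and the queried set from the select distinct query above.

```python
from typing import Iterable, List, Set, Tuple

# Assumption: completed instants end in one of these action suffixes.
COMPLETED_SUFFIXES = {"commit", "deltacommit"}


def split_timeline(file_names: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Split .hoodie file names into (completed, inflight) instant times."""
    completed: Set[str] = set()
    inflight: Set[str] = set()
    for name in file_names:
        instant, _, suffix = name.partition(".")
        if not instant.isdigit():
            continue  # skip non-timeline files such as hoodie.properties
        if suffix in COMPLETED_SUFFIXES:
            completed.add(instant)
        elif suffix.endswith("inflight"):
            inflight.add(instant)  # assumption: pending files end in "inflight"
    return completed, inflight


def missing_from_query(completed: Iterable[str], queried: Iterable[str]) -> List[str]:
    """Completed instants on the timeline that the query did not return."""
    return sorted(set(completed) - set(queried))


# Hypothetical file names, for illustration only.
example_files = [
    "hoodie.properties",
    "20190708110000.commit",
    "20190708120000.deltacommit",
    "20190708130000.deltacommit",
    "20190708140000.deltacommit.inflight",
]
completed, inflight = split_timeline(example_files)
# Suppose the query returned only the first commit time.
missing = missing_from_query(completed, {"20190708110000"})
```

A non-empty `missing` list would match the symptom in step 2, and a non-empty `inflight` set would be the instants that the rollback path in the stack trace above tries to roll back on restart.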

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 32 (31 by maintainers)

Most upvoted comments

Weird that it is intermittent. @bhasudha let's meet and take a stab at this sometime… This also blocks #751 and related efforts, which blocks the Spark upgrade, which blocks timestamp support 😃