hudi: Demo : Unexpected result in some queries
I have two problems with the master branch (commit: ae3c02fb3). My steps:
- Use HDFSParquetImporter to import from Hive to Hudi.
- Use HoodieDeltaStreamer to import new data from Kafka (I added an option to allow a missing checkpointStr). The config is the same as in #779, with --disable-compaction. After that, select distinct _hoodie_commit_time from rt_table/ro_table returns only the first commit time (I also used max() to confirm that no newer commit is returned), but there are newer .deltacommit files in the .hoodie folder.
- Restart the Spark job and open the Spark UI. The job hangs at the following stage (it hangs every time):
collect at HoodieMergeOnReadTable.java:318
org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
com.uber.hoodie.table.HoodieMergeOnReadTable.rollback(HoodieMergeOnReadTable.java:318)
com.uber.hoodie.HoodieWriteClient.doRollbackAndGetStats(HoodieWriteClient.java:884)
com.uber.hoodie.HoodieWriteClient.rollbackInternal(HoodieWriteClient.java:962)
com.uber.hoodie.HoodieWriteClient.rollback(HoodieWriteClient.java:773)
com.uber.hoodie.HoodieWriteClient.rollbackInflightCommits(HoodieWriteClient.java:1182)
com.uber.hoodie.HoodieWriteClient.startCommitWithTime(HoodieWriteClient.java:1050)
com.uber.hoodie.HoodieWriteClient.startCommit(HoodieWriteClient.java:1043)
com.uber.hoodie.utilities.deltastreamer.DeltaSync.startCommit(DeltaSync.java:406)
com.uber.hoodie.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:332)
com.uber.hoodie.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:227)
com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:382)
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
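To narrow down both symptoms, it can help to compare the instants present in the .hoodie folder with the commit times the query returns. The following is only a rough sketch, not part of Hudi: it assumes completed delta commits are stored as <commit_time>.deltacommit and unfinished ones as <commit_time>.deltacommit.inflight, and that you have already obtained the file names (for example from a directory listing). The function names and the sample instants are hypothetical.

```python
def classify_timeline(file_names):
    """Split .hoodie file names into completed and inflight delta commit times.

    Assumption: completed instants end in ".deltacommit" and unfinished ones
    in ".deltacommit.inflight"; every other file is ignored.
    """
    completed, inflight = [], []
    for name in sorted(file_names):
        commit_time = name.split(".", 1)[0]
        if name.endswith(".deltacommit.inflight"):
            inflight.append(commit_time)
        elif name.endswith(".deltacommit"):
            completed.append(commit_time)
    return completed, inflight


def missing_from_query(completed, queried_commit_times):
    """Return completed commit times that the query result does not contain."""
    seen = set(queried_commit_times)
    return [t for t in completed if t not in seen]


# Hypothetical listing of a .hoodie folder (made-up instants).
names = [
    "20190701000000.deltacommit",
    "20190702000000.deltacommit",
    "20190703000000.deltacommit.inflight",
    "hoodie.properties",
]
completed, inflight = classify_timeline(names)

# Hypothetical result of: select distinct _hoodie_commit_time from rt_table
queried = ["20190701000000"]

print("completed but not visible to the query:", missing_from_query(completed, queried))
print("inflight (candidates for rollback on restart):", inflight)
```

Completed instants that never show up in the query result correspond to the first problem. Any inflight instant is what rollbackInflightCommits in the stack trace above would try to roll back on restart, so it is a reasonable place to start looking at the hang.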
About this issue
- State: closed
- Created 5 years ago
- Comments: 32 (31 by maintainers)
Weird that it is intermittent. @bhasudha let's meet and take a stab at this sometime… this also blocks #751 and related efforts, which blocks the Spark upgrade, which blocks timestamp support 😃