hudi: [SUPPORT] throw "java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()"
I use Spark SQL to insert records into Hudi. It works for a short time, but after a while it throws "java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()".
Steps to reproduce the behavior:
I wrote a Scala function to build the insert SQL:
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

private def write2Table(row: Row)(implicit sparkSession: SparkSession): Unit = {
  // Render each field of the row as a "<literal> as <name>" projection for the insert statement
  val filed = row.schema.fields.map { field =>
    if (row.getString(row.fieldIndex(field.name)).isEmpty) {
      s"""null as ${field.name}"""
    } else {
      field.dataType match {
        case StringType    => s"""'${row.getAs[String](field.name)}' as ${field.name}"""
        case BooleanType   => s"""${row.getAs[Boolean](field.name)} as ${field.name}"""
        case ByteType      => s"""${row.getAs[Byte](field.name)} as ${field.name}"""
        case ShortType     => s"""${row.getAs[Short](field.name)} as ${field.name}"""
        case IntegerType   => s"""${row.getAs[Int](field.name)} as ${field.name}"""
        case LongType      => s"""${row.getAs[Long](field.name)} as ${field.name}"""
        case FloatType     => s"""${row.getAs[Float](field.name)} as ${field.name}"""
        case DoubleType    => s"""${row.getAs[Double](field.name)} as ${field.name}"""
        case DateType      => s"""'${row.getAs[String](field.name)}' as ${field.name}"""
        case TimestampType => s"""'${row.getAs[String](field.name)}' as ${field.name}"""
      }
    }
  }.mkString(",")
  val insertSql = s"""insert into ${row.getAs("database")}.${row.getAs("table")}_cow select ${filed};"""
  try {
    println(s"""inserting into ${row.getAs("table")}_cow""")
    sparkSession.sql(insertSql)
  } catch {
    case ex: Throwable =>
      println(row.prettyJson)
      println(insertSql)
      throw ex
  }
}
Then it is called in foreachRDD() of a DStream:
saveRdd.foreachRDD { rdd =>
  rdd.collect().foreach { x =>
    // println(x.json)
    // println(x.schema.sql)
    val row = x._1                   // the payload row
    chackAndCreateTable(row)
    if (x._2.equals("INSERT")) {     // x._2 carries the operation type
      write2Table(row)
    }
  }
}
Expected behavior
The Spark SQL inserts should continue to succeed without throwing the NoSuchMethodError.
Environment Description
Hudi version : 0.11
Spark version : 3.2.1
Hadoop version : 3.2.2
Storage (HDFS/S3/GCS…) : HDFS
Running on Docker? (yes/no) : no
Here is my config code:
.appName("SparkHudi")
.master("spark://hadoop111:7077")
.config("spark.sql.warehouse.dir","/user/hive/warehouse")
.config("spark.serialize","org.apache.spark.serializer.KryoSerializer")
.config("spark.sql.extensions","org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
.config("spark.sql.catalog.spark_catalog","org.apache.spark.sql.hudi.catalog.HoodieCatalog")
.config("spark.sql.legacy.exponentLiteralAsDecimal.enabled",true)
.enableHiveSupport()
.config("hive.metastore.uris","thrift://19.11.8.111:9083")
.getOrCreate()
spark-submit:
spark-submit --jars /home/kadm/module/hudi-0.11/packaging/hudi-spark-bundle/target/hudi-spark3.2-bundle_2.12-0.11.0.jar --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.1,org.apache.spark:spark-avro_2.12:3.2.1,org.apache.kafka:kafka-clients:3.1.0 --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' --conf "spark.driver.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5445" --master spark://hadoop111:7077 SparkHudi-1.0-SNAPSHOT-shaded.jar
Stacktrace
22/06/06 09:47:13 ERROR Javalin: Exception occurred while servicing http-request
java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()Lorg/apache/hadoop/hdfs/DFSInputStream$ReadStatistics;
at org.apache.hudi.org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.updateInputStreamStatistics(FSDataInputStreamWrapper.java:249)
at org.apache.hudi.org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.close(FSDataInputStreamWrapper.java:296)
at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.closeStreams(HFileBlock.java:1825)
at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFilePreadReader.close(HFilePreadReader.java:107)
at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.close(HFileReaderImpl.java:1421)
at org.apache.hudi.io.storage.HoodieHFileReader.close(HoodieHFileReader.java:218)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.closeReader(HoodieBackedTableMetadata.java:574)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.close(HoodieBackedTableMetadata.java:567)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.close(HoodieBackedTableMetadata.java:554)
at org.apache.hudi.metadata.HoodieMetadataFileSystemView.close(HoodieMetadataFileSystemView.java:83)
at org.apache.hudi.common.table.view.FileSystemViewManager.clearFileSystemView(FileSystemViewManager.java:86)
at org.apache.hudi.timeline.service.handlers.FileSliceHandler.refreshTable(FileSliceHandler.java:118)
at org.apache.hudi.timeline.service.RequestHandler.lambda$registerFileSlicesAPI$19(RequestHandler.java:390)
at org.apache.hudi.timeline.service.RequestHandler$ViewHandler.handle(RequestHandler.java:501)
at io.javalin.security.SecurityUtil.noopAccessManager(SecurityUtil.kt:22)
at io.javalin.Javalin.lambda$addHandler$0(Javalin.java:606)
at io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:46)
at io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:17)
at io.javalin.core.JavalinServlet$service$1.invoke(JavalinServlet.kt:143)
at io.javalin.core.JavalinServlet$service$2.invoke(JavalinServlet.kt:41)
at io.javalin.core.JavalinServlet.service(JavalinServlet.kt:107)
at io.javalin.core.util.JettyServerUtil$initialize$httpHandler$1.doHandle(JettyServerUtil.kt:72)
at org.apache.hudi.org.apache.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
at org.apache.hudi.org.apache.jetty.servlet.ServletHandler.doScope(ServletHandler.java:482)
at org.apache.hudi.org.apache.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1668)
at org.apache.hudi.org.apache.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
at org.apache.hudi.org.apache.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
at org.apache.hudi.org.apache.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
at org.apache.hudi.org.apache.jetty.server.handler.HandlerList.handle(HandlerList.java:61)
at org.apache.hudi.org.apache.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:174)
at org.apache.hudi.org.apache.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.apache.hudi.org.apache.jetty.server.Server.handle(Server.java:502)
at org.apache.hudi.org.apache.jetty.server.HttpChannel.handle(HttpChannel.java:370)
at org.apache.hudi.org.apache.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
at org.apache.hudi.org.apache.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
at org.apache.hudi.org.apache.jetty.io.FillInterest.fillable(FillInterest.java:103)
at org.apache.hudi.org.apache.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
at org.apache.hudi.org.apache.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
at org.apache.hudi.org.apache.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
at org.apache.hudi.org.apache.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
at org.apache.hudi.org.apache.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
at org.apache.hudi.org.apache.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:367)
at org.apache.hudi.org.apache.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:782)
at org.apache.hudi.org.apache.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:918)
at java.lang.Thread.run(Thread.java:748)
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 35 (14 by maintainers)
This problem is caused by the fact that the hbase 2.4.9 jars published to Maven are compiled against hadoop-2.7. A quick fix is to compile HBase against Hadoop 3.x, mvn install it, and then compile Hudi.
I resolved this on my own by packaging a new version of hbase 2.4.9 against our Hadoop 3 version with the following command:
mvn clean install -Denforcer.skip -DskipTests -Dhadoop.profile=3.0 -Psite-install-step
Then I changed hbase.defaults.for.version in hudi-common/src/main/resources/hbase-site.xml, changed hbase.version in the pom.xml of Hudi, used the versions-maven-plugin to create a new Hudi version, and packaged Hudi again.
@yihua: do you think we can document the solution proposed by @dohongdayi above in some FAQ?
We also encountered the same problem with hudi-0.11.1 & spark-3.2.1; our current temporary workaround is to set hoodie.metadata.enable=false.
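A minimal sketch of that workaround, assuming a session-level SET of hoodie.metadata.enable is picked up by Hudi's Spark SQL writes (the database and table names below are hypothetical placeholders):

import org.apache.spark.sql.SparkSession

// Sketch of the workaround described above: disable the Hudi metadata table for the
// session before running the inserts. Whether a session-level SET is honored for the
// Spark SQL write path should be verified against the Hudi version in use.
val spark = SparkSession.builder()
  .appName("SparkHudiMetadataDisabled")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .config("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.hudi.catalog.HoodieCatalog")
  .enableHiveSupport()
  .getOrCreate()

// session-level SET: subsequent Hudi SQL writes in this session should use it
spark.sql("set hoodie.metadata.enable=false")
// example insert into a hypothetical COW table
spark.sql("insert into test_db.test_table_cow select 1 as id, 'a' as name")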
I found that the problem is thrown when Hudi runs against Hadoop 3.3.1 in my environment.
