hudi: [SUPPORT] throw "java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()"

I use Spark SQL to insert records into Hudi. It works for a short time, but after a while it throws "java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()".

Steps to reproduce the behavior:

I wrote a Scala function to build the insert SQL:


  import org.apache.spark.sql.{Row, SparkSession}
  import org.apache.spark.sql.types._

  private def write2Table(row: Row)(implicit sparkSession: SparkSession): Unit = {

    // Build a "SELECT <literal> AS <column>, ..." projection from the row's schema.
    val fieldSelects = row.schema.fields.map { field =>
      if (row.getString(row.fieldIndex(field.name)).isEmpty) {
        s"null as ${field.name}"
      } else {
        field.dataType match {
          case StringType    => s"'${row.getAs[String](field.name)}' as ${field.name}"
          case BooleanType   => s"${row.getAs[Boolean](field.name)} as ${field.name}"
          case ByteType      => s"${row.getAs[Byte](field.name)} as ${field.name}"
          case ShortType     => s"${row.getAs[Short](field.name)} as ${field.name}"
          case IntegerType   => s"${row.getAs[Int](field.name)} as ${field.name}"
          case LongType      => s"${row.getAs[Long](field.name)} as ${field.name}"
          case FloatType     => s"${row.getAs[Float](field.name)} as ${field.name}"
          case DoubleType    => s"${row.getAs[Double](field.name)} as ${field.name}"
          case DateType      => s"'${row.getAs[String](field.name)}' as ${field.name}"
          case TimestampType => s"'${row.getAs[String](field.name)}' as ${field.name}"
        }
      }
    }.mkString(",")

    val insertSql = s"insert into ${row.getAs("database")}.${row.getAs("table")}_cow select $fieldSelects"
    try {
      println(s"inserting into ${row.getAs("table")}_cow")
      sparkSession.sql(insertSql)
    } catch {
      case ex: Throwable =>
        println(row.prettyJson)
        println(insertSql)
        throw ex
    }
  }

Then I call it in foreachRDD() of a DStream:

    saveRdd.foreachRDD { rdd =>
      rdd.collect().foreach { case (row, op) =>
        chackAndCreateTable(row)
        if (op == "INSERT") {
          write2Table(row)
        }
      }
    }

Expected behavior

The insert statements should keep succeeding over time, without the NoSuchMethodError being thrown.

Environment Description

Hudi version : 0.11

Spark version : 3.2.1

Hadoop version : 3.2.2

Storage (HDFS/S3/GCS…) : HDFS

Running on Docker? (yes/no) : no

Here is my config code:

    val sparkSession = SparkSession.builder()
      .appName("SparkHudi")
      .master("spark://hadoop111:7077")
      .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .config("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.hudi.catalog.HoodieCatalog")
      .config("spark.sql.legacy.exponentLiteralAsDecimal.enabled", true)
      .enableHiveSupport()
      .config("hive.metastore.uris", "thrift://19.11.8.111:9083")
      .getOrCreate()

spark-submit:

spark-submit \
  --jars /home/kadm/module/hudi-0.11/packaging/hudi-spark-bundle/target/hudi-spark3.2-bundle_2.12-0.11.0.jar \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.1,org.apache.spark:spark-avro_2.12:3.2.1,org.apache.kafka:kafka-clients:3.1.0 \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
  --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
  --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
  --conf "spark.driver.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5445" \
  --master spark://hadoop111:7077 \
  SparkHudi-1.0-SNAPSHOT-shaded.jar

Stacktrace

22/06/06 09:47:13 ERROR Javalin: Exception occurred while servicing http-request
java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()Lorg/apache/hadoop/hdfs/DFSInputStream$ReadStatistics;
	at org.apache.hudi.org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.updateInputStreamStatistics(FSDataInputStreamWrapper.java:249)
	at org.apache.hudi.org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.close(FSDataInputStreamWrapper.java:296)
	at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.closeStreams(HFileBlock.java:1825)
	at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFilePreadReader.close(HFilePreadReader.java:107)
	at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.close(HFileReaderImpl.java:1421)
	at org.apache.hudi.io.storage.HoodieHFileReader.close(HoodieHFileReader.java:218)
	at org.apache.hudi.metadata.HoodieBackedTableMetadata.closeReader(HoodieBackedTableMetadata.java:574)
	at org.apache.hudi.metadata.HoodieBackedTableMetadata.close(HoodieBackedTableMetadata.java:567)
	at org.apache.hudi.metadata.HoodieBackedTableMetadata.close(HoodieBackedTableMetadata.java:554)
	at org.apache.hudi.metadata.HoodieMetadataFileSystemView.close(HoodieMetadataFileSystemView.java:83)
	at org.apache.hudi.common.table.view.FileSystemViewManager.clearFileSystemView(FileSystemViewManager.java:86)
	at org.apache.hudi.timeline.service.handlers.FileSliceHandler.refreshTable(FileSliceHandler.java:118)
	at org.apache.hudi.timeline.service.RequestHandler.lambda$registerFileSlicesAPI$19(RequestHandler.java:390)
	at org.apache.hudi.timeline.service.RequestHandler$ViewHandler.handle(RequestHandler.java:501)
	at io.javalin.security.SecurityUtil.noopAccessManager(SecurityUtil.kt:22)
	at io.javalin.Javalin.lambda$addHandler$0(Javalin.java:606)
	at io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:46)
	at io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:17)
	at io.javalin.core.JavalinServlet$service$1.invoke(JavalinServlet.kt:143)
	at io.javalin.core.JavalinServlet$service$2.invoke(JavalinServlet.kt:41)
	at io.javalin.core.JavalinServlet.service(JavalinServlet.kt:107)
	at io.javalin.core.util.JettyServerUtil$initialize$httpHandler$1.doHandle(JettyServerUtil.kt:72)
	at org.apache.hudi.org.apache.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
	at org.apache.hudi.org.apache.jetty.servlet.ServletHandler.doScope(ServletHandler.java:482)
	at org.apache.hudi.org.apache.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1668)
	at org.apache.hudi.org.apache.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
	at org.apache.hudi.org.apache.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
	at org.apache.hudi.org.apache.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
	at org.apache.hudi.org.apache.jetty.server.handler.HandlerList.handle(HandlerList.java:61)
	at org.apache.hudi.org.apache.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:174)
	at org.apache.hudi.org.apache.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
	at org.apache.hudi.org.apache.jetty.server.Server.handle(Server.java:502)
	at org.apache.hudi.org.apache.jetty.server.HttpChannel.handle(HttpChannel.java:370)
	at org.apache.hudi.org.apache.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
	at org.apache.hudi.org.apache.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
	at org.apache.hudi.org.apache.jetty.io.FillInterest.fillable(FillInterest.java:103)
	at org.apache.hudi.org.apache.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
	at org.apache.hudi.org.apache.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
	at org.apache.hudi.org.apache.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
	at org.apache.hudi.org.apache.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
	at org.apache.hudi.org.apache.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
	at org.apache.hudi.org.apache.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:367)
	at org.apache.hudi.org.apache.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:782)
	at org.apache.hudi.org.apache.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:918)
	at java.lang.Thread.run(Thread.java:748)

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 35 (14 by maintainers)

Most upvoted comments

This problem is caused by the fact that the HBase 2.4.9 jars published to Maven Central are compiled against Hadoop 2.7. A quick fix is to compile HBase against Hadoop 3.x, mvn install it, and then recompile Hudi.

I resolved this on my own by packaging a new build of HBase 2.4.9 against our Hadoop 3 version with the following command:

mvn clean install -Denforcer.skip -DskipTests -Dhadoop.profile=3.0 -Psite-install-step

Then I changed hbase.defaults.for.version in hudi-common/src/main/resources/hbase-site.xml.

After that I changed hbase.version in Hudi's pom.xml, used versions-maven-plugin to create a new Hudi version, and packaged Hudi again.

@yihua: do you think we can document the solution proposed by @dohongdayi above in an FAQ?

We also encountered the same problem with Hudi 0.11.1 and Spark 3.2.1; our current temporary workaround is to set hoodie.metadata.enable=false.
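
For anyone wanting to try the same workaround, here is a minimal sketch of how it can be applied; the sparkSession/df variable names, table name, and base path are assumptions for illustration only:

    // Workaround sketch: disable the Hudi metadata table for this session,
    // assuming a SparkSession named sparkSession built as shown earlier.
    sparkSession.sql("set hoodie.metadata.enable=false")

    // The same config can also be passed per write with the DataFrame API
    // (df, table name, and path are placeholders; the usual record key /
    // precombine options for your table are omitted here).
    df.write.format("hudi")
      .option("hoodie.metadata.enable", "false")
      .option("hoodie.table.name", "my_table_cow")
      .mode("append")
      .save("/path/to/hudi/table")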

I found that the problem is thrown when Hudi uses the Hadoop version in my environment, which is 3.3.1.
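
If it helps with debugging, one quick way to confirm which Hadoop version actually ends up on the application's classpath (as opposed to the version installed on the cluster) is to print it from the running job; this is just a generic check, not something Hudi requires:

    // Prints the Hadoop version bundled on the driver's classpath.
    import org.apache.hadoop.util.VersionInfo
    println(s"Hadoop version on classpath: ${VersionInfo.getVersion}")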