xgboost: Cannot save Pipeline model in Apache Spark 3.2.0 with xgboost4j-spark_2.12 1.3.1 / 1.4.1 / 1.5; they are not compatible with Spark 3.2.0

Dear all,

I hit the error below when saving a trained model. I searched and found that the same error was already reported with earlier versions of xgboost4j and xgboost4j-spark. I am hoping to find a solution that works with Spark 3.2 (it has improved BLAS performance). Thanks!

java.lang.NoSuchMethodError: 'org.json4s.JsonDSL$JsonAssoc org.json4s.JsonDSL$.pair2Assoc(scala.Tuple2, scala.Function1)'
  at ml.dmlc.xgboost4j.scala.spark.params.DefaultXGBoostParamsWriter$.getMetadataToSave(DefaultXGBoostParamsWriter.scala:75)
  at ml.dmlc.xgboost4j.scala.spark.params.DefaultXGBoostParamsWriter$.saveMetadata(DefaultXGBoostParamsWriter.scala:51)
  at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel$XGBoostRegressionModelWriter.saveImpl(XGBoostRegressor.scala:454)
  at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:168)
  at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$saveImpl$5(Pipeline.scala:257)
  at org.apache.spark.ml.MLEvents.withSaveInstanceEvent(events.scala:174)
  at org.apache.spark.ml.MLEvents.withSaveInstanceEvent$(events.scala:169)
  at org.apache.spark.ml.util.Instrumentation.withSaveInstanceEvent(Instrumentation.scala:42)
  at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$saveImpl$4(Pipeline.scala:257)
  at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$saveImpl$4$adapted(Pipeline.scala:254)
  at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
  at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
  at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$saveImpl$1(Pipeline.scala:254)
  at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$saveImpl$1$adapted(Pipeline.scala:247)
  at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
  at scala.util.Try$.apply(Try.scala:213)
  at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
  at org.apache.spark.ml.Pipeline$SharedReadWrite$.saveImpl(Pipeline.scala:247)
  at org.apache.spark.ml.PipelineModel$PipelineModelWriter.saveImpl(Pipeline.scala:346)
  at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:168)
  at org.apache.spark.ml.PipelineModel$PipelineModelWriter.super$save(Pipeline.scala:344)
  at org.apache.spark.ml.PipelineModel$PipelineModelWriter.$anonfun$save$4(Pipeline.scala:344)
  at org.apache.spark.ml.MLEvents.withSaveInstanceEvent(events.scala:174)
  at org.apache.spark.ml.MLEvents.withSaveInstanceEvent$(events.scala:169)
  at org.apache.spark.ml.util.Instrumentation.withSaveInstanceEvent(Instrumentation.scala:42)
  at org.apache.spark.ml.PipelineModel$PipelineModelWriter.$anonfun$save$3(Pipeline.scala:344)
  at org.apache.spark.ml.PipelineModel$PipelineModelWriter.$anonfun$save$3$adapted(Pipeline.scala:344)
  at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
  at scala.util.Try$.apply(Try.scala:213)
  at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
  at org.apache.spark.ml.PipelineModel$PipelineModelWriter.save(Pipeline.scala:344)
  ... 75 elided
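For context, here is a minimal sketch of the kind of pipeline save that triggers the error above. Column names, parameters, and paths are placeholders, not my actual job; training itself completes and the exception is thrown by the final save call.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession
import ml.dmlc.xgboost4j.scala.spark.XGBoostRegressor

val spark = SparkSession.builder().appName("xgb-pipeline-save").getOrCreate()

// Placeholder training data with feature columns f1, f2 and a "label" column.
val df = spark.read.parquet("/path/to/train")

val assembler = new VectorAssembler()
  .setInputCols(Array("f1", "f2"))
  .setOutputCol("features")

val xgb = new XGBoostRegressor(Map(
    "objective"   -> "reg:squarederror",
    "num_round"   -> 50,
    "num_workers" -> 2
  ))
  .setFeaturesCol("features")
  .setLabelCol("label")

val pipeline = new Pipeline().setStages(Array(assembler, xgb))
val model = pipeline.fit(df) // training succeeds

// The save of the fitted PipelineModel is where the NoSuchMethodError is thrown
// on Spark 3.2.0 with xgboost4j-spark 1.3.1 / 1.4.1 / 1.5.
model.write.overwrite().save("/path/to/model")
```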

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 15 (9 by maintainers)

Most upvoted comments

Hi, the minimum required Python version is 3.8.

We are working on removing the Python dependency, but it will take some time.

I pushed a PR for a better shebang in the Python data preprocessing script: https://github.com/dmlc/xgboost/pull/7389. For the JVM package those scripts are only used in testing.

That’s an unexpected result. I would expect it to work with Python 3.8 and to fail with Python 2.7. I’m using Python 3.8.

Can you visit this link https://xgboost.readthedocs.io/en/latest/install.html#id4 and see if it helps?

Could you please try the nightly build? This seems to have been fixed by https://github.com/dmlc/xgboost/pull/7376
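For the JVM packages, pulling the nightly would look roughly like the sbt sketch below. The resolver URL and the SNAPSHOT version are placeholders, not real coordinates; the actual snapshot repository and current version are listed in the installation docs linked earlier (https://xgboost.readthedocs.io/en/latest/install.html#id4).

```scala
// build.sbt — rough sketch for trying the nightly xgboost4j-spark build.
// Placeholder resolver URL; replace with the snapshot repo from the install docs.
resolvers += "xgboost-snapshots" at "https://example.org/xgboost-maven-repo/snapshot"

libraryDependencies ++= Seq(
  // Placeholder snapshot version; use the one currently published.
  "ml.dmlc" %% "xgboost4j"       % "x.y.z-SNAPSHOT",
  "ml.dmlc" %% "xgboost4j-spark" % "x.y.z-SNAPSHOT"
)
```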