xgboost: [jvm-packages] XGBoostClassifier training fails with large data on a multi-node cluster
Hi, I have a pipeline that combines hyperparameter tuning, an evaluator, and cross-validation on an XGBoostClassifier model. However, I run into the following error and was wondering if I could get some help understanding what it means. Any suggestion or insight will be greatly appreciated. I can also provide more information if required.
20/12/10 09:39:34 ERROR XGBoostTaskFailedListener: Training Task Failed during XGBoost Training: ExceptionFailure(ml.dmlc.xgboost4j.java.XGBoostError,[09:39:34] /workspace/jvm-packages/xgboost4j/src/native/xgboost4j.cpp:159: [09:39:34] /workspace/jvm-packages/xgboost4j/src/native/xgboost4j.cpp:78: Check failed: jenv->ExceptionOccurred():
Stack trace:
[bt] (0) /tmp/libxgboost4j169322301632920248.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x57) [0x7f6600a5e947]
[bt] (1) /tmp/libxgboost4j169322301632920248.so(XGBoost4jCallbackDataIterNext+0x2d55) [0x7f6600a5b595]
[bt] (2) /tmp/libxgboost4j169322301632920248.so(xgboost::data::SimpleDMatrix::SimpleDMatrix<xgboost::data::IteratorAdapter>(xgboost::data::IteratorAdapter*, float, int)+0x2c0) [0x7f6600b1cda0]
[bt] (3) /tmp/libxgboost4j169322301632920248.so(xgboost::DMatrix* xgboost::DMatrix::Create<xgboost::data::IteratorAdapter>(xgboost::data::IteratorAdapter*, float, int, std::string const&, unsigned long)+0x45) [0x7f6600b11d15]
[bt] (4) /tmp/libxgboost4j169322301632920248.so(XGDMatrixCreateFromDataIter+0x153) [0x7f6600a5f943]
[bt] (5) /tmp/libxgboost4j169322301632920248.so(Java_ml_dmlc_xgboost4j_java_XGBoostJNI_XGDMatrixCreateFromDataIter+0x96) [0x7f6600a57426]
[bt] (6) [0x7f68ad018427]
Stack trace:
[bt] (0) /tmp/libxgboost4j169322301632920248.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x57) [0x7f6600a5e947]
[bt] (1) /tmp/libxgboost4j169322301632920248.so(XGBoost4jCallbackDataIterNext+0x2664) [0x7f6600a5aea4]
[bt] (2) /tmp/libxgboost4j169322301632920248.so(xgboost::data::SimpleDMatrix::SimpleDMatrix<xgboost::data::IteratorAdapter>(xgboost::data::IteratorAdapter*, float, int)+0x2c0) [0x7f6600b1cda0]
[bt] (3) /tmp/libxgboost4j169322301632920248.so(xgboost::DMatrix* xgboost::DMatrix::Create<xgboost::data::IteratorAdapter>(xgboost::data::IteratorAdapter*, float, int, std::string const&, unsigned long)+0x45) [0x7f6600b11d15]
[bt] (4) /tmp/libxgboost4j169322301632920248.so(XGDMatrixCreateFromDataIter+0x153) [0x7f6600a5f943]
[bt] (5) /tmp/libxgboost4j169322301632920248.so(Java_ml_dmlc_xgboost4j_java_XGBoostJNI_XGDMatrixCreateFromDataIter+0x96) [0x7f6600a57426]
[bt] (6) [0x7f68ad018427]
,[Ljava.lang.StackTraceElement;@2c0925ec,ml.dmlc.xgboost4j.java.XGBoostError: [09:39:34] /workspace/jvm-packages/xgboost4j/src/native/xgboost4j.cpp:159: [09:39:34] /workspace/jvm-packages/xgboost4j/src/native/xgboost4j.cpp:78: Check failed: jenv->ExceptionOccurred():
Stack trace:
[bt] (0) /tmp/libxgboost4j169322301632920248.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x57) [0x7f6600a5e947]
[bt] (1) /tmp/libxgboost4j169322301632920248.so(XGBoost4jCallbackDataIterNext+0x2d55) [0x7f6600a5b595]
[bt] (2) /tmp/libxgboost4j169322301632920248.so(xgboost::data::SimpleDMatrix::SimpleDMatrix<xgboost::data::IteratorAdapter>(xgboost::data::IteratorAdapter*, float, int)+0x2c0) [0x7f6600b1cda0]
[bt] (3) /tmp/libxgboost4j169322301632920248.so(xgboost::DMatrix* xgboost::DMatrix::Create<xgboost::data::IteratorAdapter>(xgboost::data::IteratorAdapter*, float, int, std::string const&, unsigned long)+0x45) [0x7f6600b11d15]
[bt] (4) /tmp/libxgboost4j169322301632920248.so(XGDMatrixCreateFromDataIter+0x153) [0x7f6600a5f943]
[bt] (5) /tmp/libxgboost4j169322301632920248.so(Java_ml_dmlc_xgboost4j_java_XGBoostJNI_XGDMatrixCreateFromDataIter+0x96) [0x7f6600a57426]
[bt] (6) [0x7f68ad018427]
Stack trace:
[bt] (0) /tmp/libxgboost4j169322301632920248.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x57) [0x7f6600a5e947]
[bt] (1) /tmp/libxgboost4j169322301632920248.so(XGBoost4jCallbackDataIterNext+0x2664) [0x7f6600a5aea4]
[bt] (2) /tmp/libxgboost4j169322301632920248.so(xgboost::data::SimpleDMatrix::SimpleDMatrix<xgboost::data::IteratorAdapter>(xgboost::data::IteratorAdapter*, float, int)+0x2c0) [0x7f6600b1cda0]
[bt] (3) /tmp/libxgboost4j169322301632920248.so(xgboost::DMatrix* xgboost::DMatrix::Create<xgboost::data::IteratorAdapter>(xgboost::data::IteratorAdapter*, float, int, std::string const&, unsigned long)+0x45) [0x7f6600b11d15]
[bt] (4) /tmp/libxgboost4j169322301632920248.so(XGDMatrixCreateFromDataIter+0x153) [0x7f6600a5f943]
[bt] (5) /tmp/libxgboost4j169322301632920248.so(Java_ml_dmlc_xgboost4j_java_XGBoostJNI_XGDMatrixCreateFromDataIter+0x96) [0x7f6600a57426]
[bt] (6) [0x7f68ad018427]
at ml.dmlc.xgboost4j.java.XGBoostJNI.checkCall(XGBoostJNI.java:48)
at ml.dmlc.xgboost4j.java.DMatrix.<init>(DMatrix.java:54)
at ml.dmlc.xgboost4j.scala.DMatrix.<init>(DMatrix.scala:42)
at ml.dmlc.xgboost4j.scala.spark.Watches$.buildWatches(XGBoost.scala:790)
at ml.dmlc.xgboost4j.scala.spark.XGBoost$$anonfun$trainForNonRanking$1.apply(XGBoost.scala:451)
at ml.dmlc.xgboost4j.scala.spark.XGBoost$$anonfun$trainForNonRanking$1.apply(XGBoost.scala:450)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:823)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:823)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:359)
at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:357)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1165)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:357)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:308)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
About this issue
- State: open
- Created 4 years ago
- Comments: 32 (2 by maintainers)
@dchristle ignore my last question. I forgot there is .setMissing. The above code works! I can’t believe it was that simple. I will have to make sure the predictions are coming out as expected, but it does work! @monicasenapati
@jmpanfil I too use xgboost4j-spark and have similar data: highly sparse and very large. Not sure if the data type could be an issue though. If you find something, please do let me know, since I have hit a roadblock too. Thank you!
@hcho3 Thank you so much for your time. I appreciate it. I was able to get past that issue. I discovered it was a bug in my code, which was not parsing the input CSV files as I intended. The current issue now appears to be fixed. I am running into another Spark error, which I will have to fix.
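Since the root cause reported above was a CSV parsing bug, a quick sanity check on the raw rows before training may help others who see the same trace. The following is only a minimal sketch in plain Java, not the code from this issue: the class name CsvRowCheck, the method findBadRows, and the expectedColumns parameter are hypothetical, and it assumes comma-separated numeric fields with no quoting, where an empty field stands for a missing value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of xgboost4j or Spark.
class CsvRowCheck {
    /** Returns the 1-based line numbers of rows that do not parse as expected. */
    public static List<Integer> findBadRows(List<String> lines, int expectedColumns) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            // A limit of -1 keeps trailing empty fields, so "1,2," counts as 3 columns.
            String[] fields = lines.get(i).split(",", -1);
            boolean ok = fields.length == expectedColumns;
            for (int j = 0; ok && j < fields.length; j++) {
                String field = fields[j].trim();
                if (field.isEmpty()) {
                    continue; // assumption: an empty field is a missing value
                }
                try {
                    Double.parseDouble(field);
                } catch (NumberFormatException e) {
                    ok = false; // non-numeric content in a numeric column
                }
            }
            if (!ok) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Made-up sample rows: row 3 has a non-numeric field, row 4 is short.
        List<String> sample = Arrays.asList("1,0.5,3.2", "0,,1.0", "1,abc,2.0", "0,1.0");
        System.out.println(findBadRows(sample, 3)); // prints [3, 4]
    }
}
```

Running a check like this on a sample of each input file can show whether rows have the column count and types the training code assumes, before any data reaches the DMatrix construction step seen in the trace.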
Great! I will try to reproduce the error on my end and investigate the root cause.
ErrorSample.zip: This contains a training script and the sample data I am trying to train on.