xgboost: [jvm-packages] spark hangs when training is run in quick succession
I am getting an infinite hang when I run the following code a few times in quick succession:
import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier
val dataPath = "__SPARK_HOME_LOCATION__/data/mllib/sample_binary_classification_data.txt"
val data = spark.read.format("libsvm").option("vectorType", "dense").load(dataPath)
val xgbClassifier = new XGBoostClassifier()
xgbClassifier.fit(data).transform(data).show()
Steps to reproduce:
- Open spark-shell with the xgboost jars
- Run the above code
- Quickly rerun the last line until a hang happens
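The manual reruns can also be approximated with a loop. This is a minimal sketch, assuming it is pasted into a spark-shell session where `spark` is predefined and the xgboost jars are on the classpath. The name `attempts` and its value are hypothetical (the report does not say how many reruns are needed), and a loop may not reproduce the exact timing of rerunning the line by hand.

```scala
// Paste into a spark-shell session started with the xgboost jars.
import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier

// Placeholder path copied from the report; point it at your Spark installation.
val dataPath = "__SPARK_HOME_LOCATION__/data/mllib/sample_binary_classification_data.txt"
val data = spark.read.format("libsvm").option("vectorType", "dense").load(dataPath)
val xgbClassifier = new XGBoostClassifier()

// Hypothetical repeat count, chosen only for illustration.
val attempts = 10
for (i <- 1 to attempts) {
  println(s"Training run $i of $attempts")
  // Same call as the last line of the report, issued back to back.
  xgbClassifier.fit(data).transform(data).show()
}
```

If the hang occurs, the last "Training run" line printed shows which iteration stalled.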
Other information: When the hang happens, I only get the tracker message and nothing after that, so I have to kill the Spark job. (If I wait between runs, they always succeed.)
Tracker started, with env={DMLC_NUM_SERVER=0, DMLC_TRACKER_URI=XXX.XXX.XXX.XXX, DMLC_TRACKER_PORT=9096, DMLC_NUM_WORKER=2}
My environment:
- XGBoost Master
- Spark 2.4.3
- (Happens in both Zeppelin and spark-shell)
About this issue
- State: closed
- Created 5 years ago
- Comments: 16 (15 by maintainers)
For folks who have a similar issue: a quick fix can be achieved by running this before your training job.
Sure. Let me take it.