elephas: Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.

Hi, I’m trying to use elephas for my deep learning models on Spark, but so far I haven’t been able to get anything to work, across three different machines and multiple notebooks.

  • “ml_pipeline_otto.py” crashes in the load_data_frame function, more specifically on return sqlContext.createDataFrame(data, ['features', 'category']), with the error: Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.

  • “mnist_mlp_spark.py” crashes in the spark_model.fit method with the error: TypeError: can't pickle _thread.RLock objects.

  • My own pipeline crashes right after fitting the model (the training itself completes) with this error: Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.

I’m running TensorFlow 2.1.0, PySpark 3.0.2, JDK 8u281, Python 3.7, and elephas 1.4.2.
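For context, here is a minimal sketch of the kind of elephas training run described above, assuming the SparkModel API documented in the elephas README; the model, data, and parameters below are placeholders, not anything from the original scripts:

    # Minimal sketch (assumed API): distribute training of a Keras model with elephas.
    # Everything here is illustrative, not taken from the scripts mentioned above.
    import numpy as np
    from pyspark import SparkConf, SparkContext
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense
    from elephas.spark_model import SparkModel
    from elephas.utils.rdd_utils import to_simple_rdd

    conf = SparkConf().setAppName('elephas_example').setMaster('local[*]')
    sc = SparkContext(conf=conf)

    # Toy data standing in for the real features and labels
    x_train = np.random.rand(1000, 20)
    y_train = np.random.randint(0, 2, size=(1000, 1))

    model = Sequential([Dense(32, activation='relu', input_shape=(20,)),
                        Dense(1, activation='sigmoid')])
    model.compile(optimizer='adam', loss='binary_crossentropy')

    rdd = to_simple_rdd(sc, x_train, y_train)            # RDD of (features, label) pairs
    spark_model = SparkModel(model, frequency='epoch',    # sync weights once per epoch
                             mode='asynchronous')
    spark_model.fit(rdd, epochs=5, batch_size=32, verbose=0, validation_split=0.1)

Note that Py4JJavaError on calls like collectAndServe or runJob is usually just the driver-side surface of a job that failed on the workers, so the traceback tends to point at the collecting call rather than at the underlying cause.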

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 16

Most upvoted comments

Hi there! Had the same issue, but this solution helped: import findspark and call findspark.init() before creating the Spark session.
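For anyone hitting the same error, the fix amounts to something like the following (a sketch assuming a notebook where the Spark installation isn’t already on the Python path; the app name is just an example):

    import findspark
    findspark.init()                        # locate the local Spark installation and add it to sys.path

    from pyspark.sql import SparkSession    # import and start Spark only after findspark.init()
    spark = SparkSession.builder.appName('elephas_app').getOrCreate()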

Hi,

Thanks, I had the same issue and it’s now resolved: import findspark and call findspark.init() before creating the Spark session.

Note: Windows seems to have other dependencies. I’m not sure what the underlying issue was, but it’s fixed now. Could you share more detail on how this package resolves it?

Hi Mayank,

Thanks for your comments. The findspark package helped me solve the issue.

This solved my issue. Don’t forget to restart the kernel and re-run the cells after installing findspark.

Yes, it works for me. In particular, don’t forget to restart the kernel before calling findspark.init().