elephas: Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.

Hi, I’m trying to use elephas for my deep learning models on Spark, but so far I haven’t been able to get anything to work, across three different machines and multiple notebooks.

  • “ml_pipeline_otto.py” crashes in the load_data_frame function, more specifically on return sqlContext.createDataFrame(data, ['features', 'category']), with the error: Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.

  • “mnist_mlp_spark.py” crashes in the spark_model.fit method with the error: TypeError: can't pickle _thread.RLock objects.

  • My own pipeline crashes right after fitting the model (the training itself completes) with this error: Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.

I’m running TensorFlow 2.1.0, PySpark 3.0.2, JDK 8u281, Python 3.7, and elephas 1.4.2.
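For context, here is a minimal sketch of the kind of elephas training run described above, assuming the SparkModel API documented in the elephas README; the model, data, and parameters below are placeholders, not anything from the original scripts:

    # Minimal sketch (assumed API): distribute training of a Keras model with elephas.
    # Everything here is illustrative, not taken from the scripts mentioned above.
    import numpy as np
    from pyspark import SparkConf, SparkContext
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense
    from elephas.spark_model import SparkModel
    from elephas.utils.rdd_utils import to_simple_rdd

    conf = SparkConf().setAppName('elephas_example').setMaster('local[*]')
    sc = SparkContext(conf=conf)

    # Toy data standing in for the real features and labels
    x_train = np.random.rand(1000, 20)
    y_train = np.random.randint(0, 2, size=(1000, 1))

    model = Sequential([Dense(32, activation='relu', input_shape=(20,)),
                        Dense(1, activation='sigmoid')])
    model.compile(optimizer='adam', loss='binary_crossentropy')

    rdd = to_simple_rdd(sc, x_train, y_train)            # RDD of (features, label) pairs
    spark_model = SparkModel(model, frequency='epoch',    # sync weights once per epoch
                             mode='asynchronous')
    spark_model.fit(rdd, epochs=5, batch_size=32, verbose=0, validation_split=0.1)

Note that Py4JJavaError on calls like collectAndServe or runJob is usually just the driver-side surface of a job that failed on the workers, so the traceback tends to point at the collecting call rather than at the underlying cause.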

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 16

Most upvoted comments

Hi there! Had the same issue, but this solution helped: import findspark and call findspark.init() before creating the Spark session.
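For anyone hitting the same error, the fix amounts to something like the following (a sketch assuming a notebook where the Spark installation isn’t already on the Python path; the app name is just an example):

    import findspark
    findspark.init()                        # locate the local Spark installation and add it to sys.path

    from pyspark.sql import SparkSession    # import and start Spark only after findspark.init()
    spark = SparkSession.builder.appName('elephas_app').getOrCreate()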

Hi,

Thanks, I had the same issue and it’s now resolved: import findspark and call findspark.init() before creating the Spark session.

Note: Windows seems to have other dependencies. I’m not sure what the underlying issue was, but it’s fixed now. Could you share more detail on how this package resolves it?

Hi Mayank,

Thanks for your comments. The findspark package helped me solve the issue.

This solved my issue. Don’t forget to restart the kernel and re-run the cells after installing findspark.

Yes, it works for me. In particular, don’t forget to restart the kernel before calling findspark.init().