pandarallel: parallel_apply results in EOFError when run from PyCharm, works fine from Jupyter Notebook
I was trying to parallelise my code with the pandarallel package in the following way:
import pandas as pd
from sklearn.cluster import SpectralClustering
from pandarallel import pandarallel
import numpy as np
ex = {'measurement_id': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1, 10: 1, 11: 1, 12: 1, 13: 1, 14: 1, 15: 1, 16: 1, 17: 1, 18: 1, 19: 1}, 'time': {0: 30000, 1: 30000, 2: 30000, 3: 30000, 4: 30000, 5: 30000, 6: 30000, 7: 30000, 8: 30000, 9: 30000, 10: 30100, 11: 30100, 12: 30100, 13: 30100, 14: 30100, 15: 30100, 16: 30100, 17: 30100, 18: 30100, 19: 30100}, 'group': {0: '0', 1: '0', 2: '0', 3: '0', 4: '0', 5: '0', 6: '0', 7: '0', 8: '0', 9: '0', 10: '0', 11: '0', 12: '0', 13: '0', 14: '0', 15: '0', 16: '0', 17: '0', 18: '0', 19: '0'}, 'object': {0: 'obj1', 1: 'obj10', 2: 'obj2', 3: 'obj3', 4: 'obj4', 5: 'obj5', 6: 'obj6', 7: 'obj7', 8: 'obj8', 9: 'obj9', 10: 'obj1', 11: 'obj10', 12: 'obj2', 13: 'obj3', 14: 'obj4', 15: 'obj5', 16: 'obj6', 17: 'obj7', 18: 'obj8', 19: 'obj9'}, 'x': {0: 55.507999420166016, 1: 49.67399978637695, 2: 61.9640007019043, 3: 67.98300170898438, 4: 49.43199920654297, 5: 40.34000015258789, 6: 69.50399780273438, 7: 49.65800094604492, 8: 68.48200225830078, 9: 37.87900161743164, 10: 55.595001220703125, 11: 49.52399826049805, 12: 61.92499923706055, 13: 67.91799926757812, 14: 49.30099868774414, 15: 40.141998291015625, 16: 69.49299621582031, 17: 49.775001525878906, 18: 68.4010009765625, 19: 37.77899932861328}}
ex = pd.DataFrame.from_dict(ex).set_index(['measurement_id', 'time', 'group'])
def cluster(x, index):
    x = np.asarray(x)[:, np.newaxis]
    clustering = SpectralClustering(n_clusters=3, random_state=42, gamma=1 / 50).fit(x)
    return pd.Series(clustering.labels_ + 1, index=index)
pandarallel.initialize(nb_workers=2, progress_bar=True)
ex \
.groupby(['measurement_id', 'time', 'group']) \
.parallel_apply(lambda x: cluster(x['x'], x['object']))
However, when I run this in PyCharm I get the following error:
  File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-84-7c89aedcfad4>", line 13, in <module>
    .parallel_apply(lambda x: cluster(x['x'], x['object']))
  File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/site-packages/pandarallel/pandarallel.py", line 451, in closure
    map_result,
  File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/site-packages/pandarallel/pandarallel.py", line 358, in get_workers_result
    message_type, message = queue.get()
  File "<string>", line 2, in get
  File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/multiprocessing/managers.py", line 819, in _callmethod
    kind, result = conn.recv()
  File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
I thought this might be due to an incompatibility with the latest pandas or Python release, so I tried to reproduce the issue with a different environment in Jupyter Notebook. It worked well, so I then tested the same environment that fails in PyCharm in Jupyter Notebook, and it worked fine too. I made sure that I'm running the same environment with
import sys
print(sys.executable)
and this is indeed the case. So the only difference seems to be that I use PyCharm instead of Jupyter Notebook. My environment is set up with Python 3.7.6 and pandas 1.0.1.
About this issue
- State: open
- Created 4 years ago
- Reactions: 12
- Comments: 18
Deactivating “Run with Python Console” in the run configuration solved the problem for me.
Same for me in PyCharm. Haven't tried a different IDE yet. However, NOT using the memory file system by setting use_memory_fs=False in the initialize call seems to work.
If this problem comes up, the first step is to run the plain apply function and see whether the code works. If your code fails for any reason, pandarallel gives EOFError.
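The two suggestions above (check the code with plain apply first, then pass use_memory_fs=False to initialize) could be combined roughly like this. It is only a sketch: the toy frame, group_mean and run_grouped are made-up names for illustration, not part of pandarallel or of the original code, and the parallel branch is left commented out so the serial check runs even without pandarallel installed.

```python
import pandas as pd

# Toy data invented for illustration (not the frame from the question).
df = pd.DataFrame(
    {"time": [1, 1, 2, 2], "x": [1.0, 3.0, 5.0, 7.0]}
).set_index("time")


def group_mean(group):
    """Hypothetical stand-in for the real per-group function."""
    return group["x"].mean()


def run_grouped(frame, func, parallel=False):
    """Apply func to each 'time' group, serially or through pandarallel."""
    grouped = frame.groupby("time")
    if not parallel:
        # Plain pandas: an exception raised inside func surfaces directly,
        # with its real traceback.
        return grouped.apply(func)
    # Imported lazily so the serial check does not need pandarallel.
    from pandarallel import pandarallel

    # use_memory_fs=False is the workaround reported in the comment above.
    pandarallel.initialize(nb_workers=2, use_memory_fs=False)
    return grouped.parallel_apply(func)


# Step 1: serial run, to confirm the function itself works.
serial_result = run_grouped(df, group_mean)
print(serial_result)

# Step 2 (only once step 1 passes): the parallel run.
# parallel_result = run_grouped(df, group_mean, parallel=True)
```

If step 1 already raises (for example a KeyError from a misspelled column), that is the error to fix first; per the comment above, the same failure inside a worker would likely only show up as EOFError.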