pandarallel: parallel_apply results in EOFError when run from PyCharm, works fine from Jupyter Notebook

I was trying to parallelise my code with the pandarallel package in the following way:

import pandas as pd
from sklearn.cluster import SpectralClustering
from pandarallel import pandarallel
import numpy as np
ex = {'measurement_id': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1, 10: 1, 11: 1, 12: 1, 13: 1, 14: 1, 15: 1, 16: 1, 17: 1, 18: 1, 19: 1}, 'time': {0: 30000, 1: 30000, 2: 30000, 3: 30000, 4: 30000, 5: 30000, 6: 30000, 7: 30000, 8: 30000, 9: 30000, 10: 30100, 11: 30100, 12: 30100, 13: 30100, 14: 30100, 15: 30100, 16: 30100, 17: 30100, 18: 30100, 19: 30100}, 'group': {0: '0', 1: '0', 2: '0', 3: '0', 4: '0', 5: '0', 6: '0', 7: '0', 8: '0', 9: '0', 10: '0', 11: '0', 12: '0', 13: '0', 14: '0', 15: '0', 16: '0', 17: '0', 18: '0', 19: '0'}, 'object': {0: 'obj1', 1: 'obj10', 2: 'obj2', 3: 'obj3', 4: 'obj4', 5: 'obj5', 6: 'obj6', 7: 'obj7', 8: 'obj8', 9: 'obj9', 10: 'obj1', 11: 'obj10', 12: 'obj2', 13: 'obj3', 14: 'obj4', 15: 'obj5', 16: 'obj6', 17: 'obj7', 18: 'obj8', 19: 'obj9'}, 'x': {0: 55.507999420166016, 1: 49.67399978637695, 2: 61.9640007019043, 3: 67.98300170898438, 4: 49.43199920654297, 5: 40.34000015258789, 6: 69.50399780273438, 7: 49.65800094604492, 8: 68.48200225830078, 9: 37.87900161743164, 10: 55.595001220703125, 11: 49.52399826049805, 12: 61.92499923706055, 13: 67.91799926757812, 14: 49.30099868774414, 15: 40.141998291015625, 16: 69.49299621582031, 17: 49.775001525878906, 18: 68.4010009765625, 19: 37.77899932861328}}

ex = pd.DataFrame.from_dict(ex).set_index(['measurement_id', 'time', 'group'])
    
def cluster(x, index):
    x = np.asarray(x)[:, np.newaxis]
    
    clustering = SpectralClustering(n_clusters = 3, random_state = 42, gamma = 1 / 50).fit(x)
    return pd.Series(clustering.labels_ + 1, index = index)
    
pandarallel.initialize(nb_workers=2, progress_bar=True)
ex \
    .groupby(['measurement_id', 'time', 'group']) \
    .parallel_apply(lambda x: cluster(x['x'], x['object']))

However, when I run this in PyCharm I get the following error:

  File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-84-7c89aedcfad4>", line 13, in <module>
    .parallel_apply(lambda x: cluster(x['x'], x['object']))
  File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/site-packages/pandarallel/pandarallel.py", line 451, in closure
    map_result,
  File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/site-packages/pandarallel/pandarallel.py", line 358, in get_workers_result
    message_type, message = queue.get()
  File "<string>", line 2, in get
  File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/multiprocessing/managers.py", line 819, in _callmethod
    kind, result = conn.recv()
  File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

I thought this might be due to an incompatibility with the latest pandas or Python release, so I tried to reproduce the issue in a different environment in Jupyter Notebook. It worked well, so I then tested the same environment in Jupyter Notebook, and it worked fine there too. I made sure that I’m running the same environment with

import sys
print(sys.executable)

and this is indeed the case. So the only difference seems to be that I use PyCharm instead of Jupyter Notebook. My environment is set up with Python 3.7.6 and pandas 1.0.1.

About this issue

  • State: open
  • Created 4 years ago
  • Reactions: 12
  • Comments: 18

Most upvoted comments

Deactivating “Run with Python Console” in the run configuration solved the problem for me.

Same for me in PyCharm. Haven’t tried a different IDE yet. However, NOT using the memory file system, by setting use_memory_fs=False in the initialize call, seems to work.
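
For reference, a minimal sketch of what that initialize call could look like, reusing the nb_workers and progress_bar values from the question. As far as I understand the pandarallel documentation, use_memory_fs=False makes the workers exchange data over multiprocessing pipes instead of the memory file system; whether that avoids the EOFError in every PyCharm setup is an assumption based only on the comment above. The INIT_KWARGS name and the import guard are illustrative, the guard is only there so the snippet does not fail where pandarallel is not installed.

```python
import importlib.util

# Keyword arguments for pandarallel.initialize; use_memory_fs=False is the
# workaround reported above, the other two values come from the question.
INIT_KWARGS = dict(nb_workers=2, progress_bar=True, use_memory_fs=False)

# Only initialize when pandarallel is actually available in this environment.
if importlib.util.find_spec("pandarallel") is not None:
    from pandarallel import pandarallel

    pandarallel.initialize(**INIT_KWARGS)
```

After this, the groupby(...).parallel_apply(...) call from the question can be run unchanged.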

If this problem occurs, the first step is to run the plain apply function and check whether the code works. If your code fails for any reason, pandarallel gives an EOFError.
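
A small sketch of that debugging step, using a made-up DataFrame and a hypothetical function (summarize) with a deliberate bug, neither of which comes from the question. Running the function through the ordinary groupby apply surfaces the real exception with its own message, which is the information that would otherwise be hidden behind the EOFError.

```python
import pandas as pd


def summarize(group):
    # Hypothetical user function with a deliberate bug for one of the groups.
    if group["x"].max() > 3.0:
        raise ValueError("unexpected value in group")
    return group["x"].mean()


# Made-up data: two groups, the second one triggers the bug.
df = pd.DataFrame({"time": [30000, 30000, 30100, 30100],
                   "x": [1.0, 2.0, 3.0, 4.0]})
grouped = df.groupby("time")[["x"]]

# Run serially first: any error raised inside summarize shows up directly.
serial_error = None
try:
    grouped.apply(summarize)
except Exception as exc:
    serial_error = exc

print(type(serial_error).__name__, serial_error)
```

Once the serial apply runs cleanly, switching back to parallel_apply should only change how the work is distributed, not what the function does.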