pandarallel: Fails with "_wrap_applied_output() missing 1 required positional argument" where a simple pandas apply succeeds

Hello,

I’m using Python 3.8.10 (Anaconda distribution, GCC 7.5.10) on Ubuntu 20 LTS, 64-bit x86.

From my pip freeze:

pandarallel 1.5.2
pandas 1.3.0
numpy 1.20.3

I’m working with a DataFrame that looks like this one:

	HoleID	scaffold	tpl	base	score	tMean	tErr	modelPrediction	ipdRatio	coverage	isboundary	identificationQv	context	experiment	isbegin_bondary	isend_boundary	isin_IES	uniqueID	No_known_IES_retention_this_CCS	detailed_classif
1025444	70189477	scaffold_024_with_IES	688203	T	2	0.517	0.190	0.555	0.931	11	True	NaN	TTAAATAGAAATTAAAATCAGCTGC	NM9_10	False	False	False	NM9_10_70189477	False	POTENTIALLY_RETAINED_MACIES_OUTIES
1025446	70189477	scaffold_024_with_IES	688204	A	4	1.347	0.367	1.251	1.077	13	True	NaN	TAAATAGAAATTAAAATCAGCTGCT	NM9_10	False	False	False	NM9_10_70189477	False	POTENTIALLY_RETAINED_MACIES_OUTIES
1025448	70189477	scaffold_024_with_IES	688205	A	5	1.913	0.779	1.464	1.307	16	True	NaN	AAATAGAAATTAAAATCAGCTGCTT	NM9_10	False	False	False	NM9_10_70189477	False	POTENTIALLY_RETAINED_MACIES_OUTIES
1025450	70189477	scaffold_024_with_IES	688206	A	4	1.535	0.712	1.328	1.156	18	True	NaN	AATAGAAATTAAAATCAGCTGCTTA	NM9_10	False	False	False	NM9_10_70189477	False	POTENTIALLY_RETAINED_MACIES_OUTIES
1025452	70189477	scaffold_024_with_IES	688207	A	5	1.655	0.565	1.391	1.190	18	True	NaN	ATAGAAATTAAAATCAGCTGCTTAA	NM9_10	False	False	False	NM9_10_70189477	False	POTENTIALLY_RETAINED_MACIES_OUTIES

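I defined the following function:

import numpy as np
import pandas as pd

def get_distance_from_nearest_criteria(df, criteria):
    # Rows of this group where the boolean column `criteria` is True
    begins = df[df[criteria]].copy()

    if len(begins) == 0:
        # No matching row in this group: every distance is undefined
        return pd.Series([np.nan for x in range(len(df))])
    else:
        list_return = []

        for idx, nt in df.iterrows():
            # Distance from this row's "tpl" position to the nearest matching row
            distances = [abs(nt["tpl"] - x) for x in begins["tpl"]]
            mindistance = min(distances, default=np.nan)
            list_return.append(mindistance)

        return pd.Series(list_return)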

Then running:

from pandarallel import pandarallel
pandarallel.initialize(progress_bar=False, nb_workers=12)
out = df.groupby(["uniqueID"]).parallel_apply(lambda x: get_distance_from_nearest_criteria(x,'isbegin_bondary'))

leads to:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-49-02fc7c0589e3> in <module>
----> 1 out = df.groupby(["uniqueID"]).parallel_apply(lambda x: get_distance_from_nearest_criteria(x,'isbegin_bondary'))

~/conda3/envs/ies/lib/python3.8/site-packages/pandarallel/pandarallel.py in closure(data, func, *args, **kwargs)
    463             )
    464 
--> 465             return reduce(results, reduce_meta_args)
    466 
    467         finally:

~/conda3/envs/ies/lib/python3.8/site-packages/pandarallel/data_types/dataframe_groupby.py in reduce(results, df_grouped)
     14         keys, values, mutated = zip(*results)
     15         mutated = any(mutated)
---> 16         return df_grouped._wrap_applied_output(
     17             keys, values, not_indexed_same=df_grouped.mutated or mutated
     18         )

TypeError: _wrap_applied_output() missing 1 required positional argument: 'values'

The error message is not clear to me; I can’t tell what’s happening.

However, when I run it with a plain pandas apply, it works and returns:

uniqueID           
HT2_10354935    0      297.0
                1      297.0
                2      296.0
                3      296.0
                4      295.0
                       ...  
NM9_10_9568952  502      NaN
                503      NaN
                504      NaN
                505      NaN
                506      NaN
Length: 1028437, dtype: float64

I’m running all of this in a Jupyter notebook:

ipykernel 5.3.4
ipython 7.22.0
ipython-genutils 0.2.0
notebook 6.4.0
jupyter 1.0.0
jupyter-client 6.1.12
jupyter-console 6.4.0
jupyter-core 4.7.1
jupyter-dash 0.4.0
jupyterlab-pygments 0.1.2
jupyterlab-widgets 1.0.0

I was wondering if someone could explain to me what’s happening, and how to fix it if the error is mine. Because it works out of the box with a plain pandas apply, I suppose there is a small problem in pandarallel.

NB: This code also leaves unkilled processes behind, even after I interrupt or restart the IPython kernel.

EDIT: Could it be linked to the fact that I’m using a lambda function?

About this issue

State: closed
Created 3 years ago
Reactions: 8
Comments: 18 (5 by maintainers)

Commits related to this issue

groupby fix for pandas>=1.3.0; fixes https://github.com/nalepae/pandarallel/issues/150. `._selected_obj` is used to match https://github.com/pandas-dev/pandas/blob/3408a61ff940900ed1aa5fb89ee9263593... (committed to jorenham/pandarallel by jorenham 3 years ago)

Most upvoted comments

One workaround is to use a pandas version earlier than v1.3.0, for example v1.2.5.

>>> import numpy as np
>>> import pandas as pd
>>> pd.__version__
'1.2.5'
>>> from pandarallel import pandarallel
>>> pandarallel.initialize()
INFO: Pandarallel will run on 12 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.
>>> df = pd.DataFrame(np.random.rand(10, 2), columns=['a', 'b'])
>>> df
          a         b
0  0.632378  0.427258
1  0.814948  0.639748
2  0.701467  0.890010
3  0.803045  0.685235
4  0.749729  0.295159
5  0.588197  0.840467
6  0.707125  0.613361
7  0.027530  0.678850
8  0.468288  0.515698
9  0.824416  0.839627
>>> df.groupby('a').apply(lambda grp: grp)
          a         b
0  0.632378  0.427258
1  0.814948  0.639748
2  0.701467  0.890010
3  0.803045  0.685235
4  0.749729  0.295159
5  0.588197  0.840467
6  0.707125  0.613361
7  0.027530  0.678850
8  0.468288  0.515698
9  0.824416  0.839627
>>> df.groupby('a').parallel_apply(lambda grp: grp)
          a         b
0  0.632378  0.427258
1  0.814948  0.639748
2  0.701467  0.890010
3  0.803045  0.685235
4  0.749729  0.295159
5  0.588197  0.840467
6  0.707125  0.613361
7  0.027530  0.678850
8  0.468288  0.515698
9  0.824416  0.839627

From v1.3.0 onward, the _wrap_applied_output function inside pandas/core/groupby/groupby.py takes one additional positional argument, data. pandarallel still calls it without that argument, which causes this error.
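The mismatch can be reproduced without pandas at all. The sketch below uses two stand-in functions (hypothetical names, not pandas code) whose parameter lists are assumed to mirror _wrap_applied_output before and after the change, and calls both the way the traceback shows pandarallel calling it:

```python
# Stand-in functions only: they copy the assumed parameter lists,
# not the actual pandas logic.

def wrap_before_1_3(keys, values, not_indexed_same=False):
    """Parameter order assumed for pandas < 1.3."""
    return ("old", keys, values, not_indexed_same)


def wrap_from_1_3(data, keys, values, not_indexed_same=False):
    """Parameter order assumed for pandas >= 1.3: `data` now comes first."""
    return ("new", data, keys, values, not_indexed_same)


def call_like_pandarallel(wrap, keys, values):
    """Call `wrap` as in the traceback: two positionals, no `data`."""
    return wrap(keys, values, not_indexed_same=False)


keys, values = ["g1", "g2"], [1.0, 2.0]

# Works with the old parameter list.
old_result = call_like_pandarallel(wrap_before_1_3, keys, values)

# With the new parameter list, `keys` lands in `data`, `values` lands in
# `keys`, and nothing is left to fill `values`.
try:
    call_like_pandarallel(wrap_from_1_3, keys, values)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

This is why the message complains about 'values' rather than 'data': the two arguments that are passed shift one slot to the left, so the last positional parameter is the one that ends up missing.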

kr4t0n on Aug 11, 2021

I have exactly the same problem with a normally defined function (not a lambda). I have no idea how to fix it; it looks like a bug in pandarallel.

winglight on Jul 25, 2021

This is the commit where it changed https://github.com/pandas-dev/pandas/commit/3408a61ff940900ed1aa5fb89ee92635938d2e94

So downgrading to <1.3 will “fix” it
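If downgrading is not an option everywhere, a small guard can decide at runtime whether parallel_apply is safe to call. This is only a sketch: predates_signature_change is a hypothetical helper, and the 1.3 cutoff is taken from this thread rather than from any pandarallel documentation.

```python
import re


def predates_signature_change(pandas_version: str) -> bool:
    """Return True if the version string is older than pandas 1.3.

    Only the leading "major.minor" part is inspected, so suffixes such as
    "1.3.0rc1" or "1.2.5+local" are tolerated.
    """
    match = re.match(r"(\d+)\.(\d+)", pandas_version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {pandas_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) < (1, 3)


for version in ("1.2.5", "1.3.0", "1.4.1"):
    print(version, predates_signature_change(version))
```

In a notebook, one could pass pd.__version__ to this helper and fall back to the plain groupby(...).apply(...) when it returns False.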

jorenham on Oct 4, 2021

I am also having this issue

alexrblohm on Sep 3, 2021

@Kr4t0n Oh, I see. I ran into this issue right after upgrading, but the conflict comes from a dependency of a core library.

winglight on Aug 11, 2021

This issue still exists in 1.5.4 with pandas 1.4.1.

zxdawn on Jul 14, 2022