pandarallel: Fails with "_wrap_applied_output() missing 1 required positional argument" where a simple pandas apply succeeds
Hello,
I’m using Python 3.8.10 (Anaconda distribution, GCC 7.5.10) on Ubuntu 20 LTS, 64-bit x86.
From my pip freeze:
pandarallel 1.5.2
pandas 1.3.0
numpy 1.20.3
I’m working with a DataFrame that looks like this one:
 | HoleID | scaffold | tpl | strand | base | score | tMean | tErr | modelPrediction | ipdRatio | coverage | isboundary | identificationQv | context | experiment | isbegin_bondary | isend_boundary | isin_IES | uniqueID | No_known_IES_retention_this_CCS | detailed_classif |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1025444 | 70189477 | scaffold_024_with_IES | 688203 | 0 | T | 2 | 0.517 | 0.190 | 0.555 | 0.931 | 11 | True | NaN | TTAAATAGAAATTAAAATCAGCTGC | NM9_10 | False | False | False | NM9_10_70189477 | False | POTENTIALLY_RETAINED_MACIES_OUTIES |
1025446 | 70189477 | scaffold_024_with_IES | 688204 | 0 | A | 4 | 1.347 | 0.367 | 1.251 | 1.077 | 13 | True | NaN | TAAATAGAAATTAAAATCAGCTGCT | NM9_10 | False | False | False | NM9_10_70189477 | False | POTENTIALLY_RETAINED_MACIES_OUTIES |
1025448 | 70189477 | scaffold_024_with_IES | 688205 | 0 | A | 5 | 1.913 | 0.779 | 1.464 | 1.307 | 16 | True | NaN | AAATAGAAATTAAAATCAGCTGCTT | NM9_10 | False | False | False | NM9_10_70189477 | False | POTENTIALLY_RETAINED_MACIES_OUTIES |
1025450 | 70189477 | scaffold_024_with_IES | 688206 | 0 | A | 4 | 1.535 | 0.712 | 1.328 | 1.156 | 18 | True | NaN | AATAGAAATTAAAATCAGCTGCTTA | NM9_10 | False | False | False | NM9_10_70189477 | False | POTENTIALLY_RETAINED_MACIES_OUTIES |
1025452 | 70189477 | scaffold_024_with_IES | 688207 | 0 | A | 5 | 1.655 | 0.565 | 1.391 | 1.190 | 18 | True | NaN | ATAGAAATTAAAATCAGCTGCTTAA | NM9_10 | False | False | False | NM9_10_70189477 | False | POTENTIALLY_RETAINED_MACIES_OUTIES |
I defined the following function
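import numpy as np
import pandas as pd

def get_distance_from_nearest_criteria(df, criteria):
    # Rows where the boolean column `criteria` is True
    begins = df[df[criteria]].copy()
    if len(begins) == 0:
        # No matching row in this group: every distance is undefined
        return pd.Series([np.nan for x in range(len(df))])
    else:
        list_return = []
        for idx, nt in df.iterrows():
            # Distance from this row's tpl to the tpl of every matching row
            distances = [abs(nt["tpl"] - x) for x in begins["tpl"]]
            mindistance = min(distances, default=np.nan)
            list_return.append(mindistance)
        return pd.Series(list_return)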
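To make the intent of this function concrete, here is a minimal pandas-free sketch of the same computation on plain lists. The name `nearest_flagged_distance` and the sample values are hypothetical and only illustrate the logic; this is not the code that was run in the notebook.

```python
import math

def nearest_flagged_distance(positions, flags):
    """For each position, return the distance to the nearest flagged position.

    Mirrors the logic of get_distance_from_nearest_criteria: if nothing is
    flagged, every distance is NaN.
    """
    flagged = [p for p, f in zip(positions, flags) if f]
    if not flagged:
        return [math.nan for _ in positions]
    return [min(abs(p - q) for q in flagged) for p in positions]

# Hypothetical group: three template positions, the second one is a boundary
with_boundary = nearest_flagged_distance([10, 11, 15], [False, True, False])
without_boundary = nearest_flagged_distance([10, 11, 15], [False, False, False])
print(with_boundary)
print(without_boundary)
```

In the real function, `positions` corresponds to the `tpl` column of one group and `flags` to the boolean column passed as `criteria`.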
Then using :
from pandarallel import pandarallel
pandarallel.initialize(progress_bar=False, nb_workers=12)
out = df.groupby(["uniqueID"]).parallel_apply(lambda x: get_distance_from_nearest_criteria(x,'isbegin_bondary'))
leads to :
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-49-02fc7c0589e3> in <module>
----> 1 out = df.groupby(["uniqueID"]).parallel_apply(lambda x: get_distance_from_nearest_criteria(x,'isbegin_bondary'))
~/conda3/envs/ies/lib/python3.8/site-packages/pandarallel/pandarallel.py in closure(data, func, *args, **kwargs)
463 )
464
--> 465 return reduce(results, reduce_meta_args)
466
467 finally:
~/conda3/envs/ies/lib/python3.8/site-packages/pandarallel/data_types/dataframe_groupby.py in reduce(results, df_grouped)
14 keys, values, mutated = zip(*results)
15 mutated = any(mutated)
---> 16 return df_grouped._wrap_applied_output(
17 keys, values, not_indexed_same=df_grouped.mutated or mutated
18 )
TypeError: _wrap_applied_output() missing 1 required positional argument: 'values'
For me, the error is not clear enough (I can’t tell what’s happening).
However, it works when I run it with a plain pandas apply:
uniqueID
HT2_10354935 0 297.0
1 297.0
2 296.0
3 296.0
4 295.0
...
NM9_10_9568952 502 NaN
503 NaN
504 NaN
505 NaN
506 NaN
Length: 1028437, dtype: float64
I’m running all of this in a jupyter notebook
ipykernel 5.3.4
ipython 7.22.0
ipython-genutils 0.2.0
notebook 6.4.0
jupyter 1.0.0
jupyter-client 6.1.12
jupyter-console 6.4.0
jupyter-core 4.7.1
jupyter-dash 0.4.0
jupyterlab-pygments 0.1.2
jupyterlab-widgets 1.0.0
I was wondering if someone could explain what’s happening, and how to fix it if the error is mine. Because it works out of the box with a plain pandas apply, I suppose there is a small problem in pandarallel.
NB: This code also leaves unkilled processes behind, even after I interrupt or restart the IPython kernel.
EDIT: Could it be linked to the fact that I’m using a lambda function?
About this issue
- State: closed
- Created 3 years ago
- Reactions: 8
- Comments: 18 (5 by maintainers)
One workaround is to use a pandas version before v1.3.0, for example v1.2.5.
From v1.3.0 onward, the _wrap_applied_output function in pandas/core/groupby/groupby.py takes one additional positional argument, data, which causes this problem.
I have exactly the same problem with a normally defined function, and no idea how to fix it. It looks like a bug in pandarallel.
This is the commit where it changed https://github.com/pandas-dev/pandas/commit/3408a61ff940900ed1aa5fb89ee92635938d2e94
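The mismatch described above can be reproduced without pandas at all. The sketch below uses two hypothetical stand-in classes (`OldGroupBy`, `NewGroupBy`) whose method signatures are modeled on the description in this thread, not copied from the pandas source, and a helper that calls the method the same way the traceback shows pandarallel doing.

```python
class OldGroupBy:
    # Stand-in for the signature before pandas 1.3 (assumption from this thread)
    def _wrap_applied_output(self, keys, values, not_indexed_same=False):
        return ("old", keys, values, not_indexed_same)


class NewGroupBy:
    # Stand-in for pandas >= 1.3, where a leading `data` argument was added
    def _wrap_applied_output(self, data, keys, values, not_indexed_same=False):
        return ("new", data, keys, values, not_indexed_same)


def reduce_like_traceback(df_grouped, keys, values):
    # Same call shape as in the traceback: two positionals plus one keyword
    return df_grouped._wrap_applied_output(keys, values, not_indexed_same=False)


keys = ("a", "b")
values = ([1, 2], [3])

# Works against the old signature
old_result = reduce_like_traceback(OldGroupBy(), keys, values)

# Against the new signature, `keys` lands in `data` and `values` lands in
# `keys`, so the `values` parameter is left unfilled
try:
    reduce_like_traceback(NewGroupBy(), keys, values)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

This also suggests why the lambda is not the cause: the failure happens when the results are recombined, after the user function has already run.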
So downgrading to <1.3 will “fix” it
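A small guard can make the version constraint explicit before calling parallel_apply on a groupby. This is only a sketch based on the versions reported in this thread (failure with pandas 1.3.0 and 1.4.1, success with 1.2.5); `parse_version` and `pandas_is_affected` are hypothetical helper names, and later pandarallel releases may behave differently.

```python
def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def pandas_is_affected(pandas_version):
    # Assumption taken from this thread: pandas >= 1.3 triggers the TypeError
    # with the pandarallel versions discussed here (1.5.2 and 1.5.4)
    return parse_version(pandas_version) >= (1, 3)


# Versions mentioned in the thread
for version in ("1.2.5", "1.3.0", "1.4.1"):
    print(version, pandas_is_affected(version))

# In a real session one would pass the installed version instead:
#   import pandas as pd
#   pandas_is_affected(pd.__version__)
```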
I am also having this issue
@Kr4t0n Oh, I see. I ran into this issue right after an upgrade, but in my case the conflict comes from a dependency of a core library.
This issue still exists in 1.5.4 with pandas 1.4.1.