pandarallel: parallel_apply crashes with OverflowError: int too big to convert

Hi everyone,

I get the following error when calling parallel_apply on a pandas DataFrame:

  File "extract_specifications.py", line 156, in <module>
    extracted_data = df.parallel_apply(extract_raw_infos, axis=1)
  File "/home/tom/.local/lib/python3.6/site-packages/pandarallel/pandarallel.py", line 367, in closure
    kwargs,
  File "/home/tom/.local/lib/python3.6/site-packages/pandarallel/pandarallel.py", line 239, in get_workers_args
    zip(input_files, output_files, chunk_lengths)
  File "/home/tom/.local/lib/python3.6/site-packages/pandarallel/pandarallel.py", line 238, in <listcomp>
    for index, (input_file, output_file, chunk_length) in enumerate(
  File "/home/tom/.local/lib/python3.6/site-packages/pandarallel/pandarallel.py", line 169, in wrapper
    time=time,
  File "/home/tom/.local/lib/python3.6/site-packages/pandarallel/utils/inliner.py", line 34, in wrapper
    return function(*args, **kwargs)
  File "/home/tom/.local/lib/python3.6/site-packages/pandarallel/utils/inliner.py", line 464, in inline
    func_instructions, len(b"".join(pinned_pre_func_instructions_without_return))
  File "/home/tom/.local/lib/python3.6/site-packages/pandarallel/utils/inliner.py", line 34, in wrapper
    return function(*args, **kwargs)
  File "/home/tom/.local/lib/python3.6/site-packages/pandarallel/utils/inliner.py", line 314, in shift_instructions
    for instruction in instructions
  File "/home/tom/.local/lib/python3.6/site-packages/pandarallel/utils/inliner.py", line 314, in <genexpr>
    for instruction in instructions
  File "/home/tom/.local/lib/python3.6/site-packages/pandarallel/utils/inliner.py", line 34, in wrapper
    return function(*args, **kwargs)
  File "/home/tom/.local/lib/python3.6/site-packages/pandarallel/utils/inliner.py", line 293, in shift_instruction
    return bytes((operation,)) + int2python_bytes(python_ints2int(values) + qty)
  File "/home/tom/.local/lib/python3.6/site-packages/pandarallel/utils/inliner.py", line 34, in wrapper
    return function(*args, **kwargs)
  File "/home/tom/.local/lib/python3.6/site-packages/pandarallel/utils/inliner.py", line 71, in int2python_bytes
    return int.to_bytes(item, nb_bytes, "little")
OverflowError: int too big to convert

I am using

pandarallel == 1.4.2
pandas == 0.24.2
python == 3.6.9

Any idea how to proceed from here? I don't know what could cause this bug. I suspect it might be related to the size of the data in one column (I store the HTML of web pages there), but beyond that I have no leads. I would be happy to help fix this bug(?) given some guidance. Thanks for helping.

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 23
  • Comments: 21 (4 by maintainers)

Most upvoted comments

Same here. Is there any way to make it work with the progress bar?

Same error here. It would be nice to have the progress bar.

Just edit line 71 of PYTHON_PATH/python3.6/site-packages/pandarallel/utils/inliner.py so that it reads:

return int.to_bytes(item % (1 << nb_bytes * 8), nb_bytes, "little")
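
To illustrate what that one-line patch changes, here is a minimal standalone sketch. The helper names `to_little_endian` and `to_little_endian_wrapped` are hypothetical, and `nb_bytes=1` is an assumption chosen for the demonstration; only the `int.to_bytes` call itself comes from the traceback. The unpatched call raises OverflowError as soon as the value does not fit in `nb_bytes`, while the patched call wraps the value modulo 2**(8 * nb_bytes).

```python
def to_little_endian(item, nb_bytes):
    """Same conversion as line 71 in the traceback.

    Raises OverflowError if item does not fit in nb_bytes.
    """
    return int.to_bytes(item, nb_bytes, "little")


def to_little_endian_wrapped(item, nb_bytes):
    """Patched variant: reduce item modulo 2**(8 * nb_bytes) before converting."""
    return int.to_bytes(item % (1 << nb_bytes * 8), nb_bytes, "little")


# 300 does not fit in a single byte (maximum 255), so the original call fails.
try:
    to_little_endian(300, 1)
    overflowed = False
except OverflowError:
    overflowed = True

# The patched call succeeds, but it encodes 300 % 256 == 44, not 300.
wrapped = to_little_endian_wrapped(300, 1)

print(overflowed)     # True
print(list(wrapped))  # [44]
```

Note that the wrapped result is a different number from the one that was requested. The patch makes the exception go away, but this thread does not confirm that the bytes written afterwards are still correct, so it may be worth treating it as a stopgap.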

Are there any updates for this issue? I have the same problem and can’t solve it.

Same Problem

I checked that my apply function works correctly when progress_bar is set to False. It seems to be a version-related issue. Please fix it.
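
For reference, the workaround reported in this thread (running without the progress bar) could look roughly like the sketch below. The helper name `apply_without_progress_bar` is hypothetical; `df` and `extract_raw_infos` are the names from the original traceback. It assumes pandarallel is installed and that `pandarallel.initialize` accepts a `progress_bar` argument, as it does in the 1.4.x releases discussed here.

```python
def apply_without_progress_bar(df, func):
    """Run func over the rows of df in parallel with the progress bar disabled.

    Assumption: pandarallel is installed. The import is kept inside the
    function so that this sketch can be loaded without the package.
    """
    from pandarallel import pandarallel  # third-party package

    # progress_bar=False is the setting several commenters report as working.
    pandarallel.initialize(progress_bar=False)
    return df.parallel_apply(func, axis=1)
```

With the names from the original post, the call would be `extracted_data = apply_without_progress_bar(df, extract_raw_infos)`.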

Could you please try without the progress bar and/or with only a small part of your dataset, and tell me the result?

Could you also print the result of len(df)?

Thanks for the hint. Without the progress bar it seems to work. As it is quite a long computation, it would be nice to have the progress bar, though.

Here is some more information about my dataset. It includes a lot of Python objects holding quite large strings (the largest one being html_content).

len(df) == 4737

# shape (rows, columns)
df.shape == (4737, 21)

# memory consumption of columns
df.memory_usage(deep=True) == 
Index             201736
url               524872
depth              37896
counter            37896
time              336327
filename          486537
encoding          297990
file_path         548118
tables            246232
main_content    13186430
html_content    22344093
language          279483
lists             243120
list_score         37896
description     16791103
table0d           227376
table1d           259776
tablexd           232944
table_len          37896
table_num          37896
table_score        37896
co_score           37896