pandas: BUG: groupby().ffill() adds group labels as extra column

Code Sample, a copy-pastable example if possible

Input:

import pandas as pd
pd.DataFrame(0, index=[1, 2], columns=[3, 4]).groupby([5, 6]).ffill()

Output:

   NaN  3  4
1    5  0  0
2    6  0  0

Problem description

groupby().ffill() adds an additional column to the dataframe, containing a copy of the group labels. This is a regression in pandas v0.23.0 (#19673).

Expected Output

   3  4
1  0  0
2  0  0
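
The expectation above can be written as a small check. This is only a sketch of what a regression test might assert, not the test that was added to pandas; the variable name `has_extra_column` is made up for illustration.

```python
import pandas as pd

# Same frame as in the report: all zeros, index [1, 2], columns [3, 4].
df = pd.DataFrame(0, index=[1, 2], columns=[3, 4])

# Group by an external list of labels, one label per row.
result = df.groupby([5, 6]).ffill()

# Per this report, 0.23.x returns three columns (group labels plus 3 and 4).
# Without the bug, the column count matches the input frame.
has_extra_column = len(result.columns) != len(df.columns)
print(has_extra_column)
```

On a version without the bug this should print `False`.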

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.14.14-200.fc26.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C
LANG: C
LOCALE: None.None

pandas: 0.23.1
pytest: 3.6.0
pip: 10.0.1
setuptools: 39.2.0
Cython: 0.28.3
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 4.2.1
sphinx: None
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.2
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.8
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: 0.1.5
pandas_gbq: None
pandas_datareader: None

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 1
  • Comments: 22 (18 by maintainers)

Most upvoted comments

Looks like the issue also occurs when grouping by an index level:

>>> pd.DataFrame([[1,np.nan,np.nan,np.nan]],[0],[1,1,2,2]).T.groupby(level=0).ffill()
   NaN    0
1    1  1.0
1    1  1.0
2    2  NaN
2    2  NaN
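
Until a fix lands, one possible workaround is to select the original columns back out of the result. This is a sketch that assumes the extra label column is the only difference, as in the outputs shown in this issue; it has not been checked against every affected version, and the names `filled` and `cleaned` are illustrative.

```python
import numpy as np
import pandas as pd

# Frame from the comment above: index [1, 1, 2, 2], a single column 0.
df = pd.DataFrame([[1, np.nan, np.nan, np.nan]], [0], [1, 1, 2, 2]).T

# Forward-fill within each index-level group.
filled = df.groupby(level=0).ffill()

# Keep only the columns that existed in the input. On affected versions this
# should drop the extra group-label column; otherwise it changes nothing.
cleaned = filled.loc[:, df.columns]
print(cleaned)
```

Selecting by `df.columns` rather than dropping a column by name avoids relying on the extra column's label, which appears as NaN in the outputs above.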

Just as a heads up: this “good first issue” label was added when we thought this was just going to be a test case. I’ve removed it, as the fix appears to be a little more complex than that. You are absolutely welcome to diagnose and debug, but I just want to be clear that it may not be as simple as originally thought.

@aggarwalvinayak : By all means! Go for it!

Hey, can I work on this issue? This is my first time contributing to an open source project. I have some experience using pandas DataFrames.