pandas: BUG: specifying fill_value in pandas.DataFrame.shift() messes with index of empty dataframes

[x] I have checked that this issue has not already been reported.
[x] I have confirmed this bug exists on the latest version of pandas.
[ ] (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd

# create empty df, and set a multi index
empty_df = pd.DataFrame(columns=["a", "b", "c"])
multi_index_empty_df = empty_df.set_index(keys=["a", "b"])

# the index names should be identical with and without fill_value,
# but on pandas 1.0.5 this assertion fails
grouped = multi_index_empty_df.groupby(["a", "b"])
assert grouped.shift(1).index.names == grouped.shift(1, fill_value=0).index.names

Problem description

I discovered this problem when a unit test failed on a function that performed a groupby and shift on an empty dataframe.

Specifying fill_value in pandas.DataFrame.shift() should not alter the index that was set on a dataframe. For empty dataframes with a multi-index, however, the index names are lost, as the example above shows.

Expected Output

When executing:

multi_index_empty_df.groupby(["a","b"]).shift(1).index.names

I get the output:

FrozenList(['a', 'b'])

But when executing:

multi_index_empty_df.groupby(["a","b"]).shift(1, fill_value=0).index.names

I should get the same output, but instead I get:

FrozenList([None])

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.7.1.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.5
numpy : 1.19.0
pytz : 2018.7
dateutil : 2.7.5
pip : 18.1
setuptools : 40.6.3
Cython : 0.29.2
pytest : 4.0.2
hypothesis : None
sphinx : 1.8.2
blosc : None
feather : None
xlsxwriter : 1.1.2
lxml.etree : 4.2.5
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : 4.6.3
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.2.5
matplotlib : 3.2.2
numexpr : 2.6.8
odfpy : None
openpyxl : 2.5.12
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 4.0.2
pyxlsb : None
s3fs : None
scipy : 1.5.1
sqlalchemy : 1.2.15
tables : 3.4.4
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.2
numba : 0.41.0

About this issue

State: closed
Created 3 years ago
Comments: 17 (7 by maintainers)

Most upvoted comments

I did not create a unit test. I just tested in a notebook and, since the problem did not reproduce, didn’t go further. I will create a unit test, submit it, and link it here.

benbogart on May 12, 2021

Thanks for the report @brunocous. This works on master and on the latest pandas release (1.2.4), but could probably use a test.

mzeitlin11 on May 2, 2021
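
The regression test suggested in the comments could look roughly like the sketch below. It reuses the setup from the code sample in the report; the test function name is hypothetical and not taken from the pandas test suite. It assumes a pandas version where the behaviour is already fixed (the comment above reports 1.2.4 and master), so on pandas 1.0.5 it would be expected to fail.

```python
import pandas as pd


def test_groupby_shift_fill_value_keeps_multiindex_names():
    # Hypothetical test name; the setup mirrors the code sample in the report.
    empty_df = pd.DataFrame(columns=["a", "b", "c"])
    multi_index_empty_df = empty_df.set_index(keys=["a", "b"])
    grouped = multi_index_empty_df.groupby(["a", "b"])

    shifted = grouped.shift(1)
    shifted_with_fill = grouped.shift(1, fill_value=0)

    # Passing fill_value should not change the names of the index levels.
    assert list(shifted.index.names) == ["a", "b"]
    assert list(shifted_with_fill.index.names) == list(shifted.index.names)
```

Run under pytest, this fails with an assertion error on affected versions, which is the same symptom as the original code sample.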