pandas: BUG: freq set to None when resampling pd.MultiIndex (introduced in v1.1.0)

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Update/Summary

When selecting a single outer-level label from a DataFrame with a pd.MultiIndex, the frequency of the resulting index (df.loc[i].index.freq) is not set. This behavior was introduced in v1.1.0.

import numpy as np
import pandas as pd

idx = pd.Index(range(2), name="A")
dti = pd.date_range("2020-01-01", periods=7, freq="D", name="B")
mi = pd.MultiIndex.from_product([idx, dti])

df = pd.DataFrame(np.random.randn(14, 2), index=mi)

print(df.loc[0].index.freq)  # None in pandas 1.1.0

df.loc[0].index matches dti, so it would be nice to get the freq="D" back on that.
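Until the freq is propagated again, the daily spacing can still be recovered from the dates themselves. The snippet below is a minimal workaround sketch, not a fix for the underlying bug: it rebuilds the frame from the summary and re-infers the frequency with the existing `freq="infer"` option of the `pd.DatetimeIndex` constructor (and, equivalently, `pd.infer_freq`). The name `restored` is only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the summary.
idx = pd.Index(range(2), name="A")
dti = pd.date_range("2020-01-01", periods=7, freq="D", name="B")
mi = pd.MultiIndex.from_product([idx, dti])
df = pd.DataFrame(np.random.randn(14, 2), index=mi)

# Cross-section for A == 0; its freq depends on the pandas version in use.
sub_index = df.loc[0].index
print(sub_index.freq)

# Workaround: let pandas infer the frequency from the evenly spaced dates.
restored = pd.DatetimeIndex(sub_index, freq="infer")
print(restored.freq)
print(pd.infer_freq(sub_index))
```

Inference only succeeds because the cross-section is a regular, sorted daily sequence; on irregular dates `freq="infer"` leaves the freq unset.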

Bisecting with the code sample in #35563 (comment) points to #31315:

c988567 is the first bad commit
commit c988567
Author: jbrockmendel <jbrockmendel@gmail.com>
Date: Sun Feb 9 07:01:20 2020 -0800

REF: tighten what we accept in TimedeltaIndex._simple_new (#31315)

Original post

Code Sample, a copy-pastable example

import numpy as np
import pandas as pd

idx_level_0 = np.repeat([1, 2], 5)
dates = np.tile(
    ["2020-01-01", "2020-01-02", "2020-01-04", "2020-01-06", "2020-01-07"], 2
)
values1 = [1, 2, 3, 4, 5]
values2 = [6, 7, 8, 9, 10]

df = pd.DataFrame(
    {"idx_level_0": idx_level_0, "dates": dates, "values": [*values1, *values2]}
)
df["dates"] = pd.to_datetime(df["dates"])
df = df.set_index(["idx_level_0", "dates"], drop=True)

df = df.groupby("idx_level_0").resample("D", level="dates").last()

# The following assertion passes in pandas v1.0.5
# but raises an AssertionError in pandas v1.1.0
assert df.index.get_level_values(1).freq == "D"

Problem description

When resampling a groupby object, the freq of the resulting datetime level is incorrectly set to None.

Expected Output

The freq should match the resampling rule ("D" in this example).
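Even while the `.freq` attribute comes back as None, the resampled dates within each group are still spaced one day apart. The sketch below reuses the frame from the original post and checks this per group with `pd.infer_freq`; it is only a diagnostic under that assumption, and the dictionary name `inferred` is illustrative.

```python
import numpy as np
import pandas as pd

# Same frame as in the original post.
idx_level_0 = np.repeat([1, 2], 5)
dates = np.tile(
    ["2020-01-01", "2020-01-02", "2020-01-04", "2020-01-06", "2020-01-07"], 2
)
df = pd.DataFrame(
    {"idx_level_0": idx_level_0, "dates": pd.to_datetime(dates), "values": np.arange(1, 11)}
)
df = df.set_index(["idx_level_0", "dates"])

res = df.groupby("idx_level_0").resample("D", level="dates").last()

# Infer the spacing of the "dates" level separately for each outer-level group.
inferred = {
    key: pd.infer_freq(group.index.get_level_values("dates"))
    for key, group in res.groupby(level="idx_level_0")
}
print(inferred)
```

Checking per group matters because the full `get_level_values(1)` repeats the dates once per group, so it is not a single evenly spaced sequence.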

Output of pd.show_versions()

INSTALLED VERSIONS

commit : d9fff2792bf16178d4e450fe7384244e50635733
python : 3.8.2.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : de_DE.cp1252

pandas : 1.1.0
numpy : 1.18.4
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 45.3.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.2.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 21 (18 by maintainers)

Most upvoted comments

strong maybe

@TomAugspurger is right about the distinction between get_level_values().freq and levels[1].freq.
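To make that distinction concrete, here is a small sketch using the same from_product index as in the summary: `levels[1]` stores each distinct date once, while `get_level_values(1)` returns one date per row, repeated for every value of "A". A regular freq is only meaningful for the former, since the row-wise sequence is not evenly spaced.

```python
import pandas as pd

idx = pd.Index(range(2), name="A")
dti = pd.date_range("2020-01-01", periods=7, freq="D", name="B")
mi = pd.MultiIndex.from_product([idx, dti])

# Unique dates stored once in the MultiIndex.
level = mi.levels[1]
# One entry per row: the 7 dates appear once for A == 0 and again for A == 1.
row_values = mi.get_level_values(1)

print(len(level), len(row_values))
```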

That said, I think there is a behavior that can be improved:

import numpy as np
import pandas as pd

idx = pd.Index(range(2), name="A")
dti = pd.date_range("2020-01-01", periods=7, freq="D", name="B")
mi = pd.MultiIndex.from_product([idx, dti])

df = pd.DataFrame(np.random.randn(14, 2), index=mi)

print(df.loc[0].index.freq)  # None in pandas 1.1.0

df.loc[0].index matches dti, so it would be nice to get the freq="D" back on that.