pandas: BUG: Change of behavior in casting of datetime-like types in MultiIndex
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
[Edited to provide a much simpler example.]
import datetime

import pandas as pd

print(f"Pandas version:\t{pd.__version__}\n")

df = pd.DataFrame({'date': [datetime.date(2021, 8, 1),
                            datetime.date(2021, 8, 2),
                            datetime.date(2021, 8, 3)],
                   'ticker': ['aapl', 'goog', 'yhoo'],
                   'value': [5.63269, 4.45609, 2.74843]})
df.set_index(['date', 'ticker'], inplace=True)

# Inspect the first level of the resulting MultiIndex.
print(df.index.get_level_values(0))
Output
The output below has been generated with pandas 1.3.0 or higher.
Pandas version: 1.3.0
Index([2021-08-01, 2021-08-02, 2021-08-03], dtype='object', name='date')
Expected Output
The output below has been generated with pandas 1.2.5.
Pandas version: 1.2.5
DatetimeIndex(['2021-08-01', '2021-08-02', '2021-08-03'], dtype='datetime64[ns]', name='date', freq=None)
Problem description
Starting from pandas 1.3.0, the observed behavior changed: when a MultiIndex is created, datetime.date objects are no longer cast to datetime64. I could not find the reason for this change in the What’s new page. Is it by design or a bug?
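Until the intended behavior is settled, one way to avoid depending on inference is to convert the column explicitly before building the index. The following is only a sketch of such a workaround, not something proposed in this issue; the variable names `indexed` and `date_level` are made up for illustration.

```python
import datetime

import pandas as pd

df = pd.DataFrame({
    "date": [datetime.date(2021, 8, 1),
             datetime.date(2021, 8, 2),
             datetime.date(2021, 8, 3)],
    "ticker": ["aapl", "goog", "yhoo"],
    "value": [5.63269, 4.45609, 2.74843],
})

# Convert the date objects explicitly instead of relying on set_index
# to infer a datetime64 dtype for the level.
df["date"] = pd.to_datetime(df["date"])

indexed = df.set_index(["date", "ticker"])
date_level = indexed.index.get_level_values(0)

# With the explicit conversion, the level should be a DatetimeIndex
# regardless of how the MultiIndex constructor treats date objects.
print(type(date_level).__name__)
```

Because the conversion happens before `set_index`, the result should not depend on which side of the 1.2.5 / 1.3.0 change the installed pandas falls.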
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit : 5f648bf1706dd75a9ca0d29f26eadfbb595fe52b
python : 3.9.6.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Tue Jun 22 19:49:55 PDT 2021; root:xnu-6153.141.35~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.3.2
numpy : 1.21.2
pytz : 2021.1
dateutil : 2.8.2
pip : 21.2.4
setuptools : 57.4.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.9.1 (dt dec pq3 ext lo64)
jinja2 : None
IPython : 7.26.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : 1.4.22
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None
About this issue
- State: closed
- Created 3 years ago
- Comments: 17 (11 by maintainers)
Commits related to this issue
- code sample for #43091 — committed to simonjayhawkins/pandas by simonjayhawkins 3 years ago
I’d go for lib.infer_dtype(col, skipna=True) == "date" instead of checking for “mixed”.
It’s possible. Though we’d then have a breaking change for anyone relying on the 1.3 behavior.
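For reference, the kind of check suggested above can be tried out with the public `pandas.api.types.infer_dtype`, which exposes the same inference as the internal `lib.infer_dtype`. This is only an illustration of what the two labels mean; the sample lists `dates` and `mixed` are invented here and are not taken from the pandas code base.

```python
import datetime

from pandas.api.types import infer_dtype

# A column holding only datetime.date objects.
dates = [datetime.date(2021, 8, 1), datetime.date(2021, 8, 2)]

# A column mixing a date with an unrelated Python object.
mixed = [datetime.date(2021, 8, 1), "not a date"]

dates_kind = infer_dtype(dates, skipna=True)
mixed_kind = infer_dtype(mixed, skipna=True)

print(dates_kind)  # label for the all-date column
print(mixed_kind)  # label for the heterogeneous column
```

Checking for "date" targets columns that contain only date objects, whereas "mixed" would also match arbitrary heterogeneous columns.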
I’d check Index(col).inferred_type == "date"
in 1.2.5…
gives
and using the DataFrame from the OP
gives
whereas for a MultiIndex
gives
So the Index and MultiIndex constructors were inconsistent in the handling of object dtype arrays containing datetime objects in pandas 1.2.5.
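The comparison described above can be reproduced roughly as follows. This is a sketch rather than the snippets originally posted in the thread; it builds both kinds of index from the same date objects and prints what each one infers. The names `flat`, `multi` and `level` are chosen here for illustration, and the result for the MultiIndex level depends on the installed pandas version, so no output is shown for it.

```python
import datetime

import pandas as pd

dates = [datetime.date(2021, 8, 1),
         datetime.date(2021, 8, 2),
         datetime.date(2021, 8, 3)]
tickers = ["aapl", "goog", "yhoo"]

# Plain Index built directly from the date objects.
flat = pd.Index(dates)

# MultiIndex built from the same date objects plus a second level.
multi = pd.MultiIndex.from_arrays([dates, tickers], names=["date", "ticker"])
level = multi.levels[0]

flat_kind = flat.inferred_type
level_kind = level.inferred_type

print("Index:           ", type(flat).__name__, flat.dtype, flat_kind)
print("MultiIndex level:", type(level).__name__, level.dtype, level_kind)
```

If the two lines report different types, the constructors disagree in the way described for 1.2.5; if both report an object-dtype Index, they are consistent.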
The change of behavior in casting of datetime-like types in MultiIndex was done in #38552. Looking at the code changes in that PR, it is clear from the changed tests and comments added that this change was intentional. Unfortunately the release note added did not refer to changes in MultiIndex construction.
The policy also states
So the change in behavior could be considered a bug fix, since the MultiIndex constructor was inconsistent with the Index constructor, in which case no further action would be needed.
However, the policy also states
and
So, as an alternative, we could maybe restore the old behavior for 1.3.5 and add a deprecation of this behavior in 1.4
The only code change in #38552 was removing convert_dates=True from values = maybe_infer_to_datetimelike(values, convert_dates=True)
I guess we could maybe pass a convert_dates parameter through to the Categorical constructor from the MultiIndex constructor. @jbrockmendel wdyt?
@jgmarcel we would likely take a community pull request, and core can provide review. It might be quite tricky, as date objects have very little support.
Thanks @jgmarcel for the report.
first bad commit: [545a942424a26c4163e1f959ac6130984fc3fb41] BUG: Index([date]).astype("category").astype(object) roundtrip (#38552)
I’ll mark as a regression for now pending further investigation.
Note that a set_index followed by a reset_index still creates a datetime64[ns] column from the original object column of date objects.
cc @jbrockmendel