pandas: Bug: rename incapable of accepting tuples as new name
Pandas cannot rename a label of a flat pandas.Index to a tuple. Providing a tuple as new_name in pandas.DataFrame.rename({old_name: new_name}, axis="index") returns a pandas.MultiIndex object, and wrapping it in a singleton tuple returns an undesirable result. See code below (work-around at bottom):…
import pandas as pd
import numpy as np
df = pd.DataFrame(data = np.arange(5), index=[(x, x) for x in range(5)], columns=["Value"])
print(df) # Note that df.index is a pd.Index object of 2-length tuples
# Wish to rename axis label, but keep the same style
df2 = df.rename({(1,1):(1,5)}, axis="index")
print(df2) # Woah! - df2.index is of MultiIndex type
print(df2.index) # ... and here's proof
# Maybe I can get around this by passing it as a singleton tuple...
df3 = df.rename({(1,1):((1,5),)}, axis="index")
print(df3) # ... apparently not
This produces the output:
        Value
(0, 0)      0
(1, 1)      1
(2, 2)      2
(3, 3)      3
(4, 4)      4

     Value
0 0      0
1 5      1
2 2      2
3 3      3
4 4      4

MultiIndex(levels=[[0, 1, 2, 3, 4], [0, 2, 3, 4, 5]],
           labels=[[0, 1, 2, 3, 4], [0, 4, 1, 2, 3]])

           Value
(0, 0)         0
((1, 5),)      1
(2, 2)         2
(3, 3)         3
(4, 4)         4
Desired/Expected output:
        Value
(0, 0)      0
(1, 5)      1
(2, 2)      2
(3, 3)      3
(4, 4)      4
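The MultiIndex in the output above appears to come from the pandas.Index constructor, which by default (tupleize_cols=True) turns a list made up entirely of tuples into a MultiIndex. A minimal sketch of that default, and of how tupleize_cols=False keeps a flat index of tuples, assuming a recent pandas version:

```python
import pandas as pd

labels = [(0, 0), (1, 5), (2, 2)]

# Default behaviour: a list consisting only of tuples is expanded into a MultiIndex.
tupleized = pd.Index(labels)
print(type(tupleized).__name__, tupleized.nlevels)

# tupleize_cols=False keeps each tuple as a single label in a flat Index.
flat = pd.Index(labels, tupleize_cols=False)
print(type(flat).__name__, flat.nlevels)
```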
Problem description
The current behaviour is a problem for two reasons:
- It is unintuitive: I can't see why a user would expect renaming an index label to change the index's type.
- There is no way to rename labels of a flat Index to tuples.
I have checked for similar issues by searching for the word "rename", and at the time of writing, pandas 0.22.0 is the latest released version.
Output of pd.show_versions()
pandas: 0.22.0
pytest: 3.0.3
pip: 9.0.1
setuptools: 28.8.0
Cython: 0.25.1
numpy: 1.11.2
scipy: 0.18.1
pyarrow: None
xarray: None
IPython: 5.1.0
sphinx: 1.4.8
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2016.7
blosc: None
bottleneck: 1.1.0
tables: 3.3.0
numexpr: 2.6.1
feather: None
matplotlib: 1.5.3
openpyxl: 2.4.9
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.8.0
bs4: 4.5.1
html5lib: 1.0b10
sqlalchemy: 1.1.3
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
Workaround
The workaround below uses the set_value function, which the documentation tells users to avoid (unless you really know what you're doing):
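# get_values() returns the index's underlying array; set_value overwrites the
# entry for label (1, 1) in that array with (1, 5), in place.
# Note: Index.get_values and Index.set_value exist in pandas 0.22 but were deprecated later.
df.index.set_value(df.index.get_values(), (1, 1), (1, 5))
# Rebuild the index from the modified values: reset_index moves the labels into
# a column named "index", and set_index turns that column back into a flat index.
df.reset_index(inplace=True)
df.set_index("index", inplace=True)
df.index.name = None  # Drop the "index" name left by reset_index (arguably not necessary...)
print(df)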
This produces the output:
        Value
(0, 0)      0
(1, 5)      1
(2, 2)      2
(3, 3)      3
(4, 4)      4
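Index.set_value and Index.get_values were later deprecated, so the workaround above does not carry over to newer pandas releases. A hedged alternative, assuming only the public pandas.Index constructor, is to compute the new labels in plain Python and assign a flat index explicitly. rename_flat_labels is a hypothetical helper for illustration, not a pandas API:

```python
import numpy as np
import pandas as pd


def rename_flat_labels(frame, mapping):
    """Hypothetical helper (not part of pandas): return a copy of `frame`
    whose index labels are replaced via `mapping`, keeping a flat index.

    Labels that are not keys of `mapping` are kept unchanged.
    """
    new_labels = [mapping.get(label, label) for label in frame.index]
    out = frame.copy()
    # tupleize_cols=False stops pandas from expanding tuple labels into a MultiIndex.
    out.index = pd.Index(new_labels, name=frame.index.name, tupleize_cols=False)
    return out


# Same data as in the report, with the flat tuple index built explicitly.
df = pd.DataFrame(
    data=np.arange(5),
    index=pd.Index([(x, x) for x in range(5)], tupleize_cols=False),
    columns=["Value"],
)

df2 = rename_flat_labels(df, {(1, 1): (1, 5)})
print(df2)
```

Because the helper works on a copy, the original frame keeps its (1, 1) label; the trade-off is that it only handles flat indexes and does not cover rename's level or errors options.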
About this issue
- State: closed
- Created 6 years ago
- Comments: 26 (26 by maintainers)
Commits related to this issue
- BUG: #19497 fix. — committed to charlie0389/pandas by charlie0389 6 years ago
- DOC: Include documentation in response to bug #19497 fix. — committed to charlie0389/pandas by charlie0389 6 years ago
- Fix COMBED dataset converter for pandas>=0.23 (due to the bugfix pandas-dev/pandas#19497). Still compatible with 0.22 by using `pd.MultiIndex` directly. — committed to nilmtk/nilmtk by PMeira 6 years ago
- Fix COMBED dataset converter for pandas>=0.23 (due to the bugfix pandas-dev/pandas#19497). Still compatible with 0.22 by using `pd.MultiIndex` directly. — committed to BaluJr/energytk by PMeira 6 years ago
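The commits above refer to a fix for #19497, and the nilmtk commit message indicates that the behaviour changed from pandas 0.23. A small check, assuming a pandas version that includes that fix, that DataFrame.rename now keeps a flat index of tuples for the frame from the original report:

```python
import numpy as np
import pandas as pd

# Same frame as in the report: a flat (single-level) index of 2-tuples.
df = pd.DataFrame(
    data=np.arange(5),
    index=pd.Index([(x, x) for x in range(5)], tupleize_cols=False),
    columns=["Value"],
)

df2 = df.rename({(1, 1): (1, 5)}, axis="index")

# Assumption: with the #19497 fix, the renamed index stays a flat Index
# of tuples instead of becoming a MultiIndex.
print(type(df2.index).__name__)
print(list(df2.index))
```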
Understood. That is a valid concern…
I think converting between types (numeric vs. Index, etc.) is fine. It's the conversion between multi and flat that we (maybe) want to disallow via .rename.
Uhm, no, that PR is unrelated. And I was probably just confused.
I still think this is going to be fixed… sooner or later.
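The maintainer comment above suggests that switching between a MultiIndex and a flat index of tuples perhaps should not happen implicitly through .rename. When that conversion is actually wanted, it can be requested explicitly. A minimal sketch, assuming pandas 0.24 or later (where MultiIndex.to_flat_index is available):

```python
import pandas as pd

flat = pd.Index([(0, 0), (1, 5), (2, 2)], tupleize_cols=False)

# Flat index of tuples -> MultiIndex, requested explicitly.
multi = pd.MultiIndex.from_tuples(list(flat))

# MultiIndex -> flat index of tuples.
back = multi.to_flat_index()

print(multi.nlevels, back.nlevels)
```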