pandas: Bug: rename incapable of accepting tuples as new name

Pandas cannot rename a label of a pandas.Index when the new label is a tuple. Providing a tuple as new_name in pandas.DataFrame.rename({old_name: new_name}, axis="index") returns a DataFrame whose index is a pandas.MultiIndex, and wrapping it in a singleton tuple produces a nested label instead. See the code below (a workaround is at the bottom):

import pandas as pd
import numpy as np
df = pd.DataFrame(data=np.arange(5), index=[(x, x) for x in range(5)], columns=["Value"])
print(df)  # Note that df.index is a pd.Index object of 2-length tuples

# Wish to rename an index label, but keep the same style
df2 = df.rename({(1, 1): (1, 5)}, axis="index")

print(df2)  # Whoa! df2.index is of MultiIndex type
print(df2.index)  # ... and here's proof

# Maybe I can get around this by passing it as a singleton tuple...
df3 = df.rename({(1, 1): ((1, 5),)}, axis="index")
print(df3)  # ... apparently not

This produces the output:

        Value
(0, 0)      0
(1, 1)      1
(2, 2)      2
(3, 3)      3
(4, 4)      4

     Value
0 0      0
1 5      1
2 2      2
3 3      3
4 4      4
MultiIndex(levels=[[0, 1, 2, 3, 4], [0, 2, 3, 4, 5]],
           labels=[[0, 1, 2, 3, 4], [0, 4, 1, 2, 3]])

           Value
(0, 0)         0
((1, 5),)      1
(2, 2)         2
(3, 3)         3
(4, 4)         4

Desired/Expected output:

        Value
(0, 0)      0
(1, 5)      1
(2, 2)      2
(3, 3)      3
(4, 4)      4

Problem description

The current behaviour is a problem for two reasons:

  1. It is unintuitive: I can't see why a user would expect renaming an index label to change the index's type.
  2. There is no way to rename the labels of an Index to tuples.
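The flat-versus-multi ambiguity is not specific to rename: the pd.Index constructor itself, with its default tupleize_cols=True, turns a list made up entirely of tuples into a MultiIndex. Whether rename in 0.22 goes through exactly this path is an assumption, but the constructor behaviour can be checked on its own:

```python
import pandas as pd

labels = [(0, 0), (1, 5), (2, 2)]

# Default behaviour: a list made up entirely of tuples becomes a MultiIndex.
as_multi = pd.Index(labels)
print(type(as_multi).__name__, as_multi.nlevels)

# tupleize_cols=False keeps a flat Index whose labels are the tuples themselves.
as_flat = pd.Index(labels, tupleize_cols=False)
print(type(as_flat).__name__, as_flat.nlevels)
```

So the same list of tuples can legitimately mean either "one level of tuple labels" or "two levels", and pandas has to pick one.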

I have checked for similar issues by searching for the word "rename", and at the time of writing, pandas 0.22.0 is the latest released version.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-112-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: 3.0.3
pip: 9.0.1
setuptools: 28.8.0
Cython: 0.25.1
numpy: 1.11.2
scipy: 0.18.1
pyarrow: None
xarray: None
IPython: 5.1.0
sphinx: 1.4.8
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2016.7
blosc: None
bottleneck: 1.1.0
tables: 3.3.0
numexpr: 2.6.1
feather: None
matplotlib: 1.5.3
openpyxl: 2.4.9
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.8.0
bs4: 4.5.1
html5lib: 1.0b10
sqlalchemy: 1.1.3
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Workaround

The workaround below uses Index.set_value, a method the documentation advises against using unless you really know what you're doing:

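# Overwrite the (1, 1) label with (1, 5) directly in the index's underlying values array
df.index.set_value(df.index.get_values(), (1, 1), (1, 5))
# Round-trip through a column so a fresh Index is built from the modified values
df.reset_index(inplace=True)
df.set_index("index", inplace=True)
df.index.name = None  # Arguably not necessary...
print(df)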

Produces the output:

        Value
(0, 0)      0
(1, 5)      1
(2, 2)      2
(3, 3)      3
(4, 4)      4
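Index.set_value and Index.get_values were deprecated in later pandas releases. A sketch of the same workaround that does not mutate the index in place, assuming a pandas version whose Index constructor accepts tupleize_cols, is to compute the new labels in Python and wrap them in pd.Index(..., tupleize_cols=False). rename_flat_labels is a hypothetical helper name, not a pandas API:

```python
import numpy as np
import pandas as pd


def rename_flat_labels(index, mapping):
    """Hypothetical helper: map labels of a flat Index and keep it flat.

    Labels not present in `mapping` are kept unchanged.
    """
    new_labels = [mapping.get(label, label) for label in index]
    # tupleize_cols=False stops pandas from turning an all-tuple list into a MultiIndex.
    return pd.Index(new_labels, name=index.name, tupleize_cols=False)


df = pd.DataFrame(data=np.arange(5), index=[(x, x) for x in range(5)], columns=["Value"])
df.index = rename_flat_labels(df.index, {(1, 1): (1, 5)})
print(df)
```

Because the replacement Index is built once from the full list of labels, nothing is written into the old index's values array, which pandas otherwise treats as immutable.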

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 26 (26 by maintainers)

Most upvoted comments

assumed the following is a reasonable way to create a MultiIndex

Understood. That is a valid concern…
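The concern is that some users may already rely on rename to turn tuple labels into a MultiIndex. If rename keeps a flat index flat, that conversion can still be requested explicitly; a minimal sketch using pd.MultiIndex.from_tuples:

```python
import pandas as pd

# A flat Index whose labels happen to be tuples.
flat = pd.Index([(0, 0), (1, 5), (2, 2)], tupleize_cols=False)
df = pd.DataFrame({"Value": [0, 1, 2]}, index=flat)

# Ask for a MultiIndex explicitly instead of relying on rename to create one.
df_multi = df.copy()
df_multi.index = pd.MultiIndex.from_tuples(list(df.index))
print(df_multi.index)
```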

although implementation-wise this can be decoupled from the issue of multi vs. flat

I think converting between types (numeric vs. Index, etc.) is fine. It’s the conversion between multi vs. flat that we (maybe) want to disallow via .rename.
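The policy described here (dtype changes allowed, flat/multi switches refused) can be approximated in user code by comparing nlevels before and after the call. rename_index_keep_levels is a hypothetical helper for illustration, not something pandas provides:

```python
import pandas as pd


def rename_index_keep_levels(df, mapping):
    """Hypothetical helper: rename index labels but refuse a flat <-> multi change.

    Dtype changes (e.g. integer labels becoming object) are allowed.
    """
    renamed = df.rename(mapping, axis="index")
    if renamed.index.nlevels != df.index.nlevels:
        raise ValueError(
            "rename changed the number of index levels from {} to {}".format(
                df.index.nlevels, renamed.index.nlevels
            )
        )
    return renamed


df = pd.DataFrame({"Value": [10, 20, 30]}, index=[0, 1, 2])
# Mixing int and str labels changes the dtype, which this helper accepts.
print(rename_index_keep_levels(df, {1: "one"}))
```

On a pandas version that still shows the behaviour from this report, the tuple rename from the original example would raise ValueError here instead of silently returning a MultiIndex.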

I might have fixed this somewhere… maybe #18600.

Uhm, no, that PR is unrelated. And I was probably just confused.

I still think this is going to be fixed… sooner or later.