pandas: Warn on duplicate names in MI?

Opening a new issue so this isn’t lost.

https://github.com/pandas-dev/pandas/pull/18882 banned duplicate names in a MultiIndex. I think this is a good change, since allowing duplicates hit a lot of edge cases once you actually tried to do something with the index. I want to make sure we understand all the cases that actually produce duplicate names in the MI though, specifically groupby.apply.

In [1]: import pandas as pd

In [2]: pdf = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7, 8, 9],
   ...:                     'b': [4, 5, 6, 3, 2, 1, 0, 0, 0]},
   ...:                    index=[0, 1, 3, 5, 6, 8, 9, 9, 9]).set_index("a")

In [3]: pdf.groupby(pdf.index).apply(lambda x: x.b)

Another, more realistic example is a groupwise drop_duplicates:

In [18]: df = pd.DataFrame({"B": [0, 0, 0, 1, 1, 1, 2, 2, 2]}, index=pd.Index([0, 1, 1, 2, 2, 2, 0, 0, 1], name='a'))

In [19]: df
Out[19]:
   B
a
0  0
1  0
1  0
2  1
2  1
2  1
0  2
0  2
1  2

In [20]: df.groupby('a').apply(pd.DataFrame.drop_duplicates)
Out[20]:
     B
a a
0 0  0
  0  2
1 1  0
  1  2
2 2  1
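
For what it's worth, one way to sidestep the duplicated level in this particular case is to pass `group_keys=False` to `groupby`, so that `apply` does not prepend the group key as an extra index level. This is only a sketch of a possible workaround, assuming the extra level is not needed downstream; it does not address the question of whether pandas itself should warn.

```python
import pandas as pd

df = pd.DataFrame(
    {"B": [0, 0, 0, 1, 1, 1, 2, 2, 2]},
    index=pd.Index([0, 1, 1, 2, 2, 2, 0, 0, 1], name="a"),
)

# group_keys=False stops apply from adding the group key as a new outer
# index level, so the result keeps only the original index level "a".
result = df.groupby("a", group_keys=False).apply(pd.DataFrame.drop_duplicates)

# Inspect the level names of the resulting index.
print(list(result.index.names))
```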

Is it possible to throw a warning on this for now, in case duplicate names are more common than we thought?
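
To make the request concrete, here is a rough sketch of the kind of check I have in mind. The helper name `warn_on_duplicate_names`, the message text, and the choice of `FutureWarning` are all assumptions for illustration, not existing pandas API. It works on a plain sequence of level names so it does not depend on how any given pandas version treats duplicates.

```python
import warnings
from collections import Counter


def warn_on_duplicate_names(names):
    """Hypothetical helper: warn if any non-None level name repeats.

    Returns the repeated names in first-seen order (empty list if none).
    """
    # Unnamed levels (None) are ignored; only real names can clash.
    counts = Counter(name for name in names if name is not None)
    duplicates = [name for name, n in counts.items() if n > 1]
    if duplicates:
        warnings.warn(
            "Duplicate level names in MultiIndex: {}".format(duplicates),
            FutureWarning,  # assumed category, chosen for illustration
            stacklevel=2,
        )
    return duplicates


# The groupby.apply result above would carry the level names ["a", "a"].
print(warn_on_duplicate_names(["a", "a"]))
```

For a real result one would pass something like `result.index.names` instead of the literal list.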

About this issue

State: closed
Created 6 years ago
Comments: 18 (18 by maintainers)

Commits related to this issue

add fix for bug #19029 As of version 0.23.0 MultiIndex throws an exception in case it contains duplicated level names. This can happen as a result of various groupby operations (#21075). This commit ... — committed to guenteru/pandas by guenteru 6 years ago
add groupby testcase (#19029) — committed to guenteru/pandas by guenteru 6 years ago

Most upvoted comments

Hmm, this was supposed to be done for 0.23, but we missed it.

I still think it’s worthwhile doing for 0.23.1 (cc @guenteru if you have time to make a PR).

TomAugspurger on May 17, 2018