pandas: BUG: read_excel with multi-indexed column ignores index_col=None
From SO: http://stackoverflow.com/questions/34020061/excel-to-pandas-dataframe-using-first-column-as-index
@chris-b1 another one on the multi-index excel issues … 😃
Small test case. Content of the Excel file:
A | A | B | B |
---|---|---|---|
key | val | key | val |
1 | 2 | 3 | 4 |
1 | 2 | 3 | 4 |
gives:
In [2]: pd.read_excel("test_excel_index_col.xlsx", header=[0,1], index_col=None)
Out[2]:
A     A   B
key val key val
1     2   3   4
1     2   3   4
It’s not super clear from the formatting of the dataframe, but [1, 1] is the index and [A, key] are treated as the level names of the multi-indexed columns. So the first column is consumed as the index even though index_col=None was passed.
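To make the difference concrete, here is a minimal sketch that builds both frames by hand instead of reading the Excel file. The variable names `expected` and `reported` are only illustrative, and the `reported` frame is reconstructed from the output shown above, not produced by `read_excel`.

```python
import pandas as pd

# What index_col=None should give: four data columns under a
# two-level header and a default integer index.
expected = pd.DataFrame(
    [[1, 2, 3, 4], [1, 2, 3, 4]],
    columns=pd.MultiIndex.from_tuples(
        [("A", "key"), ("A", "val"), ("B", "key"), ("B", "val")]
    ),
)

# What the report describes: the first column became the index and
# its header cells ("A", "key") became the column level names.
reported = pd.DataFrame(
    [[2, 3, 4], [2, 3, 4]],
    index=[1, 1],
    columns=pd.MultiIndex.from_tuples(
        [("A", "val"), ("B", "key"), ("B", "val")],
        names=["A", "key"],
    ),
)

print(expected.shape, list(expected.columns.names))
print(reported.shape, list(reported.columns.names))
print(list(reported.index))
```

Comparing the shapes and `columns.names` of the two frames shows that one data column has gone missing from the values and reappeared as index plus level names.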
About this issue
- State: closed
- Created 9 years ago
- Reactions: 5
- Comments: 18 (9 by maintainers)
Commits related to this issue
- BUG: Don't extract header names if none specified Closes gh-11733. — committed to forking-repos/pandas by gfyoung 6 years ago
- BUG: Don't extract header names if none specified (#23703) Closes gh-11733. — committed to pandas-dev/pandas by gfyoung 6 years ago
- BUG: Don't extract header names if none specified (#23703) Closes gh-11733. — committed to tm9k1/pandas by gfyoung 6 years ago
- BUG: Don't extract header names if none specified (#23703) Closes gh-11733. — committed to Pingviinituutti/pandas by gfyoung 6 years ago
Vote for index_col=False to fix this
As of now I still see the same issue: when using multi-level headers with read_excel, pandas always uses the first column as the index.
Per the review comment from @jreback, here is the proposed API for read_excel. It is the same as for read_csv.
Updates are here:
https://github.com/stephenrauch/pandas/commit/9b37ff94643296d489498138c79bb0244aaa3f79
So if this proposed API looks ok, I will do the PR.
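For reference, this is roughly how read_csv treats the same layout. It is a sketch only: the CSV text is an assumed stand-in for the Excel test file from the report, and it shows existing read_csv behaviour rather than the code in the linked commit.

```python
from io import StringIO

import pandas as pd

# Assumed CSV equivalent of the Excel test sheet: two header rows,
# then two data rows.
csv_text = "A,A,B,B\nkey,val,key,val\n1,2,3,4\n1,2,3,4\n"

# With index_col=None, read_csv keeps every column as data and
# builds a default integer index.
df = pd.read_csv(StringIO(csv_text), header=[0, 1], index_col=None)

print(df.columns.tolist())
print(df.index.tolist())
print(df.shape)
```

Here all four columns stay in the frame, which is the behaviour the issue asks read_excel to match when index_col=None.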