pandas: BUG: read_excel with multi-indexed column ignores index_col=None

From SO: http://stackoverflow.com/questions/34020061/excel-to-pandas-dataframe-using-first-column-as-index

@chris-b1 another one on the multi-index excel issues … 😃

Small test case. Content of the Excel file:

A A B B
key val key val
1 2 3 4
1 2 3 4

gives:

In [2]: pd.read_excel("test_excel_index_col.xlsx", header=[0,1], index_col=None)
Out[2]:
A     A    B
key val key  val
1     2    3   4
1     2    3   4

The DataFrame repr does not make it obvious, but [1, 1] has become the index, and [A, key] are treated as the level names of the multi-indexed columns.
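To check this on a given pandas version, the test case can be rebuilt as a script. This is only a sketch: it assumes an Excel engine such as openpyxl is installed, writes the cells to a temporary directory (the path is arbitrary), and prints the index and column level names rather than asserting what they should be, since the result depends on the pandas version.

```python
import os
import tempfile

import pandas as pd

# Raw cell contents of the sheet, as in the test case above.
rows = [
    ["A", "A", "B", "B"],
    ["key", "val", "key", "val"],
    [1, 2, 3, 4],
    [1, 2, 3, 4],
]

# Temporary location chosen for illustration only.
path = os.path.join(tempfile.mkdtemp(), "test_excel_index_col.xlsx")

# Write the cells verbatim: no extra header row and no index column.
pd.DataFrame(rows).to_excel(path, header=False, index=False)

df = pd.read_excel(path, header=[0, 1], index_col=None)

print(df)
print("index:", df.index.tolist())
print("column level names:", list(df.columns.names))
```

If the first column is being consumed as the index, the printed index is [1, 1] and the column level names are ['A', 'key'].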

About this issue

  • State: closed
  • Created 9 years ago
  • Reactions: 5
  • Comments: 18 (9 by maintainers)

Most upvoted comments

Vote for index_col=False to fix this

As of now I still see the same issue. When using multiple header rows with read_excel, pandas always assigns the first column as the index.

Per the review comment from @jreback, here is the proposed API for read_excel. It is the same as the one for read_csv.

index_col : int or sequence or False, default None

Column (0-indexed) to use as the row labels of the DataFrame. If a sequence is given, those columns will be combined into a MultiIndex. If None (default), pandas will use the first column as the index. If False, force pandas to not use the first column as the index (row names).
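For reference, this is how index_col=False already behaves in read_csv, which the proposal mirrors. The sketch below uses made-up CSV data in which every data row ends with a trailing delimiter, the case where read_csv would otherwise infer that the first column is the index. It illustrates read_csv only and does not show the proposed read_excel behaviour.

```python
import io

import pandas as pd

# Illustrative data: each data row has a trailing comma, so it carries
# one more field than the header row.
data = "a,b,c\n4,apple,bat,\n8,orange,cow,\n"

# Default: the extra field makes read_csv treat the first column as the index.
inferred = pd.read_csv(io.StringIO(data))

# index_col=False: the first column stays a regular data column.
forced = pd.read_csv(io.StringIO(data), index_col=False)

print(inferred.index.tolist())  # [4, 8]
print(forced.index.tolist())    # [0, 1]
print(forced["a"].tolist())     # [4, 8]
```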

Updates are here:

https://github.com/stephenrauch/pandas/commit/9b37ff94643296d489498138c79bb0244aaa3f79

So if this proposed API looks ok, I will do the PR.