pandas: read_excel throws ValueError: cannot specify usecols when specifying a multi-index header

df=pd.read_excel(xl_file, sheet_name=xl_sheet,header=[18,19], usecols='B,D:J,L')

Problem description

df=pd.read_excel(xl_file, sheet_name=xl_sheet,header=[18,19], usecols=‘B,D:J,L’)

throws the following error after conda update:

ValueError: cannot specify usecols when specifying a multi-index header

The script run perfectly fine before.

Expected Output

Dataframe with multi-index columns, as before.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None python: 3.7.1.final.0 python-bits: 64 OS: Windows OS-release: 7 machine: AMD64 processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel byteorder: little LC_ALL: None LANG: None LOCALE: None.None

pandas: 0.24.1 pytest: 3.3.2 pip: 19.0.3 setuptools: 40.8.0 Cython: 0.27.3 numpy: 1.15.4 scipy: None pyarrow: None xarray: None IPython: 7.3.0 sphinx: 1.6.6 patsy: 0.5.0 dateutil: 2.8.0 pytz: 2018.9 blosc: None bottleneck: 1.2.1 tables: None numexpr: None feather: None matplotlib: 2.1.2 openpyxl: 2.4.10 xlrd: 1.1.0 xlwt: 1.3.0 xlsxwriter: 1.0.2 lxml.etree: None bs4: 4.6.0 html5lib: 1.0.1 sqlalchemy: 1.2.1 pymysql: None psycopg2: None jinja2: 2.10 s3fs: None fastparquet: None pandas_gbq: None pandas_datareader: None gcsfs: None

About this issue

  • Original URL
  • State: open
  • Created 5 years ago
  • Reactions: 4
  • Comments: 25 (7 by maintainers)

Most upvoted comments

I also got the same issue

I’ve been working on this issue, since I have encountered the same problem.

I have tested with the data provided in this thread as well as the data I’m using (which, sadly, I cannot share) and it looks like just deleting the check of usecols+multi-index in the parser is enough for it to work.

I’ve been trying to pinpoint where usecols is ultimately used to check if there is really an incompatibility with a multi-index but as far as I have been able to check, there is none. When you call read_excel, the usecols parameters is passed along and finally used by PythonParser._infer_columns(), which does the correct thing: it reads the multi-index considering only the columns specified by usecols and then constructs the levels.

The code that originally introduced the parameter incompatibility check is from 9 years ago (commit a9a89f89e13cf006f6b58da1747aa65f86f74cfb, which states “TST: disallow names, usecols, non-numeric in index_cols”), so maybe it is not a problem anymore? I’m not comfortable just deleting this check without being absolutely sure that I’m not breaking anything, but I’m afraid that I’m not familiarized enough with the pandas internals to be sure (and I’m sure that multi-index must be a horrible beast).

If anyone more knowledgeable than me can confirm that the check is not needed anymore I can go ahead and submit the PR, along with a test to ensure it works correctly.

this bug still exist on version 1.2.4 …