pandas: MultiIndex `to_string` edge case Error after 0.23.0 upgrade
Code example
import pandas as pd
import numpy as np
index = pd.date_range('1970', '2018', freq='A')
data = np.random.randn(len(index))
columns1 = [
['This is a long title with > 37 chars.'],
['cat'],
]
columns2 = [
['This is a loooooonger title with > 43 chars.'],
['dog'],
]
df1 = pd.DataFrame(data=data, index=index, columns=columns1)
df2 = pd.DataFrame(data=data, index=index, columns=columns2)
df = pd.concat([df1, df2], axis=1)
df.head()
Output (using pandas 0.23.0)
>>> df.head()
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/home/david/.virtualenvs/thegrid-py3-venv/lib/python3.5/site-packages/pandas/core/base.py", line 82, in __repr__
return str(self)
File "/home/david/.virtualenvs/thegrid-py3-venv/lib/python3.5/site-packages/pandas/core/base.py", line 61, in __str__
return self.__unicode__()
File "/home/david/.virtualenvs/thegrid-py3-venv/lib/python3.5/site-packages/pandas/core/frame.py", line 663, in __unicode__
line_width=width, show_dimensions=show_dimensions)
File "/home/david/.virtualenvs/thegrid-py3-venv/lib/python3.5/site-packages/pandas/core/frame.py", line 1968, in to_string
formatter.to_string()
File "/home/david/.virtualenvs/thegrid-py3-venv/lib/python3.5/site-packages/pandas/io/formats/format.py", line 648, in to_string
strcols = self._to_str_columns()
File "/home/david/.virtualenvs/thegrid-py3-venv/lib/python3.5/site-packages/pandas/io/formats/format.py", line 539, in _to_str_columns
str_columns = self._get_formatted_column_labels(frame)
File "/home/david/.virtualenvs/thegrid-py3-venv/lib/python3.5/site-packages/pandas/io/formats/format.py", line 782, in _get_formatted_column_labels
str_columns = _sparsify(str_columns)
File "/home/david/.virtualenvs/thegrid-py3-venv/lib/python3.5/site-packages/pandas/core/indexes/multi.py", line 2962, in _sparsify
prev = pivoted[start]
IndexError: list index out of range
Problem description
After upgrading pandas from 0.22.0 to 0.23.0, I get the error above. The length of the column labels, `This is a long title with > 37 chars.` and `This is a loooooonger title with > 43 chars.`, is what makes the difference. If I shorten them so that their combined length is <= 80 characters, there is no error and the output is as expected.
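A minimal sketch for checking how close a frame's labels are to that limit. It assumes the 80-character threshold observed above corresponds to the `display.width` option (80 by default) or to the detected terminal width, which this report does not confirm. The helper `top_level_label_width` is a hypothetical name introduced for illustration, not part of pandas.

```python
import pandas as pd


def top_level_label_width(columns):
    """Sum the character lengths of the top-level column labels.

    Hypothetical helper for illustration; not a pandas API.
    """
    return sum(len(str(label)) for label in columns.get_level_values(0))


columns = pd.MultiIndex.from_arrays([
    ['This is a long title with > 37 chars.',
     'This is a loooooonger title with > 43 chars.'],
    ['cat', 'dog'],
])
df = pd.DataFrame([[0.1, 0.2]], columns=columns)

label_width = top_level_label_width(df.columns)
# Assumption: this option is what the observed 80-character limit refers to.
display_width = pd.get_option('display.width')
print(label_width, display_width)
```

If the assumption holds, frames whose summed label width exceeds the display width are the ones that take the code path that fails in 0.23.0.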
Expected Output (using pandas 0.22.0)
>>> df.head()
This is a long title with > 37 chars. \
cat
1970-12-31 -1.448415
1971-12-31 0.081324
1972-12-31 -0.018105
1973-12-31 0.902790
1974-12-31 0.668474
This is a loooooonger title with > 43 chars.
dog
1970-12-31 -1.448415
1971-12-31 0.081324
1972-12-31 -0.018105
1973-12-31 0.902790
1974-12-31 0.668474
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-124-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_ZA.UTF-8
LOCALE: en_ZA.UTF-8
pandas: 0.23.0
pytest: None
pip: 10.0.1
setuptools: 32.3.1
Cython: None
numpy: 1.14.0
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: 2.5.3
xlrd: None
xlwt: None
xlsxwriter: 1.0.4
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
About this issue
- State: closed
- Created 6 years ago
- Comments: 25 (22 by maintainers)
If we don’t have a fix for this, I would consider reverting `pandas.options.display.max_columns` back to 20, and work on fixing this and possibly turning it back to 0 for 0.24.0. Errors in the repr are really annoying, as you cannot even inspect the data properly to see what might be the reason something is not working.
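Until a fix lands, the same setting can be applied on the user side. The sketch below assumes that the failure is tied to the new `display.max_columns` default of 0 (fit to terminal width), as the comment above implies; it has not been verified against 0.23.0 here. The index is built with `pd.to_datetime` rather than `date_range` only to keep the example independent of frequency aliases.

```python
import numpy as np
import pandas as pd

# Five year-end dates, matching the first rows of the original report.
index = pd.to_datetime(['{}-12-31'.format(year) for year in range(1970, 1975)])
columns = pd.MultiIndex.from_arrays([
    ['This is a long title with > 37 chars.',
     'This is a loooooonger title with > 43 chars.'],
    ['cat', 'dog'],
])
df = pd.DataFrame(np.random.randn(len(index), 2), index=index, columns=columns)

# Temporarily restore the pre-0.23.0 default of 20 columns for the repr.
with pd.option_context('display.max_columns', 20):
    text = repr(df)

print(text)
```

To apply the setting for a whole session instead of a single block, `pd.set_option('display.max_columns', 20)` does the same thing without the context manager.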