pandas: MultiIndex `to_string` edge case error after 0.23.0 upgrade

Code example

import pandas as pd
import numpy as np

index = pd.date_range('1970', '2018', freq='A')
data = np.random.randn(len(index))
columns1 = [
    ['This is a long title with > 37 chars.'],
    ['cat'],
]
columns2 = [
    ['This is a loooooonger title with > 43 chars.'],
    ['dog'],
]
df1 = pd.DataFrame(data=data, index=index, columns=columns1)
df2 = pd.DataFrame(data=data, index=index, columns=columns2)
df = pd.concat([df1, df2], axis=1)
df.head()

Output (using pandas 0.23.0)

>>> df.head()
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/david/.virtualenvs/thegrid-py3-venv/lib/python3.5/site-packages/pandas/core/base.py", line 82, in __repr__
    return str(self)
  File "/home/david/.virtualenvs/thegrid-py3-venv/lib/python3.5/site-packages/pandas/core/base.py", line 61, in __str__
    return self.__unicode__()
  File "/home/david/.virtualenvs/thegrid-py3-venv/lib/python3.5/site-packages/pandas/core/frame.py", line 663, in __unicode__
    line_width=width, show_dimensions=show_dimensions)
  File "/home/david/.virtualenvs/thegrid-py3-venv/lib/python3.5/site-packages/pandas/core/frame.py", line 1968, in to_string
    formatter.to_string()
  File "/home/david/.virtualenvs/thegrid-py3-venv/lib/python3.5/site-packages/pandas/io/formats/format.py", line 648, in to_string
    strcols = self._to_str_columns()
  File "/home/david/.virtualenvs/thegrid-py3-venv/lib/python3.5/site-packages/pandas/io/formats/format.py", line 539, in _to_str_columns
    str_columns = self._get_formatted_column_labels(frame)
  File "/home/david/.virtualenvs/thegrid-py3-venv/lib/python3.5/site-packages/pandas/io/formats/format.py", line 782, in _get_formatted_column_labels
    str_columns = _sparsify(str_columns)
  File "/home/david/.virtualenvs/thegrid-py3-venv/lib/python3.5/site-packages/pandas/core/indexes/multi.py", line 2962, in _sparsify
    prev = pivoted[start]
IndexError: list index out of range

Problem description

After upgrading pandas from 0.22.0 to 0.23.0, `df.head()` raises the error above. The length of the top-level column labels, `This is a long title with > 37 chars.` and `This is a loooooonger title with > 43 chars.`, is what makes the difference: if I shorten them so that their combined length is <= 80 characters, there is no error and the output is as expected.
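A minimal sketch for checking whether a given installation shows the failure, without the traceback interrupting the session. The helper name `repr_raises_index_error` is hypothetical, and two simplifications are assumptions: the frame is built directly with `pd.MultiIndex.from_tuples` rather than via `pd.concat`, and a short daily index stands in for the annual one. `display.max_columns` is set to 0 explicitly, since that is the setting the comments on this issue point at.

```python
import numpy as np
import pandas as pd


def repr_raises_index_error(frame):
    """Return True if building repr(frame) raises IndexError, else False."""
    try:
        repr(frame)
    except IndexError:
        return True
    return False


# Same two-level column labels as in the report (assumption: building the
# columns directly is equivalent to concatenating two one-column frames).
columns = pd.MultiIndex.from_tuples([
    ('This is a long title with > 37 chars.', 'cat'),
    ('This is a loooooonger title with > 43 chars.', 'dog'),
])
# Assumption: a 5-row daily index; only the label widths should matter here.
index = pd.date_range('1970-12-31', periods=5, freq='D')
df = pd.DataFrame(np.random.randn(5, 2), index=index, columns=columns)

# Force the terminal-width-based column limit only inside this block.
with pd.option_context('display.max_columns', 0):
    affected = repr_raises_index_error(df)

print(pd.__version__, 'affected:', affected)
```

The result depends on the installed pandas version and on the detected terminal width, so no particular output is claimed here.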

Expected Output (using pandas 0.22.0)

>>> df.head()
           This is a long title with > 37 chars.  \
                                             cat   
1970-12-31                             -1.448415   
1971-12-31                              0.081324   
1972-12-31                             -0.018105   
1973-12-31                              0.902790   
1974-12-31                              0.668474   

           This is a loooooonger title with > 43 chars.  
                                                    dog  
1970-12-31                                    -1.448415  
1971-12-31                                     0.081324  
1972-12-31                                    -0.018105  
1973-12-31                                     0.902790  
1974-12-31                                     0.668474

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-124-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_ZA.UTF-8
LOCALE: en_ZA.UTF-8

pandas: 0.23.0
pytest: None
pip: 10.0.1
setuptools: 32.3.1
Cython: None
numpy: 1.14.0
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: 2.5.3
xlrd: None
xlwt: None
xlsxwriter: 1.0.4
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 25 (22 by maintainers)

Most upvoted comments

If we don’t have a fix for this, I would consider reverting the default of pandas.options.display.max_columns back to 20, then working on a fix and possibly switching the default back to 0 for 0.24.0.

Errors in the repr are really annoying, as you cannot even inspect the data properly to see what might be the reason something is not working.
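For users who hit this before a fix lands, the `max_columns` suggestion above can be tried locally. This is a sketch under the assumption that a fixed column limit of 20 avoids the failing code path; that has not been verified here. The placeholder `frame` stands in for the affected DataFrame.

```python
import pandas as pd

frame = pd.DataFrame({'a': [1, 2, 3]})  # placeholder for the affected frame

before = pd.get_option('display.max_columns')

# Scoped override: the option is 20 only inside the with-block.
with pd.option_context('display.max_columns', 20):
    inside = pd.get_option('display.max_columns')
    text = repr(frame)

# The previous value is restored when the block exits.
after = pd.get_option('display.max_columns')
print(text)
```

To change the setting for the whole session instead, `pd.set_option('display.max_columns', 20)` does the same thing without the automatic restore.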