pandas: pandas 1.0.1 read_csv() is broken for some file-like objects

Code Sample

import os
import pandas
import tempfile
import traceback

# pandas.show_versions()

fname = ''
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write('てすと\nこむ'.encode('shift-jis'))
    f.seek(0)
    fname = f.name

    try:
        result = pandas.read_csv(f, encoding='shift-jis')
        print('read shift-jis')
        print(result)

    except Exception as e:
        print(e)
        print(traceback.format_exc())

os.unlink(fname)

Problem description

With pandas 1.0.1 this sample does not work, but with pandas 0.25.3 it works fine. As stated in issue #31575, the encoding of a file-like object is ignored when its class is neither io.BufferedIOBase nor io.RawIOBase. However, some file-like objects do NOT inherit from either of them, even though the “actual” inner object does. In the case of this code sample, according to the CPython implementation, the wrapper object keeps the real file as an attribute (self.file = file), and its __getattr__() returns the file’s attributes as its own. So the code does not work. The same problem affects other file-like objects, for example the tempfile.*File classes, werkzeug’s FileStorage class, and so on.
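
The class relationship described above can be checked with the standard library alone. This is a minimal sketch, not pandas' actual check; the variable names are only for illustration. It assumes CPython, where tempfile.NamedTemporaryFile returns a wrapper object with a .file attribute.

```python
import io
import tempfile

with tempfile.NamedTemporaryFile() as f:
    # The object returned by NamedTemporaryFile is a thin wrapper class,
    # not a subclass of the io base classes.
    wrapper_is_buffered = isinstance(f, io.BufferedIOBase)

    # The wrapped object stored in f.file is a real buffered binary file.
    inner_is_buffered = isinstance(f.file, io.BufferedIOBase)

    # Attribute access on the wrapper is forwarded to the inner file,
    # so the wrapper still behaves like a file for reading and writing.
    f.write(b'abc')
    f.seek(0)
    data = f.read()

print(type(f).__name__, wrapper_is_buffered)
print(type(f.file).__name__, inner_is_buffered)
```

So an isinstance() test against the io base classes rejects the wrapper, even though every read goes to an object that would pass the same test.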

Note: I first noticed this problem when using pandas with a file posted to flask. The file-like object is an instance of werkzeug’s FileStorage. I worked around the problem with the following code:

pandas.read_csv(request.files['file'].stream._file, encoding='shift-jis')
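
Another possible workaround, which does not rely on the private _file attribute, is to copy the bytes into an io.BytesIO, since that class is a subclass of io.BufferedIOBase. This is only a sketch: to_bytes_buffer is a hypothetical helper name, not part of pandas or werkzeug, and it loads the whole file into memory.

```python
import io
import tempfile


def to_bytes_buffer(file_like):
    """Hypothetical helper: copy the remaining bytes of a binary
    file-like object into an in-memory io.BytesIO buffer."""
    return io.BytesIO(file_like.read())


with tempfile.NamedTemporaryFile() as f:
    f.write('てすと\nこむ'.encode('shift-jis'))
    f.seek(0)
    buf = to_bytes_buffer(f)

# The buffer holds the same shift-jis bytes and is a real BufferedIOBase.
decoded = buf.getvalue().decode('shift-jis')
print(isinstance(buf, io.BufferedIOBase))
print(decoded)
```

The resulting buffer could then be passed on as pandas.read_csv(buf, encoding='shift-jis'), at the cost of one extra in-memory copy.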

Expected Output

read shift-jis
  てすと
0  こむ

Output of pd.show_versions()

INSTALLED VERSIONS

commit           : None
python           : 3.6.8.final.0
python-bits      : 64
OS               : Linux
OS-release       : 4.14.138-89.102.amzn1.x86_64
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : ja_JP.UTF-8
LOCALE           : ja_JP.UTF-8

pandas           : 1.0.1
numpy            : 1.18.1
pytz             : 2019.3
dateutil         : 2.8.1
pip              : 9.0.3
setuptools       : 36.2.7
Cython           : None
pytest           : 3.6.2
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : 1.0.5
lxml.etree       : 4.2.1
html5lib         : 1.0.1
pymysql          : None
psycopg2         : None
jinja2           : 2.10
IPython          : None
pandas_datareader: None
bs4              : 4.6.0
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : 4.2.1
matplotlib       : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
pytest           : 3.6.2
pyxlsb           : None
s3fs             : None
scipy            : None
sqlalchemy       : 1.3.4
tables           : None
tabulate         : None
xarray           : None
xlrd             : 1.1.0
xlwt             : None
xlsxwriter       : 1.0.5
numba            : None

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 22 (15 by maintainers)

Most upvoted comments

I’m not sure why @Colin-b didn’t follow up here, but I think this indicates an issue with Pandas as well. Pandas should accept w+ as readable. I don’t know enough about Pandas to say though, so I’m not opening a new issue.

@sasanquaneuf : You are more than welcome to give your suggestion a try!