pandas: OSError when reading file with accents in file path

Code Sample, a copy-pastable example if possible

test.txt and test_é.txt are the same file; only the name differs:

pd.read_csv('test.txt')
Out[3]: 
   1 1 1
0  1 1 1
1  1 1 1

pd.read_csv('test_é.txt')
Traceback (most recent call last):

  File "<ipython-input-4-fd67679d1d17>", line 1, in <module>
    pd.read_csv('test_é.txt')

  File "d:\app\python36\lib\site-packages\pandas\io\parsers.py", line 646, in parser_f
    return _read(filepath_or_buffer, kwds)

  File "d:\app\python36\lib\site-packages\pandas\io\parsers.py", line 389, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)

  File "d:\app\python36\lib\site-packages\pandas\io\parsers.py", line 730, in __init__
    self._make_engine(self.engine)

  File "d:\app\python36\lib\site-packages\pandas\io\parsers.py", line 923, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)

  File "d:\app\python36\lib\site-packages\pandas\io\parsers.py", line 1390, in __init__
    self._reader = _parser.TextReader(src, **kwds)

  File "pandas\parser.pyx", line 373, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:4184)

  File "pandas\parser.pyx", line 669, in pandas.parser.TextReader._setup_parser_source (pandas\parser.c:8471)

OSError: Initializing from file failed

Problem description

pandas raises an OSError when trying to read a file with accents in the file path.

The problem is new (it appeared after I upgraded to Python 3.6 and pandas 0.19.2).

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: fr
LOCALE: None.None

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 32.3.1
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: None
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
httplib2: None
apiclient: None
sqlalchemy: 1.1.4
pymysql: None
psycopg2: None
jinja2: 2.9.3
boto: None
pandas_datareader: None

About this issue

State: closed
Created 7 years ago
Reactions: 1
Comments: 27 (12 by maintainers)

Commits related to this issue

COMPAT: Properly encode filenames in read_csv Python 3.6+ changes the default encoding to UTF8 (PEP 529), which conflicts with the encoding of Windows (MBCS). This fix checks if we're using Python 3... — committed to forking-repos/pandas by gfyoung 5 years ago
COMPAT: Properly encode filenames in read_csv (#24758) Python 3.6+ changes the default encoding to UTF8 (PEP 529), which conflicts with the encoding of Windows (MBCS). This fix checks if we're u... — committed to pandas-dev/pandas by gfyoung 5 years ago
COMPAT: Properly encode filenames in read_csv (#24758) Python 3.6+ changes the default encoding to UTF8 (PEP 529), which conflicts with the encoding of Windows (MBCS). This fix checks if we're u... — committed to Pingviinituutti/pandas by gfyoung 5 years ago
Fix gh-15086 properly instead of making a workaround — committed to anmyachev/pandas by vnlitvinov 5 years ago
BUG: reading windows utf8 filenames in py3.6 (#25769) * Fix gh-15086 properly instead of making a workaround * fix code style * Make sure test_filename_with_special_chars properly tests combina... — committed to pandas-dev/pandas by vnlitvinov 5 years ago
BUG: reading windows utf8 filenames in py3.6 (#25769) * Fix gh-15086 properly instead of making a workaround * fix code style * Make sure test_filename_with_special_chars properly tests combination... — committed to anmyachev/pandas by vnlitvinov 5 years ago
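The commit messages above attribute the failure to a mismatch between UTF-8 (the default file system encoding on Windows since Python 3.6, per PEP 529) and the legacy Windows code page (MBCS). A minimal sketch of that mismatch follows. It assumes cp1252 as a stand-in for the legacy code page, since the actual code page depends on the system locale and the "mbcs" codec only exists on Windows. It only shows that the same file name becomes different byte strings under the two encodings, so code that hands the bytes of one encoding to an API expecting the other may end up looking for a file that does not exist.

```python
# Illustration only: cp1252 is an assumed stand-in for the Windows ANSI
# ("mbcs") code page; the real code page depends on the system locale.
filename = "test_é.txt"

utf8_bytes = filename.encode("utf-8")      # encoding used by Python 3.6+ (PEP 529)
legacy_bytes = filename.encode("cp1252")   # assumed legacy Windows encoding

print(utf8_bytes)    # b'test_\xc3\xa9.txt'
print(legacy_bytes)  # b'test_\xe9.txt'

# ASCII-only names encode identically under both encodings, which is
# consistent with test.txt loading fine while test_é.txt fails.
ascii_name_matches = "test.txt".encode("utf-8") == "test.txt".encode("cp1252")
print(ascii_name_matches)  # True
```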

Most upvoted comments

If anyone lands here because they hit the same problem, here is a workaround until pandas is fixed to work with PEP 529 (basically, any non-ASCII characters in your path or filename will result in errors):

Insert the following two lines at the beginning of your code to revert to the old way of handling paths on Windows:

import sys
sys._enablelegacywindowsfsencoding()

+28

fotisj on Jan 14, 2018
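A slightly more defensive variant of the two-line workaround above, as a sketch: sys._enablelegacywindowsfsencoding() only exists on Windows builds of Python 3.6 and later, so calling it unconditionally would raise AttributeError elsewhere. The helper name use_legacy_windows_fs_encoding is hypothetical and not part of pandas or the standard library.

```python
import sys


def use_legacy_windows_fs_encoding():
    """Switch back to the pre-PEP 529 file system encoding where possible.

    Hypothetical helper: returns True if the switch was applied, False on
    platforms or Python versions where the function is not available.
    """
    enable = getattr(sys, "_enablelegacywindowsfsencoding", None)
    if sys.platform == "win32" and enable is not None:
        enable()
        return True
    return False


# Call this once, before any file is opened.
applied = use_legacy_windows_fs_encoding()
print(applied, sys.getfilesystemencoding())
```

Because this changes how the whole process encodes paths, it is best treated as a temporary measure rather than a permanent setting.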

@jreback what is the next step towards a fix here? You have mentioned a PR that got ‘blown away’ - what does it mean?

While I do not use Windows, I could try to help (I just got a VM to debug a piece of my code that apparently does not work on Windows).

BTW, a workaround: pass a file handle instead of a name, e.g. pd.read_csv(open('test_é.txt', 'r')). (There are several workarounds in related issues, but I have not seen this one.)

+11

tpietruszka on Aug 23, 2017
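The file-handle workaround above can be sketched as a small helper that also closes the handle afterwards. The name read_csv_via_handle, the UTF-8 default for the file contents, and the sample data are assumptions for illustration; the idea is that Python opens the file itself, so pandas receives an open handle and never has to resolve the non-ASCII path.

```python
import os
import tempfile

import pandas as pd


def read_csv_via_handle(path, encoding="utf-8", **kwargs):
    """Open the file in Python and hand pandas the handle, not the path.

    Hypothetical helper; extra keyword arguments go to pd.read_csv.
    """
    # newline="" leaves line-ending handling to the CSV parser.
    with open(path, "r", encoding=encoding, newline="") as handle:
        return pd.read_csv(handle, **kwargs)


# Self-contained demo: create a file with an accented name, then read it.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test_é.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("a,b,c\n1,1,1\n1,1,1\n")
    df = read_csv_via_handle(path)

print(df.shape)  # (2, 3)
```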

My old code (does not run):

import pandas as pd
import os
file_path = './dict/字典.csv'
df_name = pd.read_csv(file_path, sep=',')

New code (successful):

import pandas as pd
import os
file_path = './dict/dict.csv'
df_name = pd.read_csv(file_path, sep=',')

I think this bug is a filename problem. After I changed the filename from Chinese to English, the code runs.

dondon2475848 on Aug 29, 2017

I also hit the same problem: the program stopped at pd.read_csv(file_path). My situation is similar; it started after I upgraded Python to 3.6 (I'm not sure exactly which version I had installed before, maybe 3.5…).

z94624 on Jul 16, 2017

import os
import pandas as pd
path = os.path.join(r'E:\语料', 'sina.csv')
pd.read_csv(open(path, 'r', encoding='utf8'))

This works.

GitOffice on Feb 3, 2018