pandas: OSError when reading file with accents in file path
Code Sample, a copy-pastable example if possible
test.txt and test_é.txt are the same file; only the name differs:
pd.read_csv('test.txt')
Out[3]:
1 1 1
0 1 1 1
1 1 1 1
pd.read_csv('test_é.txt')
Traceback (most recent call last):
File "<ipython-input-4-fd67679d1d17>", line 1, in <module>
pd.read_csv('test_é.txt')
File "d:\app\python36\lib\site-packages\pandas\io\parsers.py", line 646, in parser_f
return _read(filepath_or_buffer, kwds)
File "d:\app\python36\lib\site-packages\pandas\io\parsers.py", line 389, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "d:\app\python36\lib\site-packages\pandas\io\parsers.py", line 730, in __init__
self._make_engine(self.engine)
File "d:\app\python36\lib\site-packages\pandas\io\parsers.py", line 923, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "d:\app\python36\lib\site-packages\pandas\io\parsers.py", line 1390, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "pandas\parser.pyx", line 373, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:4184)
File "pandas\parser.pyx", line 669, in pandas.parser.TextReader._setup_parser_source (pandas\parser.c:8471)
OSError: Initializing from file failed
Problem description
Pandas raises an OSError when trying to read a file with accents in its path.
The problem is new: it appeared after I upgraded to Python 3.6 and pandas 0.19.2.
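A self-contained version of the reproduction above, as a sketch: it writes the same small CSV (the contents here are made up) under both names in a temporary directory and reads each one. On the affected setup (Windows, Python 3.6, pandas 0.19.2) the second read is where the OSError appeared; on a non-Windows system, or a pandas release that includes the fixes listed under "Commits related to this issue", both reads should succeed and return equal frames.

```python
import os
import tempfile

import pandas as pd

# Same bytes, two file names: one plain ASCII, one with an accented character.
CONTENT = "a,b,c\n1,1,1\n1,1,1\n"

with tempfile.TemporaryDirectory() as tmp:
    frames = {}
    for name in ("test.txt", "test_é.txt"):
        path = os.path.join(tmp, name)
        with open(path, "w", encoding="utf-8") as f:
            f.write(CONTENT)
        # On Windows + Python 3.6 + pandas 0.19.2 this call raised
        # "OSError: Initializing from file failed" for the accented name.
        frames[name] = pd.read_csv(path)

# The frames live in memory, so they outlive the temporary directory.
print(frames["test.txt"].equals(frames["test_é.txt"]))
```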
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None python: 3.6.0.final.0 python-bits: 64 OS: Windows OS-release: 7 machine: AMD64 processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel byteorder: little LC_ALL: None LANG: fr LOCALE: None.None
pandas: 0.19.2 nose: None pip: 9.0.1 setuptools: 32.3.1 Cython: 0.25.2 numpy: 1.11.3 scipy: 0.18.1 statsmodels: None xarray: None IPython: 5.1.0 sphinx: 1.5.1 patsy: None dateutil: 2.6.0 pytz: 2016.10 blosc: None bottleneck: 1.2.0 tables: None numexpr: 2.6.1 matplotlib: 1.5.3 openpyxl: None xlrd: None xlwt: None xlsxwriter: None lxml: None bs4: None html5lib: 0.999999999 httplib2: None apiclient: None sqlalchemy: 1.1.4 pymysql: None psycopg2: None jinja2: 2.9.3 boto: None pandas_datareader: None
About this issue
- State: closed
- Created 7 years ago
- Reactions: 1
- Comments: 27 (12 by maintainers)
Commits related to this issue
- COMPAT: Properly encode filenames in read_csv Python 3.6+ changes the default encoding to UTF8 (PEP 529), which conflicts with the encoding of Windows (MBCS). This fix checks if we're using Python 3... — committed to forking-repos/pandas by gfyoung 5 years ago
- COMPAT: Properly encode filenames in read_csv (#24758) Python 3.6+ changes the default encoding to UTF8 (PEP 529), which conflicts with the encoding of Windows (MBCS). This fix checks if we're u... — committed to pandas-dev/pandas by gfyoung 5 years ago
- COMPAT: Properly encode filenames in read_csv (#24758) Python 3.6+ changes the default encoding to UTF8 (PEP 529), which conflicts with the encoding of Windows (MBCS). This fix checks if we're u... — committed to Pingviinituutti/pandas by gfyoung 5 years ago
- Fix gh-15086 properly instead of making a workaround — committed to anmyachev/pandas by vnlitvinov 5 years ago
- BUG: reading windows utf8 filenames in py3.6 (#25769) * Fix gh-15086 properly instead of making a workaround * fix code style * Make sure test_filename_with_special_chars properly tests combina... — committed to pandas-dev/pandas by vnlitvinov 5 years ago
- BUG: reading windows utf8 filenames in py3.6 (#25769) * Fix gh-15086 properly instead of making a workaround * fix code style * Make sure test_filename_with_special_chars properly tests combination... — committed to anmyachev/pandas by vnlitvinov 5 years ago
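The commit messages above attribute the failure to PEP 529 (UTF-8 filesystem encoding) conflicting with the Windows ANSI encoding (MBCS). A small standard-library illustration of that mismatch, as a sketch: it assumes the ANSI code page of the reporter's French system is cp1252 (the `mbcs` codec itself exists only on Windows, so cp1252 stands in for it), and the real code page depends on the locale. Roughly, if a path reaches a C-level `char*` API as UTF-8 bytes while Windows interprets those bytes in the ANSI code page, the accented character turns into two different characters and the file is not found.

```python
name = "test_é.txt"

# How Python 3.6+ on Windows encodes a path for the filesystem
# (PEP 529 switched this from "mbcs" to "utf-8").
utf8_bytes = name.encode("utf-8")

# What an ANSI (char*) Windows API such as C's fopen() expects;
# cp1252 is assumed here as a Western European ANSI code page.
ansi_bytes = name.encode("cp1252")

# Reading the UTF-8 bytes back in the ANSI code page yields a different name.
misread = utf8_bytes.decode("cp1252")

print(utf8_bytes)  # b'test_\xc3\xa9.txt'
print(ansi_bytes)  # b'test_\xe9.txt'
print(misread)     # test_Ã©.txt
```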
If anyone comes here like me because they hit the same problem, here is a solution until pandas is fixed to work with PEP 529 (basically, any non-ASCII characters in your path or file name will result in errors):
Insert the following two lines at the beginning of your code to revert to the old way of handling paths on Windows:
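The two lines are not shown here. PEP 529 documents an opt-out for exactly this purpose, `sys._enablelegacywindowsfsencoding()` (equivalently, setting the `PYTHONLEGACYWINDOWSFSENCODING` environment variable before starting Python), so the following is a sketch of what the workaround most likely looked like, not the commenter's verbatim code. The platform guard is an addition so the snippet also runs on non-Windows systems, where the function does not exist. It changes the encoding for the whole process, which is why it belongs at the very start of the program.

```python
import sys

# Revert to the pre-3.6 ("mbcs") filesystem encoding described in PEP 529.
# The function only exists on Windows builds of Python 3.6+, hence the guard.
if sys.platform == "win32" and hasattr(sys, "_enablelegacywindowsfsencoding"):
    sys._enablelegacywindowsfsencoding()

# On Windows this should now report "mbcs"; elsewhere it is left unchanged.
print(sys.getfilesystemencoding())
```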
@jreback what is the next step towards a fix here? You have mentioned a PR that got ‘blown away’ - what does it mean?
While I do not use Windows, I could try to help (I just set up a VM to debug a piece of my code that apparently does not work on Windows).
BTW, a workaround: pass a file handle instead of a name
pd.read_csv(open('test_é.txt', 'r'))
(there are several workarounds in related issues, but I have not seen this one)
My old code (can't run):
New code (successful):
I think this bug is a file name problem. I changed the file name from Chinese to English, and now it runs.
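If renaming the file is not an option, the same idea can be automated by copying it to a temporary ASCII-only name before reading. A minimal sketch; the helper `read_csv_via_ascii_copy` is hypothetical and was not proposed in the thread. It costs an extra copy on disk, and it assumes the temporary directory path itself is ASCII-only, which is not guaranteed on Windows, where it sits under the user profile.

```python
import os
import shutil
import tempfile

import pandas as pd


def read_csv_via_ascii_copy(path, **kwargs):
    """Hypothetical helper: read `path` through a copy with an ASCII-only name.

    Only the file name changes; the contents are copied byte for byte, so
    keyword arguments (sep, encoding, ...) go to pd.read_csv unchanged.
    """
    with tempfile.TemporaryDirectory() as tmp:
        ascii_path = os.path.join(tmp, "data.csv")
        shutil.copyfile(path, ascii_path)
        # read_csv loads everything before the temporary copy is deleted.
        return pd.read_csv(ascii_path, **kwargs)


# Usage: build a small file with a non-ASCII name, then read it via the copy.
with tempfile.TemporaryDirectory() as demo_dir:
    src = os.path.join(demo_dir, "语料.csv")
    with open(src, "w", encoding="utf-8") as f:
        f.write("x,y\n1,2\n")
    df = read_csv_via_ascii_copy(src)

print(df)
```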
I also ran into the same problem: the program stopped at pd.read_csv(file_path). In my case too, it started after I upgraded Python to 3.6 (I'm not sure exactly which version I had before, maybe 3.5…).
path = os.path.join(r'E:\语料', 'sina.csv')
pd.read_csv(open(path, 'r', encoding='utf8'))
That works.
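Both file-handle workarounds in this thread pass an open file object to pd.read_csv instead of a path. A plausible reading of the commits listed above (not something stated in the thread) is that this works because Python opens the file itself, non-ASCII name included, so the C parser never has to open the path. The one-liners leave the handle open; a sketch with `with` closes it. Note that `encoding` here describes the file contents, not the file name.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test_é.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("a,b,c\n1,1,1\n1,1,1\n")

    # Python opens the file (and handles the accented name); pandas only
    # receives a file object. `encoding` applies to the contents.
    with open(path, "r", encoding="utf-8") as handle:
        df = pd.read_csv(handle)

print(df.shape)  # (2, 3)
```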