h5py: Encoding error on Windows I/O with Python 3.6

I use h5py 2.7rc2 with Python 3.6.0 (64-bit) on Windows 7 (64-bit).

In this example, I try to write and then read two files named test_e.h5 and test_é.h5.

These problems are new (they only appeared after I updated to Python 3.6).

They may be linked to the PEP 529 change.

Write test:

import os
import h5py

# Create one file with an ASCII name and one with a non-ASCII name.
f = h5py.File('test_e.h5', 'w')
f.close()
f = h5py.File('test_é.h5', 'w')
f.close()
print('Directory content after test:', os.listdir())

Result:

Directory content after test:  ['test_e.h5', 'test_Ã©.h5']

Bad file name: test_Ã©.h5 in place of test_é.h5

Read test:

import os
import h5py

# Both files already exist with correct names before this test runs.
print('Directory content before test: ', os.listdir())
f = h5py.File('test_e.h5', 'r')
f.close()
f = h5py.File('test_é.h5', 'r')
f.close()

Result:

Directory content before test:  ['test_e.h5', 'test_é.h5']

Traceback (most recent call last):

  File "D:/Dev/format_Hdf5.py", line 16, in <module>
    f = h5py.File('test_é.h5', 'r')

  File "d:\app\python36\lib\site-packages\h5py\_hl\files.py", line 271, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)

  File "d:\app\python36\lib\site-packages\h5py\_hl\files.py", line 101, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)

  File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (D:\Build\h5py\h5py-2.7.0rc2\h5py\_objects.c:2853)

  File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (D:\Build\h5py\h5py-2.7.0rc2\h5py\_objects.c:2811)

  File "h5py\h5f.pyx", line 78, in h5py.h5f.open (D:\Build\h5py\h5py-2.7.0rc2\h5py\h5f.c:2130)

OSError: Unable to open file (Unable to open file: name = 'test_é.h5', errno = 2, error message = 'no such file or directory', flags = 0, o_flags = 0)

The existing file test_é.h5 is not found.

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 23 (11 by maintainers)

Most upvoted comments

Just to let you know that HDF5 1.10.6 supports UTF-8 filenames on Windows!

I have been in touch with HDF support and they assured me that they will use UTF-8 filenames on Windows starting with 1.10.3, which is due for release at the end of this year.

That should settle this issue once and for all…

@aragilar using filename.encode('mbcs') works well as a temporary fix for my code. Thanks.
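For anyone who wants to apply the same temporary fix, here is a minimal sketch of how it could be wrapped. The helper name hdf5_filename is hypothetical (it is not part of h5py), and the sketch assumes an HDF5 build that still expects ANSI code page bytes on Windows. On other platforms it simply falls back to os.fsencode().

```python
import os
import sys


def hdf5_filename(filename: str) -> bytes:
    """Encode a file name the way the temporary fix above does.

    Hypothetical helper, not part of h5py. On Windows it encodes with
    'mbcs' (the ANSI code page); elsewhere it uses os.fsencode().
    """
    if sys.platform == "win32":
        return filename.encode("mbcs")
    return os.fsencode(filename)


encoded = hdf5_filename("test_é.h5")
print(encoded)

# Intended usage, assuming h5py is installed:
# import h5py
# f = h5py.File(encoded, "w")
# f.close()
```

Note that 'mbcs' can only represent characters that exist in the active ANSI code page, so this is a stopgap rather than a general solution.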

Strangely, os.fsencode(filename) doesn't work, even with the filesystem encoding forced to mbcs:

import sys
import os

# Force the 'mbcs' filesystem encoding, as in Python versions before 3.6.
sys._enablelegacywindowsfsencoding()

# Show actual file system encoding
encoding = sys.getfilesystemencoding()
print('Filesystem encoding:', encoding)

# os.fsencode(filename) VS filename.encode(File system encoding)
print(os.fsencode('test_é.h5'), 'test_é.h5'.encode(encoding))

Result:

Filesystem encoding: mbcs
b'test_\xc3\xa9.h5' b'test_\xe9.h5'

The filesystem encoding is mbcs, but the encoded bytes differ: the first result is UTF-8.
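The difference between the two byte strings can be reproduced on any platform with the sketch below. It assumes cp1252 as a stand-in for 'mbcs', which is only true on a Western European Windows install. It also shows how an ANSI file API would read the UTF-8 bytes, which may explain the bad file name in the write test. A likely reason for the os.fsencode() result (an assumption, not confirmed in this thread) is that the os module captures the filesystem encoding when it is first imported, so a later call to sys._enablelegacywindowsfsencoding() does not affect it.

```python
name = "test_é.h5"

# Assumption: cp1252 stands in for 'mbcs' (the ANSI code page on a
# Western European Windows install).
utf8_bytes = name.encode("utf-8")
ansi_bytes = name.encode("cp1252")

print(utf8_bytes)  # b'test_\xc3\xa9.h5'
print(ansi_bytes)  # b'test_\xe9.h5'

# How an API that expects ANSI bytes would interpret the UTF-8 bytes.
misread = utf8_bytes.decode("cp1252")
print(misread)  # test_Ã©.h5
```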