mne-python: ASCII decode error with CNT files
As reported on our mailing list, this CNT file seems to contain a non-ASCII character (î), which leads to a decoding error:
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-9-8d42a9688cc0> in <module>
1 fname = fnames[0]
----> 2 raw = mne.io.read_raw_cnt(fname)
~/anaconda3/lib/python3.7/site-packages/mne/io/cnt/cnt.py in read_raw_cnt(input_fname, eog, misc, ecg, emg, data_format, date_format, preload, verbose)
163 return RawCNT(input_fname, eog=eog, misc=misc, ecg=ecg,
164 emg=emg, data_format=data_format, date_format=date_format,
--> 165 preload=preload, verbose=verbose)
166
167
~/anaconda3/lib/python3.7/site-packages/mne/io/cnt/cnt.py in __init__(self, input_fname, eog, misc, ecg, emg, data_format, date_format, preload, verbose)
389 input_fname = path.abspath(input_fname)
390 info, cnt_info = _get_cnt_info(input_fname, eog, ecg, emg, misc,
--> 391 data_format, _date_format)
392 last_samps = [cnt_info['n_samples'] - 1]
393 super(RawCNT, self).__init__(
~/anaconda3/lib/python3.7/site-packages/mne/io/cnt/cnt.py in _get_cnt_info(input_fname, eog, ecg, emg, misc, data_format, date_format)
179 patient_id = int(patient_id) if patient_id.isdigit() else 0
180 fid.seek(121)
--> 181 patient_name = read_str(fid, 20).split()
182 last_name = patient_name[0] if len(patient_name) > 0 else ''
183 first_name = patient_name[-1] if len(patient_name) > 0 else ''
~/anaconda3/lib/python3.7/site-packages/mne/io/utils.py in read_str(fid, count)
239 b'\x00' in data else count]])
240
--> 241 return str(bytestr.decode('ascii')) # Return native str type for Py2/3
242
243
UnicodeDecodeError: 'ascii' codec can't decode byte 0xee in position 0: ordinal not in range(128)
Should the function read_str default to decoding as Latin-1 (ISO 8859-1) instead of ASCII?
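To make the question concrete, here is a minimal sketch of the difference between the two codecs on a header field that starts with byte 0xee. This is not the actual mne.io.utils.read_str code: the helper name read_str_sketch, its encoding parameter, and the sample bytes are all assumptions made up for illustration.

```python
import io


def read_str_sketch(fid, count, encoding="ascii"):
    """Hypothetical stand-in for read_str: read a fixed-width field,
    cut it at the first NUL byte, and decode with the given codec."""
    data = fid.read(count)
    bytestr = data.split(b"\x00", 1)[0]
    return bytestr.decode(encoding)


# Made-up 20-byte patient-name field whose first byte is 0xee,
# padded with NUL bytes (not taken from the reported file).
field = (b"\xee" + b"le Test").ljust(20, b"\x00")

# Strict ASCII decoding fails on any byte >= 0x80, as in the traceback.
try:
    read_str_sketch(io.BytesIO(field), 20)
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Latin-1 maps every byte to a code point, so the same field decodes.
name = read_str_sketch(io.BytesIO(field), 20, encoding="latin-1")
print(ascii_failed, name)
```

Whether Latin-1 is the right default is a separate question; the sketch only shows that the error comes from the codec choice, not from the file reading itself.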
About this issue
- State: closed
- Created 4 years ago
- Comments: 34 (32 by maintainers)
Maybe not for this one file. But as @palday points out above:
So if we do make an exception for some non-standards-compliant files, what is the justification for picking 8859 in particular as the allowed exception (and risking that files with some other encoding are thus errorful?)
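The risk raised in that quote can be illustrated with a short sketch. It assumes, purely for the sake of example, a file whose text field was written as UTF-8 rather than Latin-1; the variable names are made up.

```python
# The same character "î" stored as UTF-8 takes two bytes.
utf8_bytes = "\u00ee".encode("utf-8")  # b"\xc3\xae"

# Latin-1 decoding never raises, so the wrong guess goes unnoticed
# and silently produces two unrelated characters.
mojibake = utf8_bytes.decode("latin-1")

# An alternative that does not commit to any 8-bit encoding:
# keep ASCII but replace undecodable bytes with U+FFFD.
lossy = b"\xee".decode("ascii", errors="replace")

print(repr(mojibake), repr(lossy))
```

Falling back to Latin-1 trades a loud error for possibly wrong text, while errors="replace" trades it for visibly missing characters. Which is preferable for header fields like the patient name is a judgment call.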
I guess that @agramfort just entered arbitrary characters - this sure ain’t French, or is it 🤣 ?