mne-python: ASCII decode error with CNT files

As reported on our mailing list, this CNT file seems to contain a non-ASCII character (î), which leads to a decoding error:

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-9-8d42a9688cc0> in <module>
      1 fname = fnames[0]
----> 2 raw = mne.io.read_raw_cnt(fname)

~/anaconda3/lib/python3.7/site-packages/mne/io/cnt/cnt.py in read_raw_cnt(input_fname, eog, misc, ecg, emg, data_format, date_format, preload, verbose)
    163     return RawCNT(input_fname, eog=eog, misc=misc, ecg=ecg,
    164                   emg=emg, data_format=data_format, date_format=date_format,
--> 165                   preload=preload, verbose=verbose)
    166 
    167 

~/anaconda3/lib/python3.7/site-packages/mne/io/cnt/cnt.py in __init__(self, input_fname, eog, misc, ecg, emg, data_format, date_format, preload, verbose)
    389         input_fname = path.abspath(input_fname)
    390         info, cnt_info = _get_cnt_info(input_fname, eog, ecg, emg, misc,
--> 391                                        data_format, _date_format)
    392         last_samps = [cnt_info['n_samples'] - 1]
    393         super(RawCNT, self).__init__(

~/anaconda3/lib/python3.7/site-packages/mne/io/cnt/cnt.py in _get_cnt_info(input_fname, eog, ecg, emg, misc, data_format, date_format)
    179         patient_id = int(patient_id) if patient_id.isdigit() else 0
    180         fid.seek(121)
--> 181         patient_name = read_str(fid, 20).split()
    182         last_name = patient_name[0] if len(patient_name) > 0 else ''
    183         first_name = patient_name[-1] if len(patient_name) > 0 else ''

~/anaconda3/lib/python3.7/site-packages/mne/io/utils.py in read_str(fid, count)
    239                              b'\x00' in data else count]])
    240 
--> 241     return str(bytestr.decode('ascii'))  # Return native str type for Py2/3
    242 
    243 

UnicodeDecodeError: 'ascii' codec can't decode byte 0xee in position 0: ordinal not in range(128)

Should the function read_str default to decoding as latin-1 (ISO 8859-1) instead?
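To make the question concrete, here is a minimal sketch of what an ASCII-first reader with a latin-1 fallback could look like. The helper name `read_str_sketch`, its `encoding` and `fallback` parameters, and the sample header bytes are all made up for illustration; this is not MNE's actual `read_str`, and the name field is not taken from the reported file.

```python
import io


def read_str_sketch(fid, count, encoding="ascii", fallback="latin-1"):
    """Hypothetical stand-in for read_str: read a NUL-padded fixed-width field."""
    raw = fid.read(count)
    # Keep only the bytes before the first NUL terminator, if there is one.
    raw = raw.split(b"\x00", 1)[0]
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        # latin-1 maps every byte 0x00-0xFF to a code point, so this cannot fail.
        return raw.decode(fallback)


# Made-up 20-byte name field starting with byte 0xEE, as in the error message.
field = io.BytesIO(b"\xee" + b"le Dupont" + b"\x00" * 10)
patient_name = read_str_sketch(field, 20).split()
print(patient_name)
```

Because latin-1 accepts any byte sequence, the fallback silences the error in every case, whether or not the file was actually written in latin-1.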

About this issue

State: closed
Created 4 years ago
Comments: 34 (32 by maintainers)

Most upvoted comments

> this is not relevant now, because we don’t support ANT CNT files.

Maybe not for this one file. But as @palday points out above:

> If the encoding isn’t 8859, then you can decode to the wrong character.

So if we do make an exception for some non-standards-compliant files, what is the justification for picking 8859 in particular as the allowed exception (and risking that files in some other encoding are silently decoded to the wrong characters)?

drammock on Aug 25, 2020
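The risk raised here can be illustrated with the standard library alone. The snippet below is only a sketch: the choice of cp1251 and UTF-8 as the "other" encodings is an assumption for the sake of the example, not something reported for the file in question.

```python
# The byte from the error message, decoded under two single-byte encodings.
raw = b"\xee"
as_latin1 = raw.decode("latin-1")  # LATIN SMALL LETTER I WITH CIRCUMFLEX
as_cp1251 = raw.decode("cp1251")   # CYRILLIC SMALL LETTER O

# The same character written as UTF-8, then wrongly read back as latin-1.
utf8_bytes = "\u00ee".encode("utf-8")    # two bytes: 0xC3 0xAE
mojibake = utf8_bytes.decode("latin-1")  # two unrelated characters

print(repr(as_latin1), repr(as_cp1251), repr(mojibake))
```

None of these calls raises an exception, so a latin-1 default would trade a visible error for a possibly wrong name whenever the file uses another encoding.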

But returns utter garbage! That’s clearly not valid text.

I guess that @agramfort just entered arbitrary characters - this sure ain’t French, or is it 🤣 ?

cbrnr on Aug 25, 2020