scikit-learn: Inverse Transform for Label Encoder with mixed strings and numbers returns only strings

Description

When a LabelEncoder is fitted on a mix of string and numeric values, its inverse_transform returns only strings.

Steps/Code to Reproduce

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder().fit([1, 2, 'a', 'b'])
le.inverse_transform([0, 1, 2, 3])

Expected Results

array([1, 2, 'a', 'b'], dtype=object)

I understand that NumPy is not ideal for dealing with non-numeric data, so I don’t know what dtype the output SHOULD be, but I know that if the dtype were simply “object”, the array would keep the strings and the numbers as distinct types.
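To illustrate the dtype point, here is a small NumPy-only sketch. It assumes, without checking the 0.18.1 source, that the encoder converts its input with an ordinary NumPy array conversion; the variable names are made up for this example.

```python
import numpy as np

labels = [1, 2, 'a', 'b']

# Default conversion: NumPy picks one common dtype for all elements,
# which for a mix of ints and strs is a unicode string dtype.
coerced = np.asarray(labels)

# Explicit object dtype: each element stays the original Python object.
preserved = np.array(labels, dtype=object)

print(coerced.dtype.kind)                      # 'U'
print(preserved.dtype)                         # object
print([type(x).__name__ for x in preserved])   # ['int', 'int', 'str', 'str']
```

If that assumption holds, the numbers have already become strings before the encoder ever computes its classes, so inverse_transform has nothing but strings to return.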

Actual Results

array(['1', '2', 'a', 'b'], dtype='<U11')

The array is no longer mixed type.
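As a possible workaround while this is open, mixed labels can be encoded without going through a NumPy array at all. The sketch below is a hypothetical pure-Python helper (MixedLabelEncoder is not part of scikit-learn). It numbers classes in order of first appearance rather than sorted order, because ints and strs cannot be compared in Python 3.

```python
class MixedLabelEncoder:
    """Hypothetical helper, not a scikit-learn class."""

    def fit(self, values):
        self.classes_ = []   # original objects, in order of first appearance
        self._index = {}     # (type, value) -> integer code
        for v in values:
            key = (type(v), v)   # include the type so 1 and True stay distinct
            if key not in self._index:
                self._index[key] = len(self.classes_)
                self.classes_.append(v)
        return self

    def transform(self, values):
        # Raises KeyError for values that were not seen during fit.
        return [self._index[(type(v), v)] for v in values]

    def inverse_transform(self, codes):
        return [self.classes_[c] for c in codes]


enc = MixedLabelEncoder().fit([1, 2, 'a', 'b'])
codes = enc.transform(['a', 1, 'b', 2])
print(codes)                                   # [2, 0, 3, 1]
print(enc.inverse_transform([0, 1, 2, 3]))     # [1, 2, 'a', 'b']
```

The result is a plain Python list, so the original int and str types survive the round trip. Wrap it in np.array(..., dtype=object) if an array is needed downstream.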

Versions

Windows-8.1-6.3.9600-SP0
Python 3.5.3 |Anaconda 4.4.0 (64-bit)| (default, May 15 2017, 10:43:23) [MSC v.1900 64 bit (AMD64)]
NumPy 1.12.1
SciPy 0.19.0
Scikit-Learn 0.18.1

About this issue

  • State: open
  • Created 7 years ago
  • Comments: 23 (16 by maintainers)

Most upvoted comments

Informative warnings are usually very welcome!