scikit-learn: DecisionTreeClassifier unknown label type: 'continuous-multioutput'

Description

DecisionTreeClassifier.fit raises ValueError: Unknown label type: 'continuous-multioutput'. I get the same error whether I load the CSV files with csv.reader, with pandas.read_csv, or by parsing them line by line.

Steps/Code to Reproduce

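import os

import pandas as pd
from sklearn import tree

# _PATH is not defined in this snippet; it is the directory holding the attached files.
feature_df = pd.read_csv(os.path.join(_PATH, 'features.txt'))
target_df = pd.read_csv(os.path.join(_PATH, 'target.txt'))
# Keep only the numeric columns, then replace missing values with 0.
feature_df = feature_df._get_numeric_data()
target_df = target_df._get_numeric_data()
feature_df = feature_df.fillna(0)
target_df = target_df.fillna(0)
clf = tree.DecisionTreeClassifier()
# This call raises the ValueError shown under "Actual Results".
clf_o = clf.fit(feature_df, target_df)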

features.txt target.txt

Expected Results

The error should tell the user what is REALLY wrong, e.g. that the data set does not follow the estimator's assumptions (and what those assumptions are).

Actual Results

Traceback (most recent call last):
  File "D:\Piotr\Documents\uni\bap\BAPFingerprintLocalisation\main.py", line 19, in <module>
    decision_tree.treeClassification()
  File "D:\Piotr\Documents\uni\bap\BAPFingerprintLocalisation\code\decision_tree.py", line 56, in treeClassification
    clf_o = clf.fit(feature_df, target_df)
  File "C:\Python35\lib\site-packages\sklearn\tree\tree.py", line 182, in fit
    check_classification_targets(y)
  File "C:\Python35\lib\site-packages\sklearn\utils\multiclass.py", line 172, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'continuous-multioutput'

Versions

Windows-10-10.0.14393-SP0
Python 3.5.1 (v3.5.1:37a07cee5969, Dec  6 2015, 01:54:25) [MSC v.1900 64 bit (AMD64)]
NumPy 1.11.0
SciPy 0.17.1
Scikit-Learn 0.18

Update:

I’ve reduced the number of target variables to one, just to simplify things:

clf_o = clf.fit(feature_df, target_df.ix[:,1])

Output: Unknown label type: 'continuous'
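The label types in both error messages come from scikit-learn's type_of_target utility, which the traceback shows being reached through check_classification_targets. The sketch below calls it on small made-up arrays (the values are illustrative, not taken from the attached files) to show how float targets are reported as 'continuous' or 'continuous-multioutput', while integer and string targets are reported as a classification type.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Made-up targets for illustration only.
y_float_2d = np.array([[0.5, 1.2], [2.3, 0.1], [1.7, 3.4]])  # two float columns
y_float_1d = np.array([0.5, 2.3, 1.7])                       # one float column
y_int_1d = np.array([0, 2, 1])                               # integer class ids
y_str_1d = np.array(["a", "b", "c"])                         # string class names

kinds = {
    "float_2d": type_of_target(y_float_2d),
    "float_1d": type_of_target(y_float_1d),
    "int_1d": type_of_target(y_int_1d),
    "str_1d": type_of_target(y_str_1d),
}

for name, kind in kinds.items():
    print(name, "->", kind)
```

Only the last two kinds are accepted by a classifier's fit method; the first two are the ones reported in this issue.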

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 2
  • Comments: 20 (9 by maintainers)

Most upvoted comments

Or use a DecisionTreeRegressor

You should be using DecisionTreeRegressor
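A minimal sketch of that suggestion, assuming the targets really are real-valued quantities rather than class labels. The data here is randomly generated for illustration; it is not the reporter's data set.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: 20 samples, 3 features, 2 real-valued target columns.
rng = np.random.RandomState(0)
X = rng.rand(20, 3)
y = np.column_stack([X[:, 0] * 2.0, X[:, 1] + X[:, 2]])

# A regressor accepts float targets, including several target columns at once.
reg = DecisionTreeRegressor(random_state=0)
reg.fit(X, y)

pred = reg.predict(X[:5])
print(pred.shape)
```

The prediction has one column per target column, so the multi-output case from the original report is handled without any change to the target data.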

Classification targets should be represented as integers or as strings. You can ask Pandas to read the target data in as a string and you’ll be fine.
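As a rough illustration of the string approach, the sketch below reads a target column with dtype=str so that each distinct value becomes a class label. The in-memory CSV text and the column names f1, f2 and room are invented for this example. Note that this only makes sense when the target values are genuinely categorical; every distinct value becomes its own class.

```python
import io

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Invented CSV contents standing in for the feature and target files.
features_csv = io.StringIO("f1,f2\n0.1,1.0\n0.2,0.9\n0.8,0.2\n0.9,0.1\n")
target_csv = io.StringIO("room\n1.0\n1.0\n2.0\n2.0\n")

X = pd.read_csv(features_csv)
# dtype=str keeps "1.0" and "2.0" as strings instead of parsing them as floats.
y = pd.read_csv(target_csv, dtype=str)["room"].to_numpy()

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
pred = clf.predict(X)

print(list(clf.classes_))
print(list(pred))
```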

I’m having this same issue, is there a fix for it?

Passing training_data_X and training_scores_Y directly to the fit method causes this error. To avoid it, convert and encode the labels first:

from sklearn import preprocessing
from sklearn import utils
lab_enc = preprocessing.LabelEncoder()
y_train = lab_enc.fit_transform(y_train)
print(y_train)
print(utils.multiclass.type_of_target(y_train))
print(utils.multiclass.type_of_target(y_train.astype('int')))
print(utils.multiclass.type_of_target(y_train))
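To make the effect of this encoding concrete, here is a small self-contained sketch with made-up float targets. It shows that LabelEncoder maps each distinct value to an integer class id, which silences the error but also means a continuous target with many distinct values turns into just as many classes.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.utils.multiclass import type_of_target

# Made-up float targets; three distinct values.
y_train = np.array([0.5, 1.5, 0.5, 2.25, 1.5])
before = type_of_target(y_train)

lab_enc = LabelEncoder()
encoded = lab_enc.fit_transform(y_train)
after = type_of_target(encoded)

# inverse_transform maps the integer ids back to the original values.
restored = lab_enc.inverse_transform(encoded)

print(before, "->", after)
print(encoded, lab_enc.classes_)
```

If the ordering or spacing of the original values matters, this encoding discards it, which is one reason a regressor may be the better fit for real-valued targets.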

For the error message would “Unsupported output type: ‘continuous-multioutput’” be better? That is the real issue. Also see #7809 for the docstring.

You’re right that the error message could be more useful, but the documentation for fit does say “class labels in classification”. Feel free to submit a clearer issue about needing to document the expected data type for classification ys, and another for raising appropriate error messages when float data is passed as y to a classifier.

That’s better. But I still don’t understand why you won’t call it what it is. The literature mostly calls these ‘target’ variables, and ‘output’ could be mistaken for a function’s output. The exception was thrown from the function ‘check_classification_targets’, so even you call it a ‘target’ variable, yet you still want to name it ‘label’ or ‘output’. I’m not a member of the scikit-learn team, so you will do as you please, but I would recommend using the words ‘Target variable’ in the docstring and the error message. I also ask you to describe somewhere the rules that the input data (or target) should follow. A short sentence would do: ‘Target variable (parameter y) has to be int or str’.
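Until the library's message changes, a user-side check along these lines can produce the kind of wording asked for here. This is only a sketch: check_target_for_classifier is a hypothetical helper written for this example, not a scikit-learn function, and its message text is an assumption about what a clearer error could say.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target


def check_target_for_classifier(y):
    """Hypothetical helper (not part of scikit-learn): fail early with a clearer message."""
    y_type = type_of_target(y)
    if y_type.startswith("continuous"):
        raise ValueError(
            "Target variable y has type %r: classifiers need discrete class "
            "labels (int or str). For real-valued targets, use a regressor." % y_type
        )
    return y_type


# Integer labels pass the check.
labels_kind = check_target_for_classifier(np.array([0, 1, 1, 0]))

# Float targets trigger the clearer message.
try:
    check_target_for_classifier(np.array([0.5, 1.25, 3.0]))
    message = None
except ValueError as exc:
    message = str(exc)

print(labels_kind)
print(message)
```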