scikit-learn: DecisionTreeClassifier unknown label type: 'continuous-multioutput'

Description

DecisionTreeClassifier crashes with "Unknown label type: 'continuous-multioutput'". I’ve tried loading the CSV file with csv.reader, pandas.read_csv, and a few other approaches such as parsing it line by line.

Steps/Code to Reproduce

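import os
import pandas as pd
from sklearn import tree

# _PATH is set elsewhere in the script (directory holding the attached files)
feature_df = pd.read_csv(os.path.join(_PATH, 'features.txt'))
target_df = pd.read_csv(os.path.join(_PATH, 'target.txt'))
# Keep only the numeric columns and replace missing values with 0
feature_df = feature_df._get_numeric_data()
target_df = target_df._get_numeric_data()
feature_df = feature_df.fillna(0)
target_df = target_df.fillna(0)
clf = tree.DecisionTreeClassifier()
# This call raises the ValueError shown under "Actual Results"
clf_o = clf.fit(feature_df, target_df)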

features.txt target.txt

Expected Results

The error thrown should tell the user what is really wrong, e.g. that the data set does not follow the estimator's assumptions (and what those assumptions are).

Actual Results

Traceback (most recent call last):
  File "D:\Piotr\Documents\uni\bap\BAPFingerprintLocalisation\main.py", line 19, in <module>
    decision_tree.treeClassification()
  File "D:\Piotr\Documents\uni\bap\BAPFingerprintLocalisation\code\decision_tree.py", line 56, in treeClassification
    clf_o = clf.fit(feature_df, target_df)
  File "C:\Python35\lib\site-packages\sklearn\tree\tree.py", line 182, in fit
    check_classification_targets(y)
  File "C:\Python35\lib\site-packages\sklearn\utils\multiclass.py", line 172, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'continuous-multioutput'

Versions

Windows-10-10.0.14393-SP0
Python 3.5.1 (v3.5.1:37a07cee5969, Dec  6 2015, 01:54:25) [MSC v.1900 64 bit (AMD64)]
NumPy 1.11.0
SciPy 0.17.1
Scikit-Learn 0.18

Update:

I’ve reduced the number of target variables to one, just to simplify things:

clf_o = clf.fit(feature_df, target_df.ix[:,1])

Output: Unknown label type: 'continuous'

About this issue

State: closed
Created 8 years ago
Reactions: 2
Comments: 20 (9 by maintainers)

Most upvoted comments

Or use a DecisionTreeRegressor

+17

jnothman on Nov 1, 2016

You should be using DecisionTreeRegressor

+16

jnothman on Oct 31, 2016
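A minimal sketch of the DecisionTreeRegressor suggestion follows. The data here is synthetic and purely illustrative (it is not the reporter's features.txt or target.txt), and the variable names are assumptions. It shows that a regressor accepts a float target with more than one column, which is the shape of target that triggers 'continuous-multioutput' in the classifier.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (hypothetical; not the files attached to the issue).
rng = np.random.RandomState(0)
X = rng.rand(50, 4)                       # 50 samples, 4 numeric features
y = np.column_stack([X[:, 0] + X[:, 1],   # two continuous target columns
                     X[:, 2] * 2.5])

reg = DecisionTreeRegressor(random_state=0)
reg.fit(X, y)                             # continuous, multi-column y is accepted
pred = reg.predict(X[:3])
print(pred.shape)                         # one row per sample, one column per target
```

Whether regression is the right choice depends on what the target columns mean: it fits when they are measured quantities rather than category codes.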

Classification targets should be represented as integers or as strings. You can ask Pandas to read the target data in as a string and you’ll be fine.

jnothman on Nov 1, 2016
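To see how scikit-learn classifies a target before fitting, `sklearn.utils.multiclass.type_of_target` can be called directly. The sketch below uses small made-up arrays (all names and values are illustrative assumptions) to show which target types the classifier rejects and how integer or string labels pass. In pandas, one way to get string labels is to pass `dtype=str` to `read_csv`; here a NumPy `astype(str)` conversion stands in for that.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.multiclass import type_of_target

# Made-up targets for illustration only.
y_float = np.array([0.5, 1.7, 2.2, 1.7])                      # 1-D floats
y_float_2d = np.array([[0.5, 1.1], [1.7, 2.3], [2.2, 0.4]])   # 2-D floats
y_int = np.array([0, 1, 2, 1])                                # integer labels
y_str = y_float.astype(str)                                   # floats read as strings

print(type_of_target(y_float))      # continuous
print(type_of_target(y_float_2d))   # continuous-multioutput
print(type_of_target(y_int))        # multiclass
print(type_of_target(y_str))        # multiclass

# String labels are accepted by the classifier.
X = np.array([[0], [1], [2], [1]])
labels = np.array(["a", "b", "c", "b"])
clf = DecisionTreeClassifier(random_state=0).fit(X, labels)
print(clf.predict([[2]]))
```

Note that converting floats to strings makes every distinct value its own class, so this only makes sense when the values really are category codes.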

I’m having this same issue, is there a fix for it?

alexrindone on Sep 8, 2019

Passing training_data_X and training_scores_Y directly to the fit method causes this error. To avoid it, convert and encode the labels first:

from sklearn import preprocessing
from sklearn import utils

# Encode each distinct target value as an integer class label
lab_enc = preprocessing.LabelEncoder()
y_train = lab_enc.fit_transform(y_train)
print(y_train)
print(utils.multiclass.type_of_target(y_train))
print(utils.multiclass.type_of_target(y_train.astype('int')))

rajeshjnv on Jun 3, 2019

For the error message would “Unsupported output type: ‘continuous-multioutput’” be better? That is the real issue. Also see #7809 for the docstring.

amueller on Nov 1, 2016

You’re right that the error message could be more useful, but the documentation for fit does say “class labels in classification”. Feel free to submit a clearer issue about needing to document the expected data type for classification ys, and another for raising appropriate error messages when float data is passed as y to a classifier.

jnothman on Nov 1, 2016
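As an illustration of the kind of message discussed above, here is a sketch of a user-side check that runs before `fit`. The helper name `explain_target` and its wording are hypothetical; this is not scikit-learn's implementation, only one way a caller could get a more descriptive error from the existing `type_of_target` utility.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target


def explain_target(y):
    """Hypothetical pre-fit check: reject continuous targets with a clear message."""
    y_type = type_of_target(y)
    if y_type.startswith("continuous"):
        raise ValueError(
            "Target variable y has type %r. Classifiers expect class labels "
            "(integers or strings); for float targets consider a regressor "
            "such as DecisionTreeRegressor." % y_type
        )
    return y_type


print(explain_target(np.array([0, 1, 1])))   # integer labels pass the check

try:
    explain_target(np.array([[0.5, 1.1], [1.7, 2.3]]))
except ValueError as exc:
    print(exc)
```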

@alexrindone @satish-bot you may refer to https://stackoverflow.com/questions/56380097/cant-understand-this-error-unknown-label-type-continuous-multioutput/66890653#66890653. It describes one solution to this problem. Check whether it helps.

vaitybharati on Mar 31, 2021

That’s better. But I still don’t understand why you won’t call it what it is. The literature mostly calls these ‘target’ variables, and ‘output’ could be mistaken for a function’s output. The exception was thrown from the function ‘check_classification_targets’, so even you call it a ‘target’ variable, yet you still want to call it ‘label’ or ‘output’. I’m not a member of the scikit-learn team, so you will do as you please, but I would recommend using the words ‘Target variable’ in the docstring and the error message. I also ask you to describe somewhere the rules that the input data (or target) should follow. A short sentence would do: ‘Target variable (parameter y) has to be int or str’.

KamodaP on Nov 2, 2016