scikit-learn: DecisionTreeClassifier unknown label type: 'continuous-multioutput'
Description
DecisionTreeClassifier crashes with unknown label type: 'continuous-multioutput'. I’ve tried loading the CSV file using csv.reader, pandas.read_csv and some other approaches like parsing line-by-line.
Steps/Code to Reproduce
import os

import pandas as pd
from sklearn import tree

# _PATH is the directory containing the data files (defined elsewhere in the script)
feature_df = pd.read_csv(os.path.join(_PATH, 'features.txt'))
target_df = pd.read_csv(os.path.join(_PATH, 'target.txt'))
# keep only numeric columns and replace missing values with 0
feature_df = feature_df._get_numeric_data()
target_df = target_df._get_numeric_data()
feature_df = feature_df.fillna(0)
target_df = target_df.fillna(0)
clf = tree.DecisionTreeClassifier()
clf_o = clf.fit(feature_df, target_df)
Expected Results
The error thrown should tell the user what is REALLY wrong, e.g. that the data set does not follow the estimator's assumptions (and what those assumptions are).
Actual Results
Traceback (most recent call last):
  File "D:\Piotr\Documents\uni\bap\BAPFingerprintLocalisation\main.py", line 19, in <module>
    decision_tree.treeClassification()
  File "D:\Piotr\Documents\uni\bap\BAPFingerprintLocalisation\code\decision_tree.py", line 56, in treeClassification
    clf_o = clf.fit(feature_df, target_df)
  File "C:\Python35\lib\site-packages\sklearn\tree\tree.py", line 182, in fit
    check_classification_targets(y)
  File "C:\Python35\lib\site-packages\sklearn\utils\multiclass.py", line 172, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'continuous-multioutput'
Versions
Windows-10-10.0.14393-SP0
Python 3.5.1 (v3.5.1:37a07cee5969, Dec 6 2015, 01:54:25) [MSC v.1900 64 bit (AMD64)]
NumPy 1.11.0
SciPy 0.17.1
Scikit-Learn 0.18
Update:
I’ve changed the number of target variables to one, just to simplify things:
clf_o = clf.fit(feature_df, target_df.ix[:,1])
Output: Unknown label type: 'continuous'
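For anyone trying to see why both calls fail: the names in the two error messages match what sklearn.utils.multiclass.type_of_target reports for the target. A minimal sketch, assuming made-up arrays (the reporter's actual files are not available):

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Made-up targets for illustration only, not the reporter's data.
y_float_2d = np.array([[0.5, 1.2], [2.3, 0.1], [1.7, 3.4]])  # two float columns
y_float_1d = y_float_2d[:, 1]                                 # a single float column
y_int_1d = np.array([0, 1, 2, 1])                             # integer class labels
y_str_1d = np.array(["a", "b", "a", "c"])                     # string class labels

kinds = {
    "float_2d": type_of_target(y_float_2d),
    "float_1d": type_of_target(y_float_1d),
    "int_1d": type_of_target(y_int_1d),
    "str_1d": type_of_target(y_str_1d),
}
for name, kind in kinds.items():
    print(name, "->", kind)
```

The two float targets are reported as 'continuous-multioutput' and 'continuous', which is why selecting a single column only changes the message and does not make the classifier accept the data.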
About this issue
- State: closed
- Created 8 years ago
- Reactions: 2
- Comments: 20 (9 by maintainers)
Or use a DecisionTreeRegressor.
You should be using DecisionTreeRegressor.
Classification targets should be represented as integers or as strings. You can ask Pandas to read the target data in as a string and you’ll be fine.
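To illustrate the two suggestions above, here is a rough sketch on synthetic data. The arrays and the "near"/"far" labels are invented for the example; only the estimator classes come from scikit-learn.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.rand(20, 3)                        # synthetic features
y_continuous = rng.rand(20, 2)             # two real-valued targets per sample
y_labels = np.array(["near", "far"] * 10)  # invented string class labels

# Real-valued targets: the regressor accepts them, including several columns.
reg = DecisionTreeRegressor(random_state=0).fit(X, y_continuous)
pred_reg = reg.predict(X[:2])

# Discrete targets (strings or integers): the classifier accepts them.
clf = DecisionTreeClassifier(random_state=0).fit(X, y_labels)
pred_clf = clf.predict(X[:2])

# Real-valued targets passed to the classifier reproduce the reported error.
error_message = None
try:
    DecisionTreeClassifier().fit(X, y_continuous)
except ValueError as exc:
    error_message = str(exc)

print(pred_reg.shape, pred_clf, error_message)
```

Which estimator is right depends on what the target columns mean. If they are measurements, regression is probably the better fit. If they are category codes that pandas happened to read as floats, converting them to int or str should be enough.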
I’m having this same issue, is there a fix for it?
If we pass training_data_X and training_scores_Y as input to the fit method, it causes this error. To avoid it, we convert and encode the labels:
from sklearn import preprocessing
from sklearn import utils
lab_enc = preprocessing.LabelEncoder()
y_train = lab_enc.fit_transform(y_train)
print(y_train)
print(utils.multiclass.type_of_target(y_train))
print(utils.multiclass.type_of_target(y_train.astype('int')))
print(utils.multiclass.type_of_target(y_train))
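A word of caution on the encoding approach, shown as a small sketch with made-up scores: LabelEncoder gives every distinct value its own integer code, so truly continuous targets become one class per distinct value. The error goes away, but the classifier then treats nearby values as unrelated classes.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.utils.multiclass import type_of_target

y_scores = np.array([0.5, 1.25, 3.0, 1.25, 7.75])  # made-up continuous targets

encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y_scores)

print(type_of_target(y_scores))   # continuous
print(type_of_target(y_encoded))  # multiclass
print(y_encoded)                  # one integer code per distinct value
print(encoder.classes_)           # the original values, sorted
```

This is probably only what you want when the floats are really category codes; otherwise a regressor is likely the safer choice.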
For the error message would “Unsupported output type: ‘continuous-multioutput’” be better? That is the real issue. Also see #7809 for the docstring.
You’re right that the error message could be more useful, but the documentation for fit does say “class labels in classification”. Feel free to submit a clearer issue about needing to document the expected data type for classification ys, and another for raising appropriate error messages when float data is passed as y to a classifier.
@alexrindone @satish-bot you may refer to https://stackoverflow.com/questions/56380097/cant-understand-this-error-unknown-label-type-continuous-multioutput/66890653#66890653. It describes one solution to this problem. Check if it helps.
That’s better. But I still don’t understand why you won’t call it what it is. The literature mostly calls these ‘target’ variables, and ‘output’ could be mistaken for function output. The exception was thrown from the function ‘check_classification_targets’, so even you call it a ‘target’ variable, and yet you still want to call it ‘label’ or ‘output’. I’m not a member of the scikit-learn team, so you will do as you please, but I would recommend using the words ‘Target variable’ in the docstring and error message. I also ask you to describe somewhere the rules that the input data (or target) should follow. A short sentence would do: ‘Target variable (parameter y) has to be int or str’.
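Until the message changes, a user-side check along these lines could give the kind of wording asked for here. This is only a sketch: check_target_for_classifier is a hypothetical helper, not part of scikit-learn, and it relies only on type_of_target.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target


def check_target_for_classifier(y):
    """Hypothetical helper: raise a descriptive error for real-valued targets."""
    kind = type_of_target(y)
    if kind.startswith("continuous"):
        raise ValueError(
            "Target variable (parameter y) looks real-valued "
            f"(detected type: {kind!r}). A classifier expects discrete class "
            "labels such as int or str; use a regressor for continuous targets."
        )
    return kind


# Discrete string labels pass the check.
labels_kind = check_target_for_classifier(np.array(["a", "b", "a"]))

# Two float columns are rejected with the more explicit message.
message = ""
try:
    check_target_for_classifier(np.array([[0.5, 1.2], [2.3, 0.1]]))
except ValueError as exc:
    message = str(exc)

print(labels_kind)
print(message)
```

Calling such a helper before fit keeps the explanation close to the user's own code, at the cost of inspecting the target twice.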