catboost: catboost/libs/target/target_converter.cpp:64: Unknown class name: "0.6"

Problem: The above exception is thrown for certain target values.
catboost version: 0.13.1
Operating System: Linux

How to reproduce:

import catboost as cb
import numpy as np

print(cb.__version__)

model = cb.CatBoostRegressor(
    iterations=1,
    depth=1,
    loss_function='RMSE',
    # If you change the eval metric to RMSE it works
    eval_metric='AUC:border={}'.format(0.5),
    train_dir='/tmp/cbtest2',
)

x = np.array([[1.5], [0.1]])
# If you change the following line to: y = np.array([0.6, 0.4]) it works
y = np.array([0.99, 0.4])
pool = cb.Pool(x, label=y)

x_valid = np.array([[0.33]])
y_valid = np.array([0.6])
pool_valid = cb.Pool(x_valid, label=y_valid)

model.fit(X=pool, eval_set=pool_valid, use_best_model=False)

Full output:

0.13.1

---------------------------------------------------------------------------
CatBoostError                             Traceback (most recent call last)
<ipython-input-81-d2333a747008> in <module>
     21 pool_valid = cb.Pool(x_valid, label=y_valid)
     22 
---> 23 model.fit(X=pool, eval_set=pool_valid, use_best_model=False)

~/.conda/envs/thehft-ml/lib/python3.7/site-packages/catboost/core.py in fit(self, X, y, cat_features, sample_weight, baseline, use_best_model, eval_set, verbose, logging_level, plot, column_description, verbose_eval, metric_period, silent, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval)
   2699                          use_best_model, eval_set, verbose, logging_level, plot, column_description,
   2700                          verbose_eval, metric_period, silent, early_stopping_rounds,
-> 2701                          save_snapshot, snapshot_file, snapshot_interval)
   2702 
   2703     def predict(self, data, ntree_start=0, ntree_end=0, thread_count=-1, verbose=None):

~/.conda/envs/thehft-ml/lib/python3.7/site-packages/catboost/core.py in _fit(self, X, y, cat_features, pairs, sample_weight, group_id, group_weight, subgroup_id, pairs_weight, baseline, use_best_model, eval_set, verbose, logging_level, plot, column_description, verbose_eval, metric_period, silent, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval)
   1171 
   1172         with log_fixup(), plot_wrapper(plot, self.get_params()):
-> 1173             self._train(train_pool, eval_sets, params, allow_clear_pool)
   1174 
   1175         if (not self._object._has_leaf_weights_in_model()) and allow_clear_pool:

~/.conda/envs/thehft-ml/lib/python3.7/site-packages/catboost/core.py in _train(self, train_pool, test_pool, params, allow_clear_pool)
    864 
    865     def _train(self, train_pool, test_pool, params, allow_clear_pool):
--> 866         self._object._train(train_pool, test_pool, params, allow_clear_pool)
    867         self._set_trained_model_attributes()
    868 

_catboost.pyx in _catboost._CatBoost._train()

_catboost.pyx in _catboost._CatBoost._train()

CatBoostError: catboost/libs/target/target_converter.cpp:64: Unknown class name: "0.6"

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 21 (9 by maintainers)

Most upvoted comments

Solved. The eval_set contained labels that the model had never seen. My y has roughly 1500 categories; removing the classes whose value count in y is 1 and stratifying the split by y solved the problem. I suggest throwing a more detailed exception so that people stop posting errors like this one.
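
The cleanup described in the comment above could look roughly like the following. This is only a sketch in plain Python, not the commenter's actual code: the helper names drop_singleton_classes and stratified_split are hypothetical, and the toy rows and labels are made up for illustration. In a real project a library splitter with a stratify option would likely replace the hand-written split.

```python
from collections import Counter, defaultdict


def drop_singleton_classes(rows, labels):
    """Keep only the rows whose label occurs at least twice (hypothetical helper)."""
    counts = Counter(labels)
    kept = [(row, label) for row, label in zip(rows, labels) if counts[label] > 1]
    return [row for row, _ in kept], [label for _, label in kept]


def stratified_split(labels, eval_fraction=0.25):
    """Split row indices per class so every class in eval also appears in train."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, eval_idx = [], []
    for indices in by_class.values():
        # Send a share of each class to eval, but always leave one row in train.
        n_eval = max(1, int(len(indices) * eval_fraction))
        n_eval = min(n_eval, len(indices) - 1)
        eval_idx.extend(indices[:n_eval])
        train_idx.extend(indices[n_eval:])
    return sorted(train_idx), sorted(eval_idx)


# Toy data (made up): class "c" occurs only once and is removed.
rows = [[0.1], [0.2], [0.3], [0.4], [0.5]]
labels = ["a", "a", "b", "b", "c"]

rows_kept, labels_kept = drop_singleton_classes(rows, labels)
train_idx, eval_idx = stratified_split(labels_kept)

train_labels = {labels_kept[i] for i in train_idx}
eval_labels = {labels_kept[i] for i in eval_idx}

# Labels present in eval but missing from train; should be empty after the cleanup.
unseen = sorted(eval_labels - train_labels)
print(unseen)
```

Checking that unseen is empty before calling fit is a cheap way to catch this situation early, since the library error only names the offending label.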

One way to solve this problem is to define class_names explicitly. You can do this with:

from catboost import CatBoostClassifier

catb_model = CatBoostClassifier(
    iterations=1000,
    learning_rate=0.05,
    loss_function='MultiClass',
    class_names=["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11"],
)
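
Rather than typing the list by hand, the class names could be collected from the labels themselves. The snippet below is a sketch under assumptions: collect_class_names is a hypothetical helper, the label lists are made up, and it assumes all labels share one sortable type. The resulting list would then be passed as class_names, as in the comment above.

```python
def collect_class_names(*label_collections):
    """Return the sorted union of all labels seen in any of the given collections."""
    seen = set()
    for labels in label_collections:
        seen.update(labels)
    return sorted(seen)


# Made-up labels: 4 appears only in the eval set.
y_train = [3, 1, 2, 1]
y_eval = [2, 4]

class_names = collect_class_names(y_train, y_eval)
print(class_names)

# Assumed usage, mirroring the constructor call quoted in the comment:
# CatBoostClassifier(loss_function='MultiClass', class_names=class_names)
```

This only declares the extra classes to the model. A class that never occurs in the training data still cannot be learned, so the stratified split suggested in the other comment remains the more thorough fix.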