river: Error when using OutputCodeClassifier for Code-size greater than 20

Hello again,

I was testing the OCC classifier with more than 90 classes, and the accuracy is very poor. I assume I need a large code size. I tried several code sizes, starting at 10 and recording the accuracy for each, but at a code size of 40 I received the following error: OverflowError: Python int too large to convert to C ssize_t. Is there a way to modify the OCC classifier to allow compact codes (as short as possible) while still providing enough discriminating power between the classes?
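To make the "as short as possible" question concrete, here is a minimal sketch in plain Python. It does not use or reproduce River's internals; the helper names (min_code_size, random_codes, min_hamming_distance) are hypothetical and exist only for illustration. It shows the smallest code size that still gives every class a distinct binary code, and one way to draw distinct random codes of any length without enumerating all 2**code_size candidates.

```python
import random
from itertools import combinations
from typing import List, Tuple

Code = Tuple[int, ...]


def min_code_size(n_classes: int) -> int:
    """Smallest number of bits that gives every class a distinct binary code."""
    return max(1, (n_classes - 1).bit_length())


def random_codes(n_classes: int, code_size: int, seed: int = 42) -> List[Code]:
    """Draw distinct random binary codes one at a time.

    Each code is drawn with getrandbits, so nothing of size
    2**code_size is ever built or indexed.
    """
    if code_size < min_code_size(n_classes):
        raise ValueError("code_size is too small for a unique code per class")
    rng = random.Random(seed)
    seen = set()
    while len(seen) < n_classes:
        seen.add(rng.getrandbits(code_size))
    # Unpack each integer into a tuple of bits, least significant bit first.
    return [tuple((v >> b) & 1 for b in range(code_size)) for v in sorted(seen)]


def min_hamming_distance(codes: List[Code]) -> int:
    """Smallest pairwise Hamming distance, a rough proxy for discriminating power."""
    return min(sum(x != y for x, y in zip(a, b)) for a, b in combinations(codes, 2))


if __name__ == "__main__":
    print(min_code_size(90))
    for size in (7, 10, 20, 40):
        print(size, min_hamming_distance(random_codes(90, size)))
```

With 90 classes the lower bound is 7 bits, but codes that short can differ in a single bit, so one wrong binary prediction is enough to confuse two classes. Longer codes generally leave more room between classes, which is the trade-off behind the question above.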

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 23 (23 by maintainers)

Most upvoted comments

I will look into this sometime this week 😃

The OVR crashes. Unfortunately, the dataset is confidential, but a similar one is the Consumer Complaints benchmark published by the Consumer Financial Protection Bureau: https://www.consumerfinance.gov/data-research/consumer-complaints/ Since it’s a hierarchical problem, I’m flattening the classes to be able to use River. For the Consumer Complaints data, you could use the Product and Sub-product columns.
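For anyone trying to reproduce this, the flattening step might look roughly like the sketch below. Only the Product and Sub-product column names come from the description above; the flatten_label helper, the " > " separator, and the example rows are made up for illustration and are not taken from the real dataset.

```python
from typing import Optional


def flatten_label(product: str, sub_product: Optional[str], sep: str = " > ") -> str:
    """Join the two hierarchy levels into one flat class label.

    Falls back to the product alone when the sub-product is missing or blank.
    """
    product = product.strip()
    sub = (sub_product or "").strip()
    return f"{product}{sep}{sub}" if sub else product


# Made-up rows that only mimic the two columns mentioned above.
rows = [
    {"Product": "Mortgage", "Sub-product": "FHA mortgage"},
    {"Product": "Credit card", "Sub-product": ""},
]

labels = [flatten_label(r["Product"], r["Sub-product"]) for r in rows]
print(labels)
```

Each distinct flat label then counts as one class, which is how a two-level hierarchy can end up with 90 or more classes.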