river: Error when using OutputCodeClassifier for Code-size greater than 20
Hello again,
I was testing the OutputCodeClassifier (OCC) with more than 90 classes, and the accuracy is very poor. I assume I need a large code size, so I tried several values (starting with a code size of 10) and recorded the accuracy for each. At a code size of 40, I received the following error:
OverflowError: Python int too large to convert to C ssize_t
Is there a way we can modify the OCC classifier to allow for compact codes (as short as possible) while still providing enough discriminating power between the different classes?
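To make the "compact codes" question concrete, here is a minimal sketch in plain Python. It is not River's implementation and does not diagnose the OverflowError above; the helper names (`min_code_size`, `random_codebook`, `min_hamming_distance`) are hypothetical. It shows the smallest code size that can give every class a distinct code, and one way to draw random codes bit by bit without ever enumerating all `2**code_size` possible codes.

```python
import random
from itertools import combinations


def min_code_size(n_classes: int) -> int:
    """Smallest number of bits that gives every class a distinct binary code."""
    return max(1, (n_classes - 1).bit_length())


def random_codebook(n_classes: int, code_size: int, seed=None):
    """Draw n_classes distinct random binary codes of length code_size.

    Codes are sampled as integers with getrandbits and rejected on collision,
    so no range of size 2**code_size is ever materialized.
    """
    if n_classes > 2 ** code_size:
        raise ValueError("code_size is too small to give each class a unique code")
    rng = random.Random(seed)
    seen = set()
    codes = []
    while len(codes) < n_classes:
        value = rng.getrandbits(code_size)
        if value in seen:
            continue  # collision: draw again
        seen.add(value)
        # Unpack the integer into a tuple of 0/1 bits.
        codes.append(tuple((value >> i) & 1 for i in range(code_size)))
    return codes


def min_hamming_distance(codes) -> int:
    """Smallest number of differing bits between any two codes."""
    return min(
        sum(a != b for a, b in zip(c1, c2))
        for c1, c2 in combinations(codes, 2)
    )


print(min_code_size(90))  # 7 bits are enough to tell 90 classes apart
book = random_codebook(n_classes=90, code_size=40, seed=42)
print(len(book), min_hamming_distance(book))
```

The minimum size only guarantees unique codes. A larger minimum Hamming distance leaves more room for individual binary classifiers to be wrong, which is the trade-off between compactness and discriminating power.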
About this issue
- State: closed
- Created 2 years ago
- Comments: 23 (23 by maintainers)
I will look into this sometime this week 😃
The OVR crashes. Unfortunately, the dataset is confidential, but a similar dataset would be the consumer complaints benchmark published by the Consumer Financial Protection Bureau: https://www.consumerfinance.gov/data-research/consumer-complaints/ Since it's a hierarchical problem, I'm flattening the classes so that I can use River. For the Consumer Complaints data, you could use the Product and Sub-product columns.
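For anyone trying to reproduce this, a rough sketch of the flattening step follows. The column names `Product` and `Sub-product` come from the comment above; the `flatten_label` helper, the separator, and the sample record are illustrative assumptions only.

```python
def flatten_label(record, levels=("Product", "Sub-product"), sep=" / "):
    """Join the hierarchical label columns into one flat class label.

    Missing or empty levels are skipped, so a record with no sub-product
    falls back to the product name alone.
    """
    parts = [str(record.get(level) or "").strip() for level in levels]
    return sep.join(part for part in parts if part)


# Illustrative record, not taken from the actual dataset.
row = {"Product": "Mortgage", "Sub-product": "FHA mortgage"}
print(flatten_label(row))  # Mortgage / FHA mortgage
```

Each distinct flattened string then acts as one class for a multiclass wrapper.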