river: Adaptive Random Forest Regressor/Hoeffding Tree Regressor splitting gives AttributeError
Versions
- river version: 0.1.0
- Python version: 3.7.1
- Operating system: Windows 10 Enterprise v1803
Describe the issue
Have been playing around with River/Creme for a couple of weeks now, and it’s super useful for a project I am currently working on. That being said, I’m still pretty new to the workings of the algorithms, so unsure whether this is a bug or an issue with my setup.
When I call learn_one on the ARFR or HTR, I receive “AttributeError: ‘NoneType’ object has no attribute ‘_left’”, raised from line 113 in best_evaluated_split_suggestion.py.
I have implemented a delayed prequential evaluation algorithm, and inspecting the loop, the error seems to be thrown once the first delay period has been exceeded - i.e. at the point where the model can first make a prediction that isn’t zero. Before this point, learn_one doesn’t throw this error.
Currently, I am using the default ARFR as in the example, with a simple LinearRegression as the leaf_model. The linear regression model on its own has been working with my data when not used inside the ARFR. I want to try the ensemble/tree models on the data to see whether accuracy improves thanks to the drift detection methods they include.
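For reference, a minimal sketch of the kind of setup I mean is below (not my actual code - I’m using the TrumpApproval dataset that ships with river purely as a stand-in for my own data, and the plain HoeffdingTreeRegressor here, but the same applies to the ARFR):

```python
from river import datasets, linear_model, tree

# Sketch only: the same idea applies to the Adaptive Random Forest Regressor.
model = tree.HoeffdingTreeRegressor(
    leaf_model=linear_model.LinearRegression()  # linear model fitted in the leaves
)

for x, y in datasets.TrumpApproval():  # stand-in for my own data stream
    y_pred = model.predict_one(x)
    model.learn_one(x, y)  # the AttributeError is raised inside this call
```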
Has anyone else seen this error being thrown, or does anyone know what causes it? Let me know if more information is needed! Thanks.
About this issue
- State: closed
- Created 4 years ago
- Comments: 23 (15 by maintainers)
Hi @JPalm1, this question is out of the scope of the decision tree module. Could you please open another issue to directly tackle this problem?
Meanwhile, @MaxHalford do you have any idea of what might be happening?
Well, that was quite some reading. Thanks to all involved in the interesting discussion.
Missing data is unavoidable in real-world applications. My comment covers two main aspects:
- Regarding this particular case: when the data dictionaries are created, some features are skipped, which results in missing data because the tree has “seen” those features in the past - in other words, they are not “emerging features”. Here the missing data is “generated” and it means something (a given condition was met). My suggestion is that the data should be explicitly set to some value that indicates the condition was met.
- In a general sense, missing data imputation is a relevant topic that has only been partially addressed in streaming data. Options 1 and 2 above seem reasonable. For simplicity, option 1 does the job and provides protection against missing data; this is the strategy used by XGBoost, for example. In practice it amounts to replacing the missing data with a constant value (a rough sketch is given below). We can argue that this is not the most elegant imputation method, but it does the job. On the other hand, option 2 implies an overhead when processing complex trees and very sparse data.
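To make option 1 concrete, something along these lines could be done on the user side; this is only an illustration - fill_missing, seen_features and SENTINEL are made-up names, not part of river:

```python
SENTINEL = 0.0       # illustrative constant used for features that went missing
seen_features = set()

def fill_missing(x: dict) -> dict:
    """Return a copy of x where every previously seen feature is present."""
    seen_features.update(x.keys())
    return {f: x.get(f, SENTINEL) for f in seen_features}

# Usage: model.learn_one(fill_missing(x), y)
```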
The HATR is now running smoothly for me with no errors - thanks for taking the time to look into this @smastelini !
It’s not giving me the best predictions yet, but that’s on me to investigate… 😃
You can check compose.SelectType for that! 😄 Please also check the examples section of iSOUPTreeRegressor to see it in action.
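For instance, something along these lines (just a sketch of the idea; adapt the pipeline steps to your own features):

```python
import numbers
from river import compose, linear_model, preprocessing, tree

# Keep only the numeric features before feeding the tree.
model = (
    compose.SelectType(numbers.Number)
    | preprocessing.StandardScaler()
    | tree.HoeffdingTreeRegressor(leaf_model=linear_model.LinearRegression())
)
```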
My bad! I missed one additional place to protect the trees. The fix is in place.
Anyhow, my point is that the error appears because only valid labels should be passed to the learning models. I am assuming that None is being passed as the target to the trees in your custom implementation. Am I wrong?
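In other words, the kind of guard I have in mind in your evaluation loop is simply (a sketch, not river code):

```python
if y is not None:  # only learn once the delayed label has actually arrived
    model.learn_one(x, y)
```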
Hey, @smastelini - thanks for looking into it!
That explanation makes sense. Just to note: I’m not set up to use evaluate.progressive_val_score - I haven’t set everything up in a pipeline yet. I’m just doing the ‘standard’ approach of iterating through my data points, predicting for each one and then evaluating once the delay time has been exceeded (roughly sketched below). So I’m calling learn_one(x, y) on the ARFR model every time an item is popped out of the delayed evaluation queue.

The changes you have made in #394 seem to have solved the initial exception, but pushed it to a different location: this time line 223 in numeric_attribute_regression_observer.py.

Let me know if you need any more info - I’m having a dig around too, to further my own understanding. 😃
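For reference, my loop is roughly the following (a simplified sketch; DELAY and stream are placeholders for my actual setup):

```python
from collections import deque
from river import metrics

DELAY = 50            # placeholder for my actual delay period
metric = metrics.MAE()
pending = deque()     # items waiting for their delayed label to be used

for t, (x, y) in enumerate(stream):  # `stream` stands in for my data source
    y_pred = model.predict_one(x)
    pending.append((t, x, y, y_pred))

    # Once the delay for the oldest item has elapsed, score it and learn from it.
    while pending and t - pending[0][0] >= DELAY:
        _, x_old, y_old, y_pred_old = pending.popleft()
        metric.update(y_old, y_pred_old)
        model.learn_one(x_old, y_old)  # this is where the exception is raised
```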
@smastelini I guess this is for you 😃