xgboost: Poor out-of-sample accuracy with `reg:absoluteerror`
(Original title: “Poor fit with absolute-error-like objective functions”)
I’ve looked more into the problem I described in this discussion post, and I think there are two underlying problems, one easy to solve or work around, and one not so easy.
The easy part is that min_child_weight (at least at its default value of 1) seems to be pretty destructive to the accuracy of some objectives, such as pseudo-Huber. In the example below (copied from my earlier post), I get an MAE for pseudo-Huber of over 33 with the default min_child_weight = 1, but 0.9 with min_child_weight = 0. I think the default should probably be 0. Increasing min_child_weight probably isn’t the first thing you’d want to try if your model came out too complex, anyway.
library(xgboost)

rmse = function(x, y)
    sqrt(mean((x - y)^2))
mae = function(x, y)
    mean(abs(x - y))

set.seed(5)
N = 1000
x = rep(c(0L, 1L, 10L), len = N)
y = x^2 + rnorm(N)^2

for (loss in c("reg:squarederror", "reg:pseudohubererror"))
   {m = xgboost::xgboost(
        verbose = 0,
        params = list(objective = loss),
        data = matrix(x),
        label = y,
        nrounds = 50)
    p = predict(m, newdata = matrix(x))
    message("RMSE ", loss, " - ", rmse(y, p))
    message("MAE ", loss, " - ", mae(y, p))}
This isn’t the whole story, though, because first, a log-cosh objective, defined by
logcosh.objective = function(preds, dtrain)
   {e = preds - getinfo(dtrain, "label")
    list(grad = tanh(e), hess = 1/cosh(e)^2)}
still does quite badly in this case with min_child_weight = 0 (a sketch of how such a custom objective gets plugged in follows the next example), and second, pseudo-Huber does better, but still not well enough, on this trivial example:
x = c(0L, 1L)
y = x

for (loss in c("reg:squarederror", "reg:pseudohubererror"))
   {m = xgboost::xgboost(
        verbose = 0,
        params = list(
            lambda = 0, eta = 1,
            objective = loss),
        data = matrix(x),
        label = y,
        nrounds = 1)
    p = predict(m, newdata = matrix(x))
    message("Predictions for ", loss, ": ")
    print(p)}
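(For completeness, a minimal sketch of one way to plug a custom objective like logcosh.objective in; I'm assuming the obj argument of xgb.train here, which takes a function with the (preds, dtrain) -> list(grad, hess) signature used above.)

dtrain = xgb.DMatrix(matrix(x), label = y)  # whichever x and y are in scope
m = xgb.train(
    params = list(min_child_weight = 0, lambda = 0, eta = 1),
    data = dtrain,
    nrounds = 50,
    obj = logcosh.objective)  # custom objective instead of a built-in name
p = predict(m, newdata = matrix(x))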
In the trivial example, if I set min_child_weight = 0, the predictions for pseudo-Huber become -0.125 and 1.125, which are closer to, but not equal to, the expected answers, 0 and 1. Perhaps tellingly, if base_score is increased to 10, pseudo-Huber produces absurd predictions of [-999.9999, -728.0000], whereas the squared-error case still does the right thing.
I think the problem is not with how splits are chosen in the predictor space, but with how values are assigned to the leaves. In the function CalcWeight in src/tree/param.h, the leaf value comes out to (in the unregularized case) minus the sum of the gradients divided by the sum of the Hessians, i.e. a single Newton step. I don’t really understand the motivation for this, despite consulting the paper, and it seems to work poorly in the case of pseudo-Huber, at least.
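Here's a back-of-the-envelope check of that reading (a rough sketch, assuming pseudo-Huber with delta = 1, so grad = r / sqrt(1 + r^2) and hess = (1 + r^2)^(-3/2) for residual r = prediction - label, and a single unregularized -grad/hess step per leaf with eta = 1):

# One Newton step per leaf from base_score, with the pseudo-Huber gradient
# and Hessian above (delta = 1, no regularization).
newton.step = function(base_score, label)
   {r = base_score - label
    g = r / sqrt(1 + r^2)
    h = (1 + r^2)^(-3/2)
    base_score - g/h}

newton.step(0.5, c(0L, 1L))  # -0.125  1.125
newton.step(10,  c(0L, 1L))  # -1000   -728

That matches the predictions above (up to floating point in the base_score = 10 case). Since -g/h works out to r * (1 + r^2), a single step overshoots roughly cubically in the residual: the gradient saturates at 1 while the Hessian vanishes, whereas for squared error -g/h is just -r and one step lands exactly on the label.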
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 2
- Comments: 25 (21 by maintainers)
Thanks, it works for me with master. Great work. The project that originally motivated me to make this issue is too advanced now for me to try using MAE again, but the example works now, so I think we’re good. If I later try this with a real project and I think there’s still a problem with XGBoost, I’ll open a new issue.