treelite: Prediction Results Do Not Match Original XGBoost

I trained a model with a single iteration:

import xgboost as xgb

# dtrain and dtest are xgb.DMatrix objects built from my training and test data
param = {
    'nthread': 10,
    'objective': 'multi:softprob',
    'eval_metric': 'mlogloss',
    'num_class': 3,
    'silent': 1,
    'max_depth': 5,
    'min_child_weight': 5,
    'eta': 0.5,  # learning rate
    'subsample': 1,
    'colsample_bytree': 1,
    'gamma': 0,
    'alpha': 0,
    'lambda': 1,
}
watchlist = [(dtest, 'eval')]
bst = xgb.train(param, dtrain, 1, watchlist)

Here sample is a NumPy array containing a single row of features. The output of:

bst.predict(xgb.DMatrix(sample))

differs noticeably from the output of:

import treelite
from treelite.runtime import Predictor, Batch

# model_path is the path of the compiled shared library passed to export_lib
bst_lite = treelite.Model.from_xgboost(bst)
bst_lite.export_lib(toolchain='gcc', libpath=model_path, verbose=True)
batch = Batch.from_npy2d(sample)
predictor = Predictor(model_path, verbose=True)
predictor.predict(batch)
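
For concreteness, this is roughly how I compare the two outputs (np.allclose is just a convenience check; the tolerance is arbitrary):

import numpy as np

# Both predictions come from the same single-row sample
xgb_pred = bst.predict(xgb.DMatrix(sample))
lite_pred = predictor.predict(batch)

print(xgb_pred)
print(lite_pred)
print(np.allclose(xgb_pred, lite_pred, atol=1e-5))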

The problem is compounded when more trees are added.

Any ideas of what could be going on here?

Thanks for your work on this @hcho3 .

Most upvoted comments

@alex-r-xiao Let me give you an example: the following is one possible set of margin scores produced by a 2-iteration ensemble. We assume num_class=3. (The numbers are all made up, for the sake of illustration.)

Tree 0 produces  +0.5
Tree 1 produces  +1.5
Tree 2 produces  -2.3
Tree 3 produces  -1.5
Tree 4 produces  +0.1
Tree 5 produces  +0.7

Even though we trained for 2 iterations, we have a total of 6 decision trees. This is because, for multi-class classification, XGBoost produces [number of iterations] * [number of classes] trees. Let’s walk through how these margin scores are transformed into the final prediction.
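
You can confirm the tree count on your own booster; a quick sketch (assuming bst is the trained Booster):

# Each entry in the text dump corresponds to one decision tree
tree_dumps = bst.get_dump()
print(len(tree_dumps))  # [number of iterations] * [number of classes]; 6 for this example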

  1. Group the trees into output groups. There should be as many output groups as there are label classes. For this example, there are 3 label classes, so there are 3 output groups. The member trees are grouped as follows:
Output group 0:  Tree 0, Tree 3
Output group 1:  Tree 1, Tree 4
Output group 2:  Tree 2, Tree 5
  2. Compute the sum of margin scores for each output group.
Output group 0:  (+0.5) + (-1.5) = -1.0
Output group 1:  (+1.5) + (+0.1) = +1.6
Output group 2:  (-2.3) + (+0.7) = -1.6

The vector [-1.0, +1.6, -1.6] is not a proper probability distribution, as the numbers don’t add up to 1. However, even this is quite useful: the sum for output group 1 is the largest, so the second class is the most probable class.

  3. Transform the sums using the softmax function. To apply what is known as softmax, first apply the exponential function (exp) to every element in the vector. So we start with
[-1.0, +1.6, -1.6]

and compute

np.exp([-1.0, +1.6, -1.6])

which gives us

[ 0.36787944,  4.95303242,  0.20189652]

Then normalize the vector by dividing it by its element sum:

np.exp([-1.0, +1.6, -1.6]) / np.sum(np.exp([-1.0, +1.6, -1.6]))

giving a proper probability distribution:

[ 0.06661094,  0.89683221,  0.03655686]

We interpret this vector as showing 6.7% probability for the first class, 89.7% for the second class, and 3.7% for the third class.
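
Putting the three steps together, here is a small NumPy sketch of the whole transformation, using the made-up margin scores from above:

import numpy as np

# Margin scores of the 6 trees, in tree order (made-up numbers from above)
margins = np.array([+0.5, +1.5, -2.3, -1.5, +0.1, +0.7])
num_class = 3

# Steps 1-2: tree i belongs to output group (i % num_class);
# reshape into (iterations, classes) and sum each column
group_sums = margins.reshape(-1, num_class).sum(axis=0)  # [-1.0, +1.6, -1.6]
print(group_sums.argmax())  # 1, i.e. the second class is the most probable

# Step 3: softmax turns the sums into a probability distribution
prob = np.exp(group_sums) / np.sum(np.exp(group_sums))
print(prob)  # [0.06661094, 0.89683221, 0.03655686]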

Indeed, the margins were off by 0.5. This is because XGBoost adds a global bias of 0.5 by default. I’ve pushed a fix. Now the global bias will be added to every prediction, reproducing the behavior of XGBoost. The updated binaries (version 0.1a8) will be available in a few hours.
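
If you want to check this on your own model, comparing raw margins is the easiest way; a sketch along these lines (using the booster and sample from the question) should show per-class margins that include the 0.5 shift described above:

# Raw per-class margin scores as XGBoost reports them (before the softmax);
# these include the global bias (base_score, 0.5 by default)
xgb_margins = bst.predict(xgb.DMatrix(sample), output_margin=True)
print(xgb_margins)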

@alex-r-xiao Gradient boosted trees produce arbitrary real-valued outputs, which we refer to as “margin scores.” The margin scores are then transformed into a proper probability distribution (all probabilities adding up to 1) by the softmax function.
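
For reference, a minimal NumPy softmax looks like this (subtracting the max is only for numerical stability and does not change the result):

import numpy as np

def softmax(margins):
    # Shift by the max margin to avoid overflow in exp(); the output is unchanged
    e = np.exp(margins - np.max(margins))
    return e / e.sum()

print(softmax(np.array([-1.0, +1.6, -1.6])))  # [0.06661094, 0.89683221, 0.03655686]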

@alex-r-xiao The bug fix has been uploaded. Be sure to install the latest binary release (0.1a5) from PyPI.