xgboost: model can't be reproduced when data is big with tree_method = "gpu_hist"?

I ran some tests in R (xgboost 0.81.0.1). When the data size N is large, models trained with the same parameters and the same seed are not identical on the GPU (tree_method = "gpu_hist"); when N is relatively small, the models are identical. When I train repeatedly on the CPU with tree_method = "hist", all the resulting models are identical. I don't know what happens during GPU training. Is it due to floating-point precision?
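
(For context on the precision question: floating-point addition is not associative, so if the GPU accumulates gradient histograms in a thread order that varies between runs, the sums can differ. A minimal base-R illustration, not xgboost-specific:)

```r
# Floating-point addition is not associative: summing the same three
# numbers in two different association orders gives different results.
x <- c(1e16, -1e16, 1)
left  <- (x[1] + x[2]) + x[3]  # accumulate left to right
right <- x[1] + (x[2] + x[3])  # a different association order
c(left, right)  # 1 0
```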

GPU test code, big data:

library(xgboost)
# Simulate an N x p random matrix with a Gaussian response that is linear in the columns
set.seed(111)
N <- 800000
p <- 100
X <- matrix(runif(N * p), ncol = p)
beta <- runif(p)
y <- X %*% beta + rnorm(N, mean = 0, sd = 0.1)

tr <- sample.int(N, N * 0.75)

param <- list(nrounds = 10, num_parallel_tree = 1, nthread = 1L, eta = 0.3, max_depth = 30,
  seed = 2018, colsample_bytree = 0.4, subsample = 0.6,  min_child_weight = 1000,
  tree_method = 'gpu_hist', grow_policy = "lossguide", max_leaves = 1e4,  max_bin = 256,
  n_gpus = 1, gpu_id = 3, verbose = FALSE)
param$data <- X[tr,]
param$label <- y[tr]

set.seed(2019)
bst_gpu1 <- do.call(xgboost::xgboost, param)
test_pred1 <- predict(bst_gpu1, newdata = X)

set.seed(2019)
bst_gpu2 <- do.call(xgboost::xgboost, param)
test_pred2 <- predict(bst_gpu2, newdata = X)

set.seed(2019)
bst_gpu3 <- do.call(xgboost::xgboost, param)
test_pred3 <- predict(bst_gpu3, newdata = X)

set.seed(2019)
bst_gpu4 <- do.call(xgboost::xgboost, param)
test_pred4 <- predict(bst_gpu4, newdata = X)

set.seed(2019)
bst_gpu5 <- do.call(xgboost::xgboost, param)
test_pred5 <- predict(bst_gpu5, newdata = X)

all_pred <- cbind(test_pred1, test_pred2, test_pred3, test_pred4, test_pred5)
head(all_pred)
#       test_pred1 test_pred2 test_pred3 test_pred4 test_pred5
# [1,]   22.43434   22.65794   22.46917   22.60526   22.43433
# [2,]   24.28225   24.42978   24.34619   24.60111   24.28225
# [3,]   23.11788   23.15692   23.07406   23.22111   23.11788
# [4,]   23.74367   23.92602   24.26277   24.11207   23.74367
# [5,]   22.97502   23.24378   23.25752   22.92594   22.97502
# [6,]   23.34638   23.52209   23.47491   23.71274   23.34638

summary(test_pred1 - test_pred2)
#      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
# -1.2855778 -0.1688867 -0.0002308 -0.0002195  0.1685147  1.3110085
summary(test_pred1 - test_pred3)
#      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
# -1.3292294 -0.1703205  0.0000973 -0.0001312  0.1701469  1.3229237 

The differences are large. But if I change N to 80000, or replace tree_method = "gpu_hist" with tree_method = "hist", the results are identical across runs.
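
(A plausible mechanism for the N-dependence is single-precision accumulation: the more values are summed into a histogram bin, the more rounding error can build up. A rough base-R sketch that emulates ~7 significant decimal digits, roughly float32 precision, by rounding every partial sum with signif(); this is an illustration, not xgboost's actual kernel:)

```r
set.seed(1)
x <- runif(1e5)
s_double <- sum(x)  # double-precision reference
# Emulate single precision: round every partial sum to 7 significant digits
s_single <- Reduce(function(acc, v) signif(acc + v, 7), x)
s_double - s_single  # nonzero drift that grows with the number of terms
```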

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Comments: 15 (7 by maintainers)

Most upvoted comments

Setting single_precision_histogram = FALSE will give you reproducible results.
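
(For reference, the flag can be added to the parameter list from the script above. This is a sketch assuming an xgboost build where gpu_hist accepts the single_precision_histogram parameter; the training call is commented out because it needs a GPU:)

```r
# Add the double-precision histogram flag to the GPU parameter list.
# single_precision_histogram = FALSE makes gpu_hist accumulate histograms
# in double precision, trading some speed for run-to-run reproducibility.
param <- list(nrounds = 10, eta = 0.3, max_depth = 30,
              tree_method = "gpu_hist",
              single_precision_histogram = FALSE)
# bst <- do.call(xgboost::xgboost, c(param, list(data = X[tr, ], label = y[tr])))
```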