h2o4gpu: R package GBM cannot handle sparse matrices (xgboost can)
- OS platform: Linux Ubuntu 16.04
- Installed from (source or binary): Python wheel + R package
- Version: 0.2-nccl-cuda9/h2o4gpu-0.2.0-cp36-cp36m-linux_x86_64.whl
- Python version (optional): 3.6
- CUDA/cuDNN version: 9.0
- GPU model (optional): EC2 p3.2xlarge (Tesla V100)
- CPU model: Intel® Xeon® CPU E5-2686 v4 @ 2.30GHz
- RAM available: 60GB
Training GBM on the airline data (1M records) with n_estimators = 100L, max_depth = 10L, learning_rate = 0.1.
h2o4gpu in R cannot handle sparse matrices (which makes it slower and more memory-hungry):
```
Error in resolve_model_input(x) :
Input x of type "dgCMatrix" is not currently supported.
```
(xgboost in R can)
- h2o4gpu (non-sparse): 48 sec
- xgboost (non-sparse): 26 sec
- xgboost (sparse): 8 sec
(The sparse runs use `Matrix::sparse.model.matrix` for 1-hot encoding, while the non-sparse runs use `model.matrix`.)
The AUC is the same in all cases.
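The memory gap between the two encodings is easy to see. Below is a minimal Python sketch (scipy stands in here for R's `Matrix` package; the row and level counts are made up for illustration, not taken from the airline data):

```python
import numpy as np
from scipy import sparse

# Hypothetical categorical column with many levels (e.g. carrier codes)
rng = np.random.default_rng(0)
n, k = 100_000, 30
codes = rng.integers(0, k, size=n)

# Dense 1-hot (analogous to R's model.matrix): an n x k float matrix,
# mostly zeros
dense = np.zeros((n, k))
dense[np.arange(n), codes] = 1.0

# Sparse 1-hot (analogous to Matrix::sparse.model.matrix): CSR storing
# only the n nonzero entries
sp = sparse.csr_matrix((np.ones(n), (np.arange(n), codes)), shape=(n, k))

print(dense.nbytes)  # 24,000,000 bytes for the dense encoding
# The sparse encoding stores only data + indices + indptr, roughly 1-2 MB here
print(sp.data.nbytes + sp.indices.nbytes + sp.indptr.nbytes)
```

Both encodings represent the same design matrix; only the storage differs, which is why the AUC matches while run time and memory diverge.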
- h2o4gpu code: https://github.com/szilard/GBM-perf/blob/master/wip-testing/h2o4gpu/run.R
- xgboost code: https://github.com/szilard/GBM-perf/blob/master/gpu/run/2-xgboost.R
- data and setup: https://github.com/szilard/GBM-perf
About this issue
- State: closed
- Created 6 years ago
- Comments: 23 (11 by maintainers)
Cool, it now runs much faster. With sparse matrices:
- h2o4gpu from R: 15 sec
- h2o4gpu from Python: 12 sec
- xgboost: 8 sec
My guess is that h2o4gpu R calls h2o4gpu Python, which in turn calls xgboost, right? There seems to be some overhead from the wrapping (probably data copying or CPU/GPU transfer). Anyway, it's probably good enough. I may also try running it on larger data to see whether the overhead becomes relatively smaller.
Update: support for sparse matrices has been added in reticulate. I’ll start working on this issue soon.
@szilard Yes, that should be able to pass the data. If it doesn’t please let us know in another issue and I can investigate. Thx!
@ledell Yes, I meant a user can cast themselves. I’ll update my response to make that clear.