h2o4gpu: R package GBM cannot handle sparse matrices (xgboost can)
- OS platform: Linux Ubuntu 16.04
- Installed from (source or binary): Python wheel + R package
- Version: 0.2-nccl-cuda9/h2o4gpu-0.2.0-cp36-cp36m-linux_x86_64.whl
- Python version (optional): 3.6
- CUDA/cuDNN version: 9.0
- GPU model (optional): EC2 p3.2xlarge (Tesla V100)
- CPU model: Intel® Xeon® CPU E5-2686 v4 @ 2.30GHz
- RAM available: 60GB
Training GBM on the airline data (1M records) with n_estimators = 100L, max_depth = 10L, learning_rate = 0.1.
h2o4gpu in R cannot handle sparse matrices (which makes it slower and more memory-hungry):
```
Error in resolve_model_input(x) :
Input x of type "dgCMatrix" is not currently supported.
```
(xgboost in R can)
- h2o4gpu (non-sparse): 48 sec
- xgboost (non-sparse): 26 sec
- xgboost (sparse): 8 sec
(The sparse runs use `Matrix::sparse.model.matrix` for 1-hot encoding, while the non-sparse runs use `model.matrix`.)
The AUC is the same in all cases.
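The memory gap between the two encodings is easy to see. Below is a minimal Python sketch (scipy stands in here for R's `Matrix` package; the row and level counts are made up for illustration, not taken from the airline data):

```python
import numpy as np
from scipy import sparse

# Hypothetical categorical column with many levels (e.g. carrier codes)
rng = np.random.default_rng(0)
n, k = 100_000, 30
codes = rng.integers(0, k, size=n)

# Dense 1-hot (analogous to R's model.matrix): an n x k float matrix,
# mostly zeros
dense = np.zeros((n, k))
dense[np.arange(n), codes] = 1.0

# Sparse 1-hot (analogous to Matrix::sparse.model.matrix): CSR storing
# only the n nonzero entries
sp = sparse.csr_matrix((np.ones(n), (np.arange(n), codes)), shape=(n, k))

print(dense.nbytes)  # 24,000,000 bytes for the dense encoding
# The sparse encoding stores only data + indices + indptr, roughly 1-2 MB here
print(sp.data.nbytes + sp.indices.nbytes + sp.indptr.nbytes)
```

Both encodings represent the same design matrix; only the storage differs, which is why the AUC matches while run time and memory diverge.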
- h2o4gpu code: https://github.com/szilard/GBM-perf/blob/master/wip-testing/h2o4gpu/run.R
- xgboost code: https://github.com/szilard/GBM-perf/blob/master/gpu/run/2-xgboost.R
- data and setup: https://github.com/szilard/GBM-perf
About this issue
- State: closed
- Created 6 years ago
- Comments: 23 (11 by maintainers)
Cool, it now runs much faster. With sparse matrices:
- h2o4gpu from R: 15 sec
- h2o4gpu from Python: 12 sec
- xgboost: 8 sec
My guess is that h2o4gpu R calls h2o4gpu Python, which in turn calls xgboost, right? There seems to be some overhead from the wrapping (probably data copying or CPU/GPU transfer). Anyway, it's probably good enough. I may also try running it on larger data to see whether the overhead becomes relatively smaller.
Update: support for sparse matrices has been added in reticulate. I’ll start working on this issue soon.
@szilard Yes, that should be able to pass the data. If it doesn’t please let us know in another issue and I can investigate. Thx!
@ledell Yes, I meant a user can cast themselves. I’ll update my response to make that clear.