LightGBM: lightgbm.basic.LightGBMError: Bug in GPU histogram! split 11937: 12, smaller_leaf: 10245, larger_leaf: 1704
version: 2.3.2
[LightGBM] [Fatal] Bug in GPU histogram! split 11937: 12, smaller_leaf: 10245, larger_leaf: 1704
Traceback (most recent call last):
File "lgb_prefit_4ff5fa97-86b3-420c-aa87-5f01abcc18c3.py", line 10, in <module>
model.fit(X, y, sample_weight=sample_weight, init_score=init_score, eval_set=eval_set, eval_names=valid_X_features, eval_sample_weight=eval_sample_weight, eval_init_score=init_score, eval_metric=eval_metric, early_stopping_rounds=early_stopping_rounds, feature_name=X_features, verbose=verbose_fit)
File "/home/jon/.pyenv/versions/3.6.7/lib/python3.6/site-packages/lightgbm_gpu/sklearn.py", line 818, in fit
callbacks=callbacks, init_model=init_model)
File "/home/jon/.pyenv/versions/3.6.7/lib/python3.6/site-packages/lightgbm_gpu/sklearn.py", line 610, in fit
callbacks=callbacks, init_model=init_model)
File "/home/jon/.pyenv/versions/3.6.7/lib/python3.6/site-packages/lightgbm_gpu/engine.py", line 250, in train
booster.update(fobj=fobj)
File "/home/jon/.pyenv/versions/3.6.7/lib/python3.6/site-packages/lightgbm_gpu/basic.py", line 2106, in update
ctypes.byref(is_finished)))
File "/home/jon/.pyenv/versions/3.6.7/lib/python3.6/site-packages/lightgbm_gpu/basic.py", line 46, in _safe_call
raise LightGBMError(decode_string(_LIB.LGBM_GetLastError()))
lightgbm.basic.LightGBMError: Bug in GPU histogram! split 11937: 12, smaller_leaf: 10245, larger_leaf: 1704
script and pickle file:
@sh1ng I need help checking whether this is fixed in an even later master.
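The attached script and pickle file are not reproduced here. As a rough illustration only (not the original script), here is a minimal sketch of the kind of GPU-enabled sklearn-API fit call that produces the traceback above; the synthetic data and parameter values are assumptions.

```python
# Minimal sketch, NOT the attached script: data and hyperparameters are
# placeholders. Only device="gpu" and the sklearn fit() call mirror the
# traceback above.
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 50))                          # placeholder features
y = (X[:, 0] + rng.normal(size=100_000) > 0).astype(int)    # placeholder target

model = lgb.LGBMClassifier(
    device="gpu",        # OpenCL GPU build that raises the histogram error
    n_estimators=200,
    num_leaves=255,
)
model.fit(X, y, eval_set=[(X, y)])
```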
@guolinke I have just built it from the latest master branch, and it still fails. I'll try to put together a minimal reproducible example and open an issue then.
Still happens in version 3.0
https://github.com/h2oai/h2o4gpu/blob/master/tests/python/open_data/gbm/test_lightgbm.py#L265-L284
@nightflight-dk Thanks for giving it a try. Since #4528 is a very large PR, we plan to decompose it into several parts and merge them one by one. We expect to finish the merge process by the end of this month. Multi-GPU and distributed training will be added after #4528 is merged. I will point that out here once the PRs are opened.
+1, this bug makes LightGBM on GPU unusable. It still happens for me on the latest master.
Since there hasn’t been any activity for a year, I would like to bring this topic up again.
I'm on version 3.3.3, Python, training on GPU, on Windows.
This issue has been bugging me for the past two days. The data set is 500k rows with 1500 features. There seems to be some correlation with the min_gain_to_split parameter: with a value of 1 I have not yet seen any errors, but at the default value of 0 it seems to crash quite often (a sketch of pinning this parameter during tuning follows this comment). Take this with caution, since I have not run enough tests yet. It crashed when the code is:
with a few changes here and there. The exception is:
I am using optuna for optimization, so the set of parameters is always different. I tried different split ratios (0.19/0.20/0.21), which did not seem to fix anything, and also experimented with the amount of data (600_000/600_001/200_001). Nothing seems to help fix the issue… Can a fix be expected in the next major release? I see that the topic is still active…
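A minimal sketch of the workaround described above: pinning min_gain_to_split to 1.0 while letting Optuna sample the other parameters. This is not the commenter's code; the data, search space, and split ratio are assumptions.

```python
# Sketch only: data, search space, and split ratio are placeholders.
# It pins min_gain_to_split=1.0 (which the comment above suggests avoids
# the crash) while Optuna varies other parameters.
import lightgbm as lgb
import numpy as np
import optuna
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(50_000, 100))               # placeholder data
y = (X[:, 0] > 0).astype(int)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

def objective(trial):
    params = {
        "objective": "binary",
        "device": "gpu",
        "min_gain_to_split": 1.0,                # pinned; 0 (default) reportedly crashes
        "num_leaves": trial.suggest_int("num_leaves", 31, 255),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "verbosity": -1,
    }
    train_set = lgb.Dataset(X_tr, label=y_tr)
    valid_set = lgb.Dataset(X_val, label=y_val, reference=train_set)
    booster = lgb.train(params, train_set, num_boost_round=200, valid_sets=[valid_set])
    return booster.best_score["valid_0"]["binary_logloss"]

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)
```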
Thank you @nightflight-dk. We have actually rewritten the LightGBM GPU version, and the previous OpenCL and CUDA versions will be deprecated. Refer to PR https://github.com/microsoft/LightGBM/pull/4528
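For reference, a sketch of how the rewritten CUDA path is selected in recent releases, assuming a LightGBM build compiled with CUDA support; the data and other parameters here are placeholders, not maintainer-provided code.

```python
# Sketch: device_type="cuda" selects the rewritten CUDA implementation from
# the #4528 lineage in recent releases, while device_type="gpu" remains the
# older OpenCL code path discussed in this issue. Requires a CUDA-enabled build.
import lightgbm as lgb
import numpy as np

X = np.random.rand(10_000, 20)        # placeholder data
y = np.random.randint(0, 2, 10_000)

params = {
    "objective": "binary",
    "device_type": "cuda",            # new CUDA code path
    "num_leaves": 63,
}
booster = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=100)
```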
Hi, I'm using the GPU setting and have the same issue. I tried "deterministic = True" but it did not solve the problem (a sketch of that setting follows below). I saw that LightGBM v3.2.0 may fix this defect. I have a few questions as follows:
I apologize if my questions are a bit out of scope. Best regards
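A sketch of the attempted workaround mentioned above, i.e. passing deterministic=True alongside GPU training; the objective, data, and remaining parameters are placeholders, and this setting did not resolve the error for the commenter.

```python
# Sketch of the attempted workaround: deterministic=True with device="gpu".
# Data and other parameters are placeholders, not the commenter's setup.
import lightgbm as lgb
import numpy as np

X = np.random.rand(5_000, 10)
y = np.random.rand(5_000)

params = {
    "objective": "regression",
    "device": "gpu",
    "deterministic": True,   # workaround tried above; did not fix the crash
}
lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=50)
```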