xgboost: prediction fails with xgboost 1.1.1 but succeeds with 0.90

1line_inst: 0 999:2000.000000

# model1.bin was trained with xgboost 0.90; model2.bin was trained with xgboost 1.1.1

CODE1

import xgboost as xgb

print(xgb.__version__)
pred = xgb.DMatrix("1line_inst")

bst2 = xgb.Booster({'nthread': 4})  # create an empty Booster
bst2.load_model('model2.bin')  # load the model trained with xgboost 1.1.1
print(bst2.predict(pred))

OUTPUT1

1.1.1
[15:16:14] 4x998 matrix with 2105 entries loaded from 1line_inst
Traceback (most recent call last):
  File "pred_zxb.py", line 12, in <module>
    print(bst2.predict(pred))
  File "/Users/zengwenqi/DXM/DXM-codebase/baidu/rimrdp/pipelines/venv/lib/python3.7/site-packages/xgboost/core.py", line 1580, in predict
    ctypes.byref(preds)))
  File "/Users/zengwenqi/DXM/DXM-codebase/baidu/rimrdp/pipelines/venv/lib/python3.7/site-packages/xgboost/core.py", line 190, in _check_call
    raise XGBoostError(py_str(_LIB.XGBGetLastError()))
xgboost.core.XGBoostError: [15:16:14] /Users/travis/build/dmlc/xgboost/src/learner.cc:1070: Check failed: learner_model_param_.num_feature == p_fmat->Info().num_col_ (1104 vs. 998) : Number of columns does not match number of features in booster.
Stack trace:
  [bt] (0) 1   libxgboost.dylib                    0x0000000118c101c0 dmlc::LogMessageFatal::~LogMessageFatal() + 112
  [bt] (1) 2   libxgboost.dylib                    0x0000000118cbda2a xgboost::LearnerImpl::ValidateDMatrix(xgboost::DMatrix*) const + 282
  [bt] (2) 3   libxgboost.dylib                    0x0000000118cbdb13 xgboost::LearnerImpl::PredictRaw(xgboost::DMatrix*, xgboost::PredictionCacheEntry*, bool, unsigned int) const + 67
  [bt] (3) 4   libxgboost.dylib                    0x0000000118cadecc xgboost::LearnerImpl::Predict(std::__1::shared_ptr<xgboost::DMatrix>, bool, xgboost::HostDeviceVector<float>*, unsigned int, bool, bool, bool, bool, bool) + 732

CODE2

import xgboost as xgb

print(xgb.__version__)
pred = xgb.DMatrix("1line_inst")

bst2 = xgb.Booster({'nthread': 4})  # create an empty Booster
bst2.load_model('model1.bin')  # load the model trained with xgboost 0.90
print(bst2.predict(pred))

OUTPUT2

0.90
[15:20:51] 1x1000 matrix with 1 entries loaded from 1line_test
[0.0208639]
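
A note on where the column counts in the logs come from: when a DMatrix is built from a libsvm text file, the number of columns appears to be inferred from the largest feature index in that file, which is why a single row with index 999 loads as a 1x1000 matrix. The failing run reports a strict equality check between that count and the model's feature count (1104 vs. 998). The sketch below reproduces that reasoning in plain Python without calling xgboost. The helper names `infer_num_columns` and `check_against_model` are hypothetical, and zero-based indices are an assumption based on the 1x1000 log line.

```python
def infer_num_columns(libsvm_lines):
    """Return the column count implied by the largest feature index seen.

    Assumption: feature indices are zero-based, so index 999 implies
    1000 columns. Special tokens such as "qid:" are not handled here.
    """
    max_index = -1
    for line in libsvm_lines:
        tokens = line.split()
        # tokens[0] is the label; the remaining tokens are "index:value" pairs.
        for token in tokens[1:]:
            index = int(token.split(":", 1)[0])
            max_index = max(max_index, index)
    return max_index + 1


def check_against_model(num_columns, num_model_features):
    """Imitate the strict equality check shown in the 1.1.1 traceback."""
    if num_columns != num_model_features:
        raise ValueError(
            "Number of columns does not match number of features in booster "
            f"({num_model_features} vs. {num_columns})"
        )


rows = ["0 999:2000.000000"]          # contents of 1line_inst
inferred = infer_num_columns(rows)
print(inferred)                        # 1000, matching the "1x1000 matrix" log

try:
    check_against_model(inferred, 1104)  # 1104 is the model's feature count
except ValueError as err:
    print(err)
```

Under this reading, any prediction file whose largest feature index is below the model's feature count would trip the check, regardless of whether the rows themselves are valid.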

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 22 (10 by maintainers)

Most upvoted comments

We should support predicting on a DMatrix with fewer features than the model. The absent features will be treated as missing.

Yes.

Should we support predicting on a DMatrix with more features than the model?

No. We should raise an error in this case.

Should we output a warning when the number of features does not match? With libsvm input, this can be quite verbose.

No. If we only adopt the first heuristic, the behavior should be fairly predictable, I think.

On the Python side, there’s a validate_features option. Should we move it down to C++ and make it a parameter?

Now that feature names and types are in the C++ layer, we can. I’ll let you decide.
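
The behavior agreed on in the first two answers above (fewer features than the model: treat the absent ones as missing; more features than the model: raise an error) can be illustrated with a small sketch. This is not xgboost's implementation. `align_row` is a hypothetical helper that works on one sparse row stored as an index-to-value dict, and NaN stands in for "missing".

```python
import math


def align_row(sparse_row, num_model_features):
    """Expand a sparse row to the model's width, following the proposed rule.

    - Indices the row does not contain are filled with NaN (missing).
    - An index at or beyond the model's width means the data has more
      features than the model, which is treated as an error.
    """
    if sparse_row and max(sparse_row) >= num_model_features:
        raise ValueError(
            f"feature index {max(sparse_row)} is out of range for a model "
            f"with {num_model_features} features"
        )
    dense = [math.nan] * num_model_features
    for index, value in sparse_row.items():
        dense[index] = value
    return dense


row = {999: 2000.0}              # the single row from 1line_inst
aligned = align_row(row, 1104)   # 1104 is the feature count from the traceback
print(len(aligned), aligned[999], math.isnan(aligned[1000]))
```

With this rule the row from 1line_inst would be accepted by a 1104-feature model, while a row containing index 1104 or higher would still be rejected.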