xgboost: Predict error in R as of 1.1.1

R version: 3.6.1 (Action of the Toes)
xgboost version: 1.1.1.1

This error occurs when calling predict() on an xgboost model that was built with a pre-1.0 version of xgboost.

Error: Error in predict.xgb.Booster(model, data) : [11:24:23] amalgamation/../src/learner.cc:506: Check failed: mparam_.num_feature != 0 (0 vs. 0) : 0 feature is supplied. Are you using raw Booster interface?

About this issue

State: closed
Created 4 years ago
Reactions: 4
Comments: 103 (18 by maintainers)

Commits related to this issue

Revert "Update syntax & remove deprecated args for xgboost" This reverts commit cef0f47e09b517434892986c719f27b530640d8d. Update from v1.0.0.2 -- v1.1.1.1 delayed until reverse compatibility of xgboo... — committed to Displayr/flipMultivariates by jrwishart 4 years ago

Most upvoted comments

@jrausch12 @shapenaji @jrwishart @jameslamb @meztez @mpquast @euklid321 @leedrake5 The 1.2.0 version of XGBoost is now available on CRAN: https://cran.r-project.org/web/packages/xgboost/index.html. It is now able to read models from old RDS files. The manual also has guidance on future-proofing your models: https://www.rdocumentation.org/packages/xgboost/versions/1.2.0.1/topics/a-compatibility-note-for-saveRDS-save
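As a minimal sketch of the recommended route (xgb.save() / xgb.load() instead of saveRDS() / readRDS()), the round trip below trains a small model on the bundled agaricus data, writes it to a JSON file, and checks that the reloaded model gives the same predictions. The file name and the use of tempdir() are illustrative assumptions, and the exact training interface differs between xgboost releases, so treat this as a starting point rather than the manual's own example.

```r
library(xgboost)

data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")

# Train a small model (same settings as the examples in this thread)
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
bst <- xgb.train(
  params = list(max_depth = 2, eta = 1, nthread = 2,
                objective = "binary:logistic"),
  data = dtrain,
  nrounds = 2
)

# Save with xgboost's own serializer; the path is an assumption for this sketch
model_path <- file.path(tempdir(), "my_model.json")
xgb.save(bst, model_path)

# Load it back and compare predictions from the original and reloaded models
bst2 <- xgb.load(model_path)
preds <- predict(bst, newdata = agaricus.test$data)
preds2 <- predict(bst2, newdata = agaricus.test$data)
max_diff <- max(abs(preds - preds2))
print(max_diff)
```

R-level attributes attached to the booster object are not part of the model file, so anything like that which you rely on would need to be stored separately.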

hcho3 on Sep 2, 2020

What I ended up doing to convert the model and keep the metadata:

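library(xgboost)
xgb_model <- readRDS("pre1.0model.RDS")
# Rebuild the handle from the stored raw bytes, then re-save the raw bytes with the installed version
xgb_model$handle <- xgboost:::xgb.load.raw(xgb_model$raw)
xgb_model$raw <- xgb.save.raw(xgb.load.raw(xgb_model$raw))
saveRDS(xgb_model, "post1.0model.RDS")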

Then restore the handle in the API, or before using the model:

xgb_model <- readRDS("post1.0model.RDS")
xgb_model$handle <- xgb.load.raw(xgb_model$raw)

For package use, I restore the handle in an .onLoad function when the package loads, and I store the models in a new.env().

zzz.R

.onLoad <- function(...) {
  load_models_onLoad()
}

in package code

models <- new.env()

#' Restore xgboost models...
#' @importFrom xgboost xgb.load.raw
#' @export
load_models_onLoad <- function() {
  data(xgb_model, package = "package_name", envir = models)
  models$xgb_model$handle <- xgb.load.raw(models$xgb_model$raw)
}

You can’t modify data that comes from the package directly, so this workaround works for us.

meztez on Jun 17, 2020

The problem we have is all the models saved using saveRDS over the past year and a half

@leedrake5 were some of those models saved with R version 3.6.x? I see you’re now using 4.0.1. .rds files are not guaranteed to be compatible across major versions of R
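One way to check which serialization format and which R version produced a given .rds file is base R's infoRDS() (available in recent R releases; I believe since 3.5.0). The sketch below writes the same object with serialization versions 2 and 3 and inspects the result. The object and temporary file are placeholders, not files from this thread.

```r
# Write a small placeholder object using two serialization format versions
path_v2 <- tempfile(fileext = ".rds")
path_v3 <- tempfile(fileext = ".rds")
saveRDS(list(a = 1), path_v2, version = 2)
saveRDS(list(a = 1), path_v3, version = 3)

# infoRDS() reports the format version and the R version that wrote the file
info_v2 <- infoRDS(path_v2)
info_v3 <- infoRDS(path_v3)

print(info_v2$version)         # serialization format version of the first file
print(info_v3$version)         # serialization format version of the second file
print(info_v3$writer_version)  # R version that wrote the file
```

Running infoRDS() on an old model file shows whether it was written by a different R release before you try to debug xgboost itself.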

jameslamb on Jul 28, 2020

@jrausch12 @shapenaji @jrwishart @jameslamb @meztez @mpquast @euklid321 @leedrake5 Hello everyone, today I had a Eureka moment and found a way to make old RDS files readable again in the latest XGBoost: #5940. This pull request adds a compatibility layer to read XGBoost models from old RDS files.

Example: Suppose we have an RDS file containing an XGBoost model saved from XGBoost 1.0.0 or 0.90:

require(xgboost)
packageVersion("xgboost")   # "0.90.0.1" or "1.0.0.1"

data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')

bst <- xgboost::xgboost(data = agaricus.train$data, label = agaricus.train$label,
               max_depth = 2, eta = 1, nthread = 2, nrounds = 2,
               objective = "binary:logistic")
saveRDS(bst, file = "model_xgb100.rds")

With the patch fix from #5940, the latest XGBoost will be able to read the model from the RDS file:

require(xgboost)
packageVersion("xgboost")   # "1.2.0.1" (development version, compiled from the source)

data(agaricus.test, package='xgboost')

bst <- readRDS("model_xgb100.rds")
pred <- predict(bst, newdata = agaricus.test$data)  # this line will actually work

Output:

Loading required package: xgboost
[1] ‘1.2.0.1’
Warning message:
In value[[3L]](cond) :
  Loading model from a RDS file from XGBoost version 1.0.0 or earlier.
  We strongly ADVISE AGAINST using saveRDS() / readRDS() functions, to ensure that your model
  can be read in current and upcoing XGBoost releases.
  Consider using xgb.save() / xgb.load() instead.

hcho3 on Jul 24, 2020

@shapenaji

xgboost::xgb.save.raw(model$finalModel)
Error in xgb.get.handle(model) : invalid xgb.Booster.handle

Here you need xgb.Booster.complete: (supposing model is your object from caret)

aux_model <- model$finalModel
aux_model <- xgb.Booster.complete(aux_model, saveraw = TRUE)

Then proceed with load.R/save.R using this aux_model. But be aware that you'll be working with an xgboost object, not a caret object. At the end, if you want to go back to caret, you'll have to do something like this (using the bst2 object created in load.R):

model$finalModel$handle <- bst2$handle
model$finalModel$raw <- raw.vec.loaded

I didn't test the final line (saving as JSON).

mpquast on Jun 23, 2020

@jrwishart The workaround seems reasonable for this moment. However, it is not future-proof.

it is unlikely that we’d change the serialization for xgboost.

I respectfully ask you to reconsider. The workaround may stop working at any moment in future releases of XGBoost. We (**) are unable to provide any support whatsoever when the same issue arises again with saveRDS(). Also, some significant features we (**) are planning will be depending on the JSON serialization format (*). You should ask yourself how important it is for your users to retain access to latest versions of XGBoost. If latest access is important, you should plan a switch to the recommended way of serializing XGBoost models.

(*) For example, fitting binary splits with categorical features without one-hot encoding.
(**) XGBoost developers. See the list here.

hcho3 on Jun 18, 2020

Full example of saving a model from 0.90 and loading it back in 1.1.0 (note the use of xgb.save.raw and xgb.load.raw):

save.R

remotes::install_version("xgboost", "0.90.0.1", quiet = FALSE)
packageVersion("xgboost")
library(xgboost)

data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')

bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label,
               max_depth = 2, eta = 1, nthread = 2, nrounds = 2,
               objective = "binary:logistic")

preds <- predict(bst, newdata = agaricus.test$data)
raw.vec <- xgb.save.raw(bst)

saveRDS(raw.vec, 'my_model.rds')
saveRDS(preds, 'preds.rds')

load.R

install.packages("xgboost", quiet = FALSE)
packageVersion("xgboost")
library(xgboost)

data(agaricus.test, package='xgboost')

preds <- readRDS('preds.rds')

# Load the model back
raw.vec.loaded <- readRDS('my_model.rds')
if (compareVersion(as.character(packageVersion("xgboost")), "1.1.1.0") == -1) {
  bst2 <- xgb.load(raw.vec.loaded)   # pre-1.1.0
} else {
  bst2 <- xgboost:::xgb.handleToBooster(xgb.load.raw(raw.vec.loaded))   # 1.1.0+
}
data(agaricus.test, package='xgboost')
preds2 <- predict(bst2, newdata = agaricus.test$data)
print(preds-preds2)

# Save as JSON, for archiving
xgb.save(bst2, 'my_model.json')

I tested it as follows:

Rscript save.R && Rscript load.R
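The version branch in load.R relies on base R's version comparison. As a small sketch of how that comparison behaves with four-part xgboost version strings, the snippet below uses compareVersion() and package_version(). The helper name uses_new_loader and its default threshold are hypothetical. The threshold is left as an argument because the script compares against "1.1.1.0" while its comments refer to 1.1.0.

```r
# compareVersion() returns -1, 0 or 1 for "older", "same" or "newer"
cmp_old <- compareVersion("0.90.0.1", "1.1.1.0")
cmp_new <- compareVersion("1.2.0.1", "1.1.1.0")

# Hypothetical helper: TRUE when a version is at or above a chosen threshold.
# package_version objects can be compared directly against strings.
uses_new_loader <- function(version_string, threshold = "1.1.0") {
  package_version(version_string) >= threshold
}

print(cmp_old)
print(cmp_new)
print(uses_new_loader("1.1.1.1"))
print(uses_new_loader("0.90.0.1"))
```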

hcho3 on Jun 17, 2020

Here is a working example for writing JSON:

library(xgboost)

data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')

bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label,
               max_depth = 2, eta = 1, nthread = 2, nrounds = 2,
               objective = "binary:logistic",
               enable_experimental_json_serialization = TRUE)

preds <- predict(bst, newdata = agaricus.test$data)
json_str <- xgb.serialize(bst)
print(rawToChar(json_str))

saveRDS(json_str, 'my_model.rds')

# Load the model back
json_str_loaded <- readRDS('my_model.rds')
stopifnot(json_str == json_str_loaded)
bst2 <- xgb.load(json_str_loaded)
preds2 <- predict(bst2, newdata = agaricus.test$data)
print(preds-preds2)

The feature is quite new, however, so it won’t work with XGBoost versions older than 1.1.

The reference manual should be updated to mention this feature.

EDIT. Add example of running prediction after loading.

hcho3 on Jun 17, 2020

Is it possible to customize it?

Looking at the documentation for saveRDS(), it looks like the refhook parameter might offer the type of customization you want.

refhook: a hook function for handling reference objects.

I have no experience with that, but it might be helpful for taking references and resolving them so you can write enough information into the .rds object.
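To make the refhook idea a bit more concrete, here is a minimal base R sketch of how the hook pair works in saveRDS() / readRDS(). It uses a plain environment as the reference object, because that is easy to create without compiled code. Whether the same approach would be practical for xgboost's external-pointer handles is an open question that this sketch does not answer. All names (registry, write_hook, read_hook) are made up for illustration.

```r
# A reference object (an environment) that should not be copied into the file
registry <- new.env()
registry$answer <- 42

obj <- list(name = "demo", env = registry)

# On write: return a character key for objects we want to replace,
# or NULL to let R serialize the object as usual
write_hook <- function(x) {
  if (is.environment(x) && identical(x, registry)) "registry" else NULL
}

# On read: receive the stored key and return the object it stands for
read_hook <- function(key) {
  if (identical(key, "registry")) registry else stop("unknown reference: ", key)
}

path <- tempfile(fileext = ".rds")
saveRDS(obj, path, refhook = write_hook)
restored <- readRDS(path, refhook = read_hook)

# The restored list points at the live registry, not at a copy
print(identical(restored$env, registry))
```

A file written this way can only be read back with a matching hook, so this ties the file to the code that wrote it.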

In my humble opinion, it would be enough to say “we support xgb.load() and xgb.save() for serializing and deserializing model objects. readRDS() / saveRDS() are not supported and might cause issues”.

jameslamb on Jun 17, 2020