xgboost: Predict error in R as of 1.1.1

R version: 3.6.1 (Action of the Toes)
xgboost version: 1.1.1.1

This error can be produced when calling predict() on an xgboost model that was created with a version earlier than 1.0:

Error: Error in predict.xgb.Booster(model, data) : [11:24:23] amalgamation/../src/learner.cc:506: Check failed: mparam_.num_feature != 0 (0 vs. 0) : 0 feature is supplied. Are you using raw Booster interface?

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 4
  • Comments: 103 (18 by maintainers)


Most upvoted comments

@jrausch12 @shapenaji @jrwishart @jameslamb @meztez @mpquast @euklid321 @leedrake5 The 1.2.0 version of XGBoost is now available on CRAN: https://cran.r-project.org/web/packages/xgboost/index.html. It can now read models from old RDS files. The manual also includes guidance on future-proofing your models: https://www.rdocumentation.org/packages/xgboost/versions/1.2.0.1/topics/a-compatibility-note-for-saveRDS-save

What I ended up doing to convert the model and keep the metadata:

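library(xgboost)

xgb_model <- readRDS("pre1.0model.RDS")
# Rebuild the booster handle from the stored raw bytes
xgb_model$handle <- xgboost:::xgb.load.raw(xgb_model$raw)
# Re-encode the raw field with the installed xgboost version
xgb_model$raw <- xgb.save.raw(xgb.load.raw(xgb_model$raw))
saveRDS(xgb_model, "post1.0model.RDS")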

Then restore the handle in the API, or before each use:

xgb_model <- readRDS("post1.0model.RDS")
xgb_model$handle <- xgb.load.raw(xgb_model$raw)

For package use, I restore the handle in an .onLoad function when the package loads, and I store the models in a new.env(). In zzz.R:

.onLoad <- function(...) {
  load_models_onLoad()
}

In the package code:

models <- new.env()

#' Restore xgboost models...
#' @importFrom xgboost xgb.load.raw
#' @export
load_models_onLoad <- function() {
  data(xgb_model, package = "package_name", envir = models)
  models$xgb_model$handle <- xgb.load.raw(models$xgb_model$raw)
}

You can’t modify data that ships with the package directly, so this workaround works for us.

The problem we have is all the models we saved with saveRDS() over the past year and a half.

@leedrake5 were some of those models saved with R version 3.6.x? I see you’re now using 4.0.1. RDS files are not guaranteed to be compatible across major versions of R.
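Whether a particular .rds file is affected can be checked without loading it: base R's infoRDS() (available since R 3.5.0) reads only the file header and reports the serialization format version and the R version that wrote it. A minimal sketch on a throwaway file; saving with version = 2 is shown because that older format is the one R releases before 3.5.0 can read:

```r
# Write a small object in the older serialization format (version 2),
# which R releases before 3.5.0 can also read.
path <- tempfile(fileext = ".rds")
saveRDS(list(note = "example"), path, version = 2)

# infoRDS() reports the serialization format version and the R version
# that wrote the file, without deserializing the object itself.
info <- infoRDS(path)
print(info$version)             # serialization format: 2
print(info$writer_version)      # R version that wrote the file
print(info$min_reader_version)  # oldest R version able to read it

# Reading the object back works as usual.
obj <- readRDS(path)
```

This only covers R's own serialization format; it says nothing about whether an xgboost object stored in the file can still be used by a newer xgboost release.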

@jrausch12 @shapenaji @jrwishart @jameslamb @meztez @mpquast @euklid321 @leedrake5 Hello everyone, today I had a Eureka moment and found a way to make old RDS files readable again in the latest XGBoost: #5940. This pull request adds a compatibility layer to read XGBoost models from old RDS files.

Example: suppose we have an RDS file containing an XGBoost model saved with XGBoost 1.0.0 or 0.90:

require(xgboost)
packageVersion("xgboost")   # "0.90.0.1" or "1.0.0.1"

data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')

bst <- xgboost::xgboost(data = agaricus.train$data, label = agaricus.train$label,
               max_depth = 2, eta = 1, nthread = 2, nrounds = 2,
               objective = "binary:logistic")
saveRDS(bst, file = "model_xgb100.rds")

With the fix from #5940, the latest XGBoost will be able to read the model from the RDS file:

require(xgboost)
packageVersion("xgboost")   # "1.2.0.1" (development version, compiled from the source)

data(agaricus.test, package='xgboost')

bst <- readRDS("model_xgb100.rds")
pred <- predict(bst, newdata = agaricus.test$data)  # this line will actually work

Output:

Loading required package: xgboost
[1] ‘1.2.0.1’
Warning message:
In value[[3L]](cond) :
  Loading model from a RDS file from XGBoost version 1.0.0 or earlier.
  We strongly ADVISE AGAINST using saveRDS() / readRDS() functions, to ensure that your model
  can be read in current and upcoing XGBoost releases.
  Consider using xgb.save() / xgb.load() instead.
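Following the advice in that warning, a one-time migration could read each old RDS file with XGBoost 1.2.x and write it back out with xgb.save(), so later releases only need xgb.load(). A minimal sketch, written against the 1.2.x R package used in this thread (where xgb.Booster.complete() exists); the helper name migrate_rds_model() is hypothetical:

```r
library(xgboost)
stopifnot(packageVersion("xgboost") >= "1.2.0")

# Hypothetical helper: read a booster saved with saveRDS() under an older
# XGBoost and re-save it in XGBoost's own format with xgb.save().
migrate_rds_model <- function(rds_path, model_path) {
  bst <- readRDS(rds_path)
  # Rebuild the booster handle from bst$raw; for files written by
  # XGBoost 1.0.0 or earlier, the compatibility layer is expected to be
  # used (and the warning above emitted) at this step.
  bst <- xgb.Booster.complete(bst, saveraw = TRUE)
  # A .json extension selects the JSON model format (XGBoost 1.0+).
  xgb.save(bst, model_path)
  invisible(model_path)
}
```

Usage would then be migrate_rds_model("model_xgb100.rds", "model_xgb100.json") once, followed by bst <- xgb.load("model_xgb100.json") wherever the model is needed.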

@shapenaji

xgboost::xgb.save.raw(model$finalModel)
Error in xgb.get.handle(model) : invalid xgb.Booster.handle

Here you need xgb.Booster.complete() (assuming model is your object from caret):

aux_model <- model$finalModel
aux_model <- xgb.Booster.complete(aux_model, saveraw = TRUE)

Then run the save.R / load.R steps using this aux_model. Be aware that you’ll be working with an xgboost object, not a caret object. If you want to get back to caret at the end, you’ll have to do something like this (using the bst2 object created in load.R):

model$finalModel$handle <- bst2$handle
model$finalModel$raw <- raw.vec.loaded

I didn’t test the final line (saving as JSON).

@jrwishart The workaround seems reasonable for this moment. However, it is not future-proof.

it is unlikely that we’d change the serialization for xgboost.

I respectfully ask you to reconsider. The workaround may stop working at any point in future releases of XGBoost, and we (**) will not be able to provide any support when the same issue arises again with saveRDS(). Also, some significant features we (**) are planning will depend on the JSON serialization format (*). You should ask yourself how important it is for your users to have access to the latest versions of XGBoost. If it is important, you should plan a switch to the recommended way of serializing XGBoost models.

(*) For example, fitting binary splits with categorical features without one-hot encoding.
(**) XGBoost developers. See the list here.

Full example of saving a model from 0.90 and loading it back in 1.1.0 (note the use of xgb.save.raw and xgb.load.raw):

save.R

remotes::install_version("xgboost", "0.90.0.1", quiet = FALSE)
packageVersion("xgboost")
library(xgboost)

data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')

bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label,
               max_depth = 2, eta = 1, nthread = 2, nrounds = 2,
               objective = "binary:logistic")

preds <- predict(bst, newdata = agaricus.test$data)
raw.vec <- xgb.save.raw(bst)

saveRDS(raw.vec, 'my_model.rds')
saveRDS(preds, 'preds.rds')

load.R

install.packages("xgboost", quiet = FALSE)
packageVersion("xgboost")
library(xgboost)

data(agaricus.test, package='xgboost')

preds <- readRDS('preds.rds')

# Load the model back
raw.vec.loaded <- readRDS('my_model.rds')
if (compareVersion(as.character(packageVersion("xgboost")), "1.1.1.0") == -1) {
  bst2 <- xgb.load(raw.vec.loaded)   # xgboost older than 1.1.1
} else {
  bst2 <- xgboost:::xgb.handleToBooster(xgb.load.raw(raw.vec.loaded))   # xgboost 1.1.1 and later
}
preds2 <- predict(bst2, newdata = agaricus.test$data)
print(preds-preds2)

# Save as JSON, for archiving
xgb.save(bst2, 'my_model.json')

I tested it as follows:

Rscript save.R && Rscript load.R

Here is a working example for writing JSON:

library(xgboost)

data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')

bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label,
               max_depth = 2, eta = 1, nthread = 2, nrounds = 2,
               objective = "binary:logistic",
               enable_experimental_json_serialization = TRUE)

preds <- predict(bst, newdata = agaricus.test$data)
json_str <- xgb.serialize(bst)
print(rawToChar(json_str))

saveRDS(json_str, 'my_model.rds')

# Load the model back
json_str_loaded <- readRDS('my_model.rds')
stopifnot(json_str == json_str_loaded)
bst2 <- xgb.load(json_str_loaded)
preds2 <- predict(bst2, newdata = agaricus.test$data)
print(preds-preds2)

The feature is quite new, however, so it won’t work with XGBoost versions older than 1.1.

The reference manual should be updated to mention this feature.

EDIT: Added an example of running prediction after loading.

Is it possible to customize it?

Looking at the documentation for saveRDS(), it looks like the refhook parameter might offer the type of customization you want.

refhook: a hook function for handling reference objects.

I have no experience with that, but it might be helpful for taking references and resolving them so you can write enough information into the .rds object.
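To illustrate the refhook mechanism, here is a base-R sketch that has not been tried with xgb.Booster objects. According to the ?serialize documentation, the hook given to saveRDS() is called for non-system reference objects (external pointers, weak references, and environments other than namespace/package environments and .GlobalEnv); returning a character vector stores that key instead of the object, and the hook given to readRDS() maps the key back to an object. The registry and key names below are hypothetical:

```r
# Hypothetical registry of live reference objects, keyed by name.
shared_env <- new.env()
assign("counter", 42, envir = shared_env)
registry <- list(shared_env = shared_env)

obj <- list(label = "demo", env = shared_env)

# Write hook: called for each non-system reference object in `obj`.
# Returning a character vector stores that key instead of the object;
# returning NULL lets R serialize the object normally.
write_hook <- function(ref) {
  if (identical(ref, shared_env)) "shared_env" else NULL
}

# Read hook: receives the character vector stored above and returns
# the object to put in its place.
read_hook <- function(key) registry[[key]]

path <- tempfile(fileext = ".rds")
saveRDS(obj, path, refhook = write_hook)
restored <- readRDS(path, refhook = read_hook)

# The environment was resolved through the registry, not copied.
print(identical(restored$env, shared_env))  # TRUE
```

An xgb.Booster keeps its model behind an external pointer (the handle), so the same idea would mean mapping that pointer to a key on write and rebuilding it on read; that part is an untested assumption and is not shown here.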

In my humble opinion, it would be enough to say: “we support xgb.load() and xgb.save() for serializing and deserializing model objects. readRDS() / saveRDS() are not supported and might cause issues”.
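The supported path described in that wording amounts to an xgb.save() / xgb.load() round trip. A minimal sketch, written against the 1.x-era xgboost() interface used throughout this thread; the temporary file path is illustrative, and the tolerance on the prediction check is an assumption to allow for float formatting in the JSON file:

```r
library(xgboost)

data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")

bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label,
               max_depth = 2, eta = 1, nthread = 2, nrounds = 2,
               objective = "binary:logistic")

# Save in XGBoost's own format instead of saveRDS(); the .json extension
# selects the JSON model format (XGBoost 1.0 and later).
model_path <- file.path(tempdir(), "model.json")
xgb.save(bst, model_path)

# Load it back and check that predictions match the original booster.
bst2 <- xgb.load(model_path)
preds  <- predict(bst,  newdata = agaricus.test$data)
preds2 <- predict(bst2, newdata = agaricus.test$data)
stopifnot(isTRUE(all.equal(preds, preds2, tolerance = 1e-6)))
```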