LightGBM: [R-package] R package crashes on Windows when loaded together with {fansi} or anything that depends on it

This is probably related to:

Description

Using {lightgbm} while {parsnip} is loaded crashes the R session with: Exited with status -1073741819 (that is, 0xC0000005, a Windows access violation).

Reproducible example

Calling:

library(parsnip)
library(lightgbm)
data(agaricus.train, package = 'lightgbm')
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
model <- lgb.cv(
  params = list(
    objective = "regression"
    , metric = "l2"
  )
  , data = dtrain
)

Environment info

I am using the dev version of LightGBM as suggested in https://github.com/microsoft/LightGBM/issues/4007#issuecomment-869080432. The error only occurs on Windows.

Here’s a GitHub Actions run that shows the behavior. This step shows that it works fine if parsnip is not loaded: https://github.com/curso-r/treesnip/runs/3037580458?check_suite_focus=true#step:9:1. And this step shows the error message: https://github.com/curso-r/treesnip/runs/3037580458?check_suite_focus=true#step:10:21

I could also reproduce it locally on a Windows machine, but I am not sure what the best way to get a stack trace is. Let me know if I can help with further debugging.

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 25


Most upvoted comments

Now that #4496 has been merged, I believe this issue has been resolved.

Thanks so much to everyone involved here for your help with reproducible examples and debugging ideas!

I’m convinced that the root of the problem is related to the way that R loads DLLs, and that @dfsnow is right that {lightgbm} and {fansi} are in conflict with each other somehow.

If {dplyr} is loaded first and {fansi}'s DLL is then unloaded before {lightgbm} is loaded, the reproducible example below does not segfault, and Dataset construction succeeds.

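library(dplyr)
# unload {fansi}'s DLL (loaded indirectly through {dplyr}'s dependencies);
# this path assumes {fansi} is installed in the first library and R is 64-bit
dyn.unload(file.path(.libPaths()[1], "fansi", "libs", "x64", "fansi.dll"))
library(lightgbm)
dtrain <- lgb.Dataset(
    data = matrix(rnorm(1000), nrow = 100)
    , label = rnorm(100)
)
# force Dataset construction, which is where the segfault shows up
dtrain$construct()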

If {fansi}'s DLL is unloaded after loading {lightgbm}, that script produces a segfault at dtrain$construct().
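The unload step above hard-codes a path that assumes {fansi} lives in the first library and that R is 64-bit. A minimal sketch, assuming the goal is only to repeat this debugging experiment, is to look the DLL's path up in the session's loaded-DLL table via base R's getLoadedDLLs() instead. The helper names loaded_dll_paths() and dll_path_for() are hypothetical, and unloading a DLL that a loaded namespace still uses is only reasonable as a throwaway debugging step.

```r
# Hypothetical helper: named character vector of every DLL loaded in this
# session, mapping DLL name -> file path (taken from getLoadedDLLs()).
# Note: use [["path"]], not $path, because `$` on a DLLInfo object looks up
# a native symbol rather than a list element.
loaded_dll_paths <- function() {
  vapply(getLoadedDLLs(), function(dll) dll[["path"]], character(1))
}

# Hypothetical helper: path of the named DLL if it is loaded, otherwise NA.
dll_path_for <- function(name) {
  paths <- loaded_dll_paths()
  if (name %in% names(paths)) unname(paths[[name]]) else NA_character_
}

# Debugging-only usage, mirroring the experiment above:
# library(dplyr)                      # pulls in {fansi} indirectly
# fansi_dll <- dll_path_for("fansi")
# if (!is.na(fansi_dll)) dyn.unload(fansi_dll)
# library(lightgbm)
```

Looking the path up at runtime also works when {fansi} is installed in a secondary library such as an renv project library.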

This result, together with the finding in https://github.com/microsoft/LightGBM/issues/4464#issuecomment-886244523 that commenting out Network::num_machines() lets Dataset construction succeed, has led me to this working theory:

Something in {fansi}'s DLL conflicts with lightgbm.dll, or with IPHLPAPI.DLL or WS2_32.dll (two Windows system libraries that {lightgbm} links against to support distributed training).

I’m going to investigate this more closely with dumpbin and listdlls to see if I can identify the conflicts. I’m also going to try changing some details of {fansi} based on the advice in “Writing R Extensions”, especially https://cran.r-project.org/doc/manuals/R-exts.html#Controlling-visibility.

Updates to follow!

I’m seeing the same issue. I suspect this is related to #4007 and #4259. Some further details:

No crash

Running the script below after a clean install, in a new project with renv enabled, works with 3.2.1.99. See sessionInfo() below.

library(lightgbm)
data(agaricus.train, package='lightgbm')
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
model <- lgb.cv(
  params = list(
    objective = "regression"
    , metric = "l2"
  )
  , data = dtrain
)  
Session Info
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] lightgbm_3.2.1.99 R6_2.5.0      

loaded via a namespace (and not attached):
[1] compiler_4.1.0    Matrix_1.3-3      tools_4.1.0       grid_4.1.0        data.table_1.14.0
[6] jsonlite_1.7.2    renv_0.13.2       lattice_0.20-44  

Installing parsnip and loading it after lightgbm likewise does not result in a crash.

library(lightgbm)
library(parsnip)

data(agaricus.train, package='lightgbm')
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
model <- lgb.cv(
  params = list(
    objective = "regression"
    , metric = "l2"
  )
  , data = dtrain
)  

Session Info
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] parsnip_0.1.7  lightgbm_3.2.1.99 R6_2.5.0      

loaded via a namespace (and not attached):
 [1] magrittr_2.0.1    tidyselect_1.1.1  lattice_0.20-44   rlang_0.4.11      fansi_0.5.0      
 [6] dplyr_1.0.7       tools_4.1.0       hardhat_0.1.6     grid_4.1.0        data.table_1.14.0
[11] utf8_1.2.1        ellipsis_0.3.2    tibble_3.1.2      lifecycle_1.0.0   crayon_1.4.1     
[16] Matrix_1.3-3      purrr_0.3.4       tidyr_1.1.3       vctrs_0.3.8       glue_1.4.2       
[21] compiler_4.1.0    pillar_1.6.1      generics_0.1.0    jsonlite_1.7.2    renv_0.13.2      
[26] pkgconfig_2.0.3 

Crash

However, loading parsnip before lightgbm results in a crash at the lgb.cv step.

library(parsnip)
library(lightgbm)

data(agaricus.train, package='lightgbm')
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
model <- lgb.cv(
  params = list(
    objective = "regression"
    , metric = "l2"
  )
  , data = dtrain
)  
Session Info
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] lightgbm_3.2.1.99 R6_2.5.0       parsnip_0.1.7 

loaded via a namespace (and not attached):
 [1] magrittr_2.0.1    tidyselect_1.1.1  lattice_0.20-44   rlang_0.4.11      fansi_0.5.0      
 [6] dplyr_1.0.7       tools_4.1.0       parallel_4.1.0    hardhat_0.1.6     grid_4.1.0       
[11] data.table_1.14.0 utf8_1.2.1        ellipsis_0.3.2    tibble_3.1.2      lifecycle_1.0.0  
[16] crayon_1.4.1      Matrix_1.3-3      purrr_0.3.4       tidyr_1.1.3       vctrs_0.3.8      
[21] glue_1.4.2        compiler_4.1.0    pillar_1.6.1      generics_0.1.0    jsonlite_1.7.2   
[26] renv_0.13.2       pkgconfig_2.0.3  

Notes

  • Once lightgbm has crashed because of parsnip, it keeps crashing for me regardless of whether parsnip is loaded (even the first script no longer works after a crash).
  • Reinstalling lightgbm via renv::install("lightgbm", rebuild = TRUE) seems to fix this problem for both the CRAN and GitHub versions.

Edit

I went through the Imports of parsnip one by one, loading each package before lightgbm. The following packages cause crashes:

dplyr (1.0.7)
hardhat (0.1.6)
tibble (3.1.2)
tidyr (1.1.3)

While the following cause no issues:

generics (0.1.0)
globals (0.14.0)
glue (1.4.2)
lifecycle (1.0.0)
magrittr (2.0.1)
prettyunits (1.1.1)
purrr (0.3.4)
rlang (0.4.11)
stats
utils
vctrs (0.3.8)
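The one-package-at-a-time check above can be scripted. The sketch below is an assumption about how one might automate it, not what was actually run: for each candidate it writes a small reproduction script (the same lgb.cv() call used throughout this thread) and runs it in a child R process, so a segfault only kills the child and shows up as a nonzero exit status. It calls the Rscript belonging to the current R installation via R.home("bin") rather than whichever Rscript is first on PATH. make_repro_script() and run_repro() are hypothetical names.

```r
# Hypothetical helper: lines of a minimal reproduction that loads `pkg`
# before {lightgbm} and then runs the lgb.cv() call from this thread.
make_repro_script <- function(pkg) {
  c(
    sprintf("library(%s)", pkg),
    "library(lightgbm)",
    "data(agaricus.train, package = 'lightgbm')",
    "dtrain <- lgb.Dataset(agaricus.train$data, label = agaricus.train$label)",
    "model <- lgb.cv(",
    "  params = list(objective = 'regression', metric = 'l2')",
    "  , data = dtrain",
    ")"
  )
}

# Hypothetical helper: run the reproduction in a fresh R process and return
# its exit status (0 = success; a segfault gives a nonzero status).
run_repro <- function(pkg) {
  script <- tempfile(fileext = ".R")
  on.exit(unlink(script))
  writeLines(make_repro_script(pkg), script)
  rscript <- file.path(R.home("bin"), "Rscript")
  # --vanilla skips .Rprofile; drop it if packages live in an renv library
  system2(rscript, c("--vanilla", shQuote(script)), stdout = FALSE, stderr = FALSE)
}

# Usage (requires {lightgbm} and the candidate packages to be installed):
# candidates <- c("dplyr", "hardhat", "tibble", "tidyr", "fansi", "glue")
# status <- sapply(candidates, run_repro)
# names(status)[status != 0]   # packages whose reproduction crashed
```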

I then worked through the dependencies of tibble and dplyr to find the lowest-level package whose loading causes a crash. It seems fansi may be the actual culprit. The script below crashes for me in a fresh environment with both lightgbm 3.2.1 (from CRAN) and 3.2.1.99 (from GitHub).

library(fansi)
library(lightgbm)

data(agaricus.train, package='lightgbm')
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
model <- lgb.cv(
  params = list(
    objective = "regression"
    , metric = "l2"
  )
  , data = dtrain
)  
Session Info
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] lightgbm_3.2.1.99 R6_2.5.0       fansi_0.5.0   

loaded via a namespace (and not attached):
 [1] magrittr_2.0.1    tidyselect_1.1.1  lattice_0.20-44   rlang_0.4.11      stringr_1.4.0    
 [6] dplyr_1.0.6       tools_4.1.0       grid_4.1.0        parallel_4.1.0    data.table_1.14.0
[11] audio_0.1-7       utf8_1.2.1        DBI_1.1.1         ellipsis_0.3.2    assertthat_0.2.1 
[16] tibble_3.1.2      lifecycle_1.0.0   crayon_1.4.1      Matrix_1.3-3      beepr_1.3        
[21] purrr_0.3.4       vctrs_0.3.8       glue_1.4.2        ccao_0.5.1        stringi_1.6.2    
[26] compiler_4.1.0    pillar_1.6.1      generics_0.1.0    jsonlite_1.7.2    pkgconfig_2.0.3 

I have multiple versions of R installed, and 4.0.1 was first on PATH. Running Rscript --version gave this:

Rscript --version
R scripting front-end version 4.0.1 (2020-06-06)
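When several R versions are installed, one way to spot this kind of mismatch is to compare the R you are running with the Rscript that the shell would pick up. A minimal sketch using only base R (Sys.which(), R.home(), R.version.string); the variable names are illustrative:

```r
# R version of the current session, e.g. "R version 4.1.0 (2021-05-18)"
session_version <- R.version.string
# Rscript that the shell finds first on PATH ("" if none is found)
rscript_on_path <- unname(Sys.which("Rscript"))
# Rscript shipped with the R installation running this session
rscript_for_session <- file.path(R.home("bin"), "Rscript")

cat("This session:       ", session_version, "\n")
cat("Rscript on PATH:    ", rscript_on_path, "\n")
cat("Rscript for this R: ", rscript_for_session, "\n")
```

If the two Rscript paths point into different R installation directories, packages built or installed from a shell may end up in the wrong version's library.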

Didn’t know about it. Thanks for the help.

After solving a problem related to rtools everything works now 😃

This was installed for R-4.0.x even though I have R-4.1.x installed. Is this not supported for R 4.1.x?

Hi @jameslamb, thanks for looking at this!

I have added the remove.packages("lightgbm") call and the error still persists: https://github.com/curso-r/treesnip/runs/3154997458?check_suite_focus=true#step:11:50. I think install.packages() removes the existing package folder before installing the package again anyway.

For the second question, I can confirm that the error happens both in RStudio and in a vanilla R session:

$ Rscript --vanilla R/test.R
Loading required package: R6
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] lightgbm_3.2.1.99 R6_2.5.0          parsnip_0.1.6

loaded via a namespace (and not attached):
 [1] lattice_0.20-44   tidyr_1.1.3       fansi_0.5.0       utf8_1.2.1
 [5] crayon_1.4.1      dplyr_1.0.7       grid_4.1.0        jsonlite_1.7.2
 [9] lifecycle_1.0.0   magrittr_2.0.1    pillar_1.6.1      rlang_0.4.11
[13] data.table_1.14.0 Matrix_1.3-3      vctrs_0.3.8       generics_0.1.0
[17] ellipsis_0.3.2    tools_4.1.0       glue_1.4.2        purrr_0.3.4
[21] compiler_4.1.0    pkgconfig_2.0.3   tidyselect_1.1.1  tibble_3.1.2
Segmentation fault