arrow: [R] Segmentation fault when using write_parquet()
Describe the bug, including details regarding any error messages, version, and platform.
I am randomly getting a segfault when using write_parquet() with the latest release (the same code works well with v10.0.1).
*** caught segfault ***
address 0x18, cause 'memory not mapped'
Traceback:
1: Table__from_dots(dots, schema, option_use_threads())
2: Table$create(x, schema = schema)
3: as_arrow_table.data.frame(x)
4: as_arrow_table(x)
5: doTryCatch(return(expr), name, parentenv, handler)
6: tryCatchOne(expr, names, parentenv, handlers[[1L]])
7: tryCatchList(expr, classes, parentenv, handlers)
8: tryCatch(as_arrow_table(x), arrow_no_method_as_arrow_table = function(e) { abort("Object must be coercible to an Arrow Table using `as_arrow_table()`", parent = e, call = caller_env(2))})
9: as_writable_table(x)
10: write_parquet(bioargo_dark_corrected, here("data", "raw", "bioargo", "bioargo_correction_c.parquet"))
11: eval(ei, envir)
12: eval(ei, envir)
13: withVisible(eval(ei, envir))
14: source(here("R", "001c_bioargo_chla_dark_correction.R"))
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Following this (https://arrow.apache.org/docs/7.0/r/articles/developers/debugging.html), here is the exact line where the code crashes.
Thread 1 "R" received signal SIGSEGV, Segmentation fault.
0x00007fffdccb0956 in std::__shared_ptr<arrow::DataType, (__gnu_cxx::_Lock_policy)2>::__shared_ptr (this=0x7ffffffc4d50) at /usr/include/c++/12/bits/shared_ptr_base.h:1522
1522 __shared_ptr(const __shared_ptr&) noexcept = default;
$> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.10
Matrix products: default
BLAS/LAPACK: /opt/intel/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so
locale:
[1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C LC_TIME=en_CA.UTF-8
[4] LC_COLLATE=en_CA.UTF-8 LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8
[7] LC_PAPER=en_CA.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] gsw_1.1-1 fishmethods_1.12-0 santoku_0.9.0 arrow_11.0.0.100000089
[5] yardstick_1.1.0 workflowsets_1.0.0 workflows_1.1.2 tune_1.0.1
[9] rsample_1.1.1 recipes_1.0.4 parsnip_1.0.3 modeldata_1.1.0
[13] infer_1.0.4 dials_1.1.0 scales_1.2.1 broom_1.0.3
[17] tidymodels_1.0.0 data.table_1.14.6 furrr_0.3.1 future_1.31.0
[21] pins_1.1.0 tidyterra_0.3.1 terra_1.7-3 sf_1.0-9
[25] patchwork_1.1.2 tidync_0.3.0 here_1.0.1 glue_1.6.2
[29] ggpmthemes_0.0.2 lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0
[33] dplyr_1.1.0 purrr_1.0.1 readr_2.1.4 tidyr_1.3.0
[37] tibble_3.1.8 ggplot2_3.4.1 tidyverse_1.3.2.9000
loaded via a namespace (and not attached):
[1] minqa_1.2.5 colorspace_2.1-0 ellipsis_0.3.2 class_7.3-21
[5] rprojroot_2.0.3 fs_1.6.1 rstudioapi_0.14 proxy_0.4-27
[9] farver_2.1.1 listenv_0.9.0 bit64_4.0.5 prodlim_2019.11.13
[13] fansi_1.0.4 codetools_0.2-19 splines_4.2.2 ncdf4_1.21
[17] extrafont_0.19 jsonlite_1.8.4 nloptr_2.0.3 Rttf2pt1_1.3.12
[21] compiler_4.2.2 backports_1.4.1 assertthat_0.2.1 Matrix_1.5-3
[25] cli_3.6.0 tools_4.2.2 gtable_0.3.1 rappdirs_0.3.3
[29] Rcpp_1.0.10 RNetCDF_2.6-2 DiceDesign_1.9 vctrs_0.5.2
[33] nlme_3.1-162 extrafontdb_1.0 iterators_1.0.14 timeDate_4022.108
[37] gower_1.0.1 globals_0.16.2 lme4_1.1-31 timechange_0.2.0
[41] lifecycle_1.0.3 ncmeta_0.3.5 MASS_7.3-58.2 ipred_0.9-13
[45] hms_1.1.2 parallel_4.2.2 TMB_1.9.2 rpart_4.1.19
[49] stringi_1.7.12 foreach_1.5.2 e1071_1.7-13 lhs_1.1.6
[53] boot_1.3-28.1 hardhat_1.2.0 lava_1.7.1 rlang_1.0.6
[57] pkgconfig_2.0.3 lattice_0.20-45 labeling_0.4.2 bit_4.0.5
[61] tidyselect_1.2.0 parallelly_1.34.0 magrittr_2.0.3 R6_2.5.1
[65] generics_0.1.3 bootstrap_2019.6 DBI_1.1.3 pillar_1.8.1
[69] withr_2.5.0 units_0.8-1 survival_3.5-3 nnet_7.3-18
[73] future.apply_1.10.0 crayon_1.5.2 KernSmooth_2.23-20 utf8_1.2.3
[77] tzdb_0.3.0 grid_4.2.2 digest_0.6.31 classInt_0.4-8
[81] numDeriv_2016.8-1.1 GPfit_1.0-8 munsell_0.5.0
Component(s)
R
About this issue
- State: closed
- Created a year ago
- Comments: 17 (9 by maintainers)
Brilliant! I had to download the file separately but this is fantastic.
This reprex appears to crash R. See standard output and standard error for more details.
Standard output and error
Biogeochemical Argo!!! (Cool to see it here…that’s what I did in my previous job!)
Since all these seem related to InferArrowType(), do you mind attempting to print out the column names and types of the data frame you’re passing to arrow::write_parquet()? You could do that maybe with str(the_data_frame_right_before_write_parquet[integer(0), ])?
Just wanted to chime in here since I’m experiencing a very similar error.
It happens when I’m writing a data frame to an arrow/feather file. As for the OP, it works in arrow 10 but not 11, and the crash originates from the same line.
Here is the top of the stack trace when running R with a debugger attached:
It happens during unit testing in my package, so I can reproduce it as often as I want locally, and it also happens when running GitHub Actions on Ubuntu, Windows, and macOS. Unfortunately, I have not been able to create a minimal reproducible example: calling the function that triggers the issue on its own, with the same inputs, does not give the error. So it seems to depend on something that happens earlier during my unit testing.
I’ve tried turning off threading using options(arrow.use_threads = FALSE), but that doesn’t solve the issue.
Attached are my sessionInfo and arrowInfo. If you have any ideas on how to debug further, please let me know.
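For anyone trying to narrow this down, here is a minimal sketch of the two checks mentioned in this thread: printing the column names and types from a zero-row slice, and retrying the write with threading disabled. The data frame `df` and its columns are hypothetical stand-ins (the real columns in the reports are not shown), and the write is only attempted if arrow is installed.

```r
# Hypothetical stand-in for the data frame passed to write_parquet();
# the real column names and types from the reports are unknown.
df <- data.frame(
  id = 1:3,
  depth = c(5.2, 10.1, 15.7),
  label = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# A zero-row slice keeps every column's name and class but none of the data,
# so it is cheap to print and safe to paste into an issue.
empty <- df[integer(0), , drop = FALSE]
str(empty)

# One class string per column, as a named character vector.
col_types <- vapply(
  empty,
  function(col) paste(class(col), collapse = "/"),
  character(1)
)
print(col_types)

# Only attempt the write when arrow is available.
if (requireNamespace("arrow", quietly = TRUE)) {
  # Disable Arrow's multithreading to rule out a threading problem
  # (reported above as not fixing the crash).
  options(arrow.use_threads = FALSE)
  out <- tempfile(fileext = ".parquet")
  arrow::write_parquet(df, out)
  print(file.exists(out))
}
```

Since the crash is reported as depending on earlier state in the session, running this in a fresh R session and again at the point of failure may help show whether the column types differ between the two.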
SessionInfo
R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.10
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.1
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=nl_NL.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=nl_NL.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=nl_NL.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] arrow_11.0.0.2
loaded via a namespace (and not attached):
[1] tidyselect_1.2.0 bit_4.0.4 compiler_4.2.2 magrittr_2.0.3 assertthat_0.2.1 R6_2.5.1
[7] cli_3.6.0 tools_4.2.2 glue_1.6.2 rstudioapi_0.14 bit64_4.0.5 vctrs_0.5.1
[13] lifecycle_1.0.3 rlang_1.0.6 purrr_1.0.1
arrowInfo
Arrow package version: 11.0.0.2
Capabilities:
dataset TRUE
substrait FALSE
parquet TRUE
json TRUE
s3 TRUE
gcs TRUE
utf8proc TRUE
re2 TRUE
snappy TRUE
gzip TRUE
brotli TRUE
zstd TRUE
lz4 TRUE
lz4_frame TRUE
lzo FALSE
bz2 TRUE
jemalloc TRUE
mimalloc TRUE
Memory:
Allocator jemalloc
Current 0 bytes
Max 0 bytes
Runtime:
SIMD Level avx2
Detected SIMD Level avx2
Build:
C++ Library Version 11.0.0
C++ Compiler GNU
C++ Compiler Version 11.3.0