arrow: [R] Weird R error: Error in fs___FileSystem__GetTargetInfos_FileSelector(self, x) : ignoring SIGPIPE signal

Okay, apologies: this is a bit of a weird error, but it is annoying the heck out of me. The following block of plain R code, when run with Rscript (or embedded in any form of R Markdown, Quarto, or knitr document), produces the error below (at least most of the time):


library(arrow)
library(dplyr)

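# Keep the AWS SDK from querying EC2 instance metadata, and clear any AWS
# settings in the environment so they can't interfere
Sys.setenv(AWS_EC2_METADATA_DISABLED = "TRUE")
Sys.unsetenv("AWS_ACCESS_KEY_ID")
Sys.unsetenv("AWS_SECRET_ACCESS_KEY")
Sys.unsetenv("AWS_DEFAULT_REGION")
Sys.unsetenv("AWS_S3_ENDPOINT")

# S3-compatible bucket served from a custom endpoint
s3 <- arrow::s3_bucket(bucket = "scores/parquet",
                       endpoint_override = "data.ecoforecast.org")
# Directory levels under scores/parquet are read as the theme and year fields
ds <- arrow::open_dataset(s3, partitioning = c("theme", "year"))
ds |> dplyr::filter(theme == "phenology") |> dplyr::collect()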

This gives the error:


Error in fs___FileSystem__GetTargetInfos_FileSelector(self, x) : 
  ignoring SIGPIPE signal
Calls: %>% ... <Anonymous> -> fs___FileSystem__GetTargetInfos_FileSelector 

But only when run as a script! When run interactively in an R console, this code runs just fine. Even as a script, the code otherwise seems to run fine, but something in it triggers a SIGPIPE that I don’t understand.

If the script is executed with littler (https://dirk.eddelbuettel.com/code/littler.html), it runs fine, since littler handles SIGPIPE but Rscript doesn’t. But I have no idea why the above code triggers a SIGPIPE in the first place. Stranger still, if I filter on a different theme, like “aquatics”, it (usually) runs without the error.
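For readers unfamiliar with the signal itself, here is a minimal POSIX-only C++ sketch (not arrow or AWS SDK code; the function names are hypothetical) of the general mechanism: when a process writes to a pipe or socket whose other end has been closed, the kernel sends it SIGPIPE, and if SIGPIPE is ignored, the same write instead fails with `EPIPE`. That the S3 client hits this case because the remote endpoint closes a connection is an assumption, not something established in this thread.

```cpp
// Sketch (POSIX only; not arrow or AWS SDK code): what SIGPIPE is and what
// ignoring it changes. Function names are hypothetical.
#include <cassert>
#include <cerrno>
#include <csignal>
#include <unistd.h>

// Writes one byte into a pipe whose read end has already been closed and
// returns the errno reported by write() (0 if the write somehow succeeded).
// Precondition: SIGPIPE must be ignored; otherwise the kernel's SIGPIPE
// terminates the process (or runs an installed handler) during write().
int write_to_closed_pipe() {
  int fds[2];
  if (pipe(fds) != 0) return errno;  // could not create the pipe at all
  close(fds[0]);                     // no reader is left on this pipe

  const char byte = 'x';
  const ssize_t n = write(fds[1], &byte, 1);  // SIGPIPE is generated here
  const int err = (n < 0) ? errno : 0;        // capture errno before close()
  close(fds[1]);
  return err;
}

// Same idea as ignoring SIGPIPE process-wide: once it is ignored, the failed
// write shows up as an ordinary EPIPE error code instead of a signal.
void check_ignored_sigpipe_gives_epipe() {
  std::signal(SIGPIPE, SIG_IGN);
  const int err = write_to_closed_pipe();
  assert(err == EPIPE && "an ignored SIGPIPE should surface as EPIPE");
  (void)err;  // keeps NDEBUG builds free of unused-variable warnings
}
```

Without the `std::signal(SIGPIPE, SIG_IGN)` call, the same `write()` delivers SIGPIPE to the process: the default action terminates it, and if a handler has been installed, that handler runs instead.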

I have no idea why fs___FileSystem__GetTargetInfos_FileSelector results in this, but I would really appreciate any hints on how to avoid it, as it makes arrow very hard to use in workflows right now!

Thanks for all you do!

Reporter: Carl Boettiger / @cboettig

Note: This issue was originally created as ARROW-16680. Please see the migration documentation for further details.

About this issue

  • State: open
  • Created 2 years ago
  • Comments: 23 (3 by maintainers)

Most upvoted comments

Dewey Dunnington / @paleolimbot: It sounds like ignoring SIGPIPE unconditionally (i.e., in Arrow C++ or AWS SDK C++ code) is generally considered a bad idea; however, ignoring it within the session where you’re running into this problem is probably fine. I can’t reproduce this locally, but you could try something like this as a workaround:


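# Compile a tiny C++ function that sets SIGPIPE to "ignore" for this R process
# (POSIX only: SIGPIPE does not exist on Windows)
cpp11::cpp_source(code = '
#include <csignal>
#include <cpp11.hpp>

[[cpp11::register]] void ignore_sigpipes() {
  // SIG_IGN: the kernel discards SIGPIPE, so writes to a closed pipe or
  // socket fail with EPIPE instead of being delivered to a signal handler
  std::signal(SIGPIPE, SIG_IGN);
}
')

# Call once, before running the arrow code
ignore_sigpipes()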
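Since ignoring SIGPIPE process-wide is generally discouraged, one way to narrow the workaround, sketched here in plain C++ under stated assumptions, is to save the previous disposition with `sigaction()` and restore it once the arrow call is done. `ignore_sigpipes_saving_previous()`, `restore_sigpipes()`, and `current_sigpipe_handler()` are hypothetical names, not part of arrow, cpp11, or the AWS SDK, and this has not been tested against the reported S3 case.

```cpp
// Hypothetical save/restore pair (not part of arrow, cpp11, or the AWS SDK).
// POSIX only. Signal dispositions are process-wide, so this narrows *when*
// SIGPIPE is ignored, not which threads are affected.
#include <cassert>
#include <signal.h>

using SignalHandler = void (*)(int);

namespace {
struct sigaction saved_sigpipe_action{};  // disposition before we changed it
bool sigpipe_saved = false;               // true while SIGPIPE is overridden
}  // namespace

// Ignore SIGPIPE, remembering the previous disposition (e.g. R's handler).
// Returns true on success; calling it again while overridden is a no-op.
bool ignore_sigpipes_saving_previous() {
  if (sigpipe_saved) return true;
  struct sigaction ignore_action{};
  ignore_action.sa_handler = SIG_IGN;
  sigemptyset(&ignore_action.sa_mask);
  if (sigaction(SIGPIPE, &ignore_action, &saved_sigpipe_action) != 0) {
    return false;
  }
  sigpipe_saved = true;
  return true;
}

// Put back whatever disposition was active before the ignore call.
bool restore_sigpipes() {
  if (!sigpipe_saved) return true;
  if (sigaction(SIGPIPE, &saved_sigpipe_action, nullptr) != 0) return false;
  sigpipe_saved = false;
  return true;
}

// Report the current SIGPIPE handler without changing it.
SignalHandler current_sigpipe_handler() {
  struct sigaction now{};
  sigaction(SIGPIPE, nullptr, &now);
  return now.sa_handler;
}

// Usage check: ignored between the two calls, previous handler back after.
void check_save_restore_roundtrip() {
  const SignalHandler before = current_sigpipe_handler();
  const bool ignored = ignore_sigpipes_saving_previous();
  assert(ignored && current_sigpipe_handler() == SIG_IGN);
  const bool restored = restore_sigpipes();
  assert(restored && current_sigpipe_handler() == before);
  (void)before;  // keeps NDEBUG builds free of unused-variable warnings
  (void)ignored;
  (void)restored;
}
```

From R, the two functions could presumably be exposed the same way as `ignore_sigpipes()` (marking them `[[cpp11::register]]` inside `cpp11::cpp_source()`), with `restore_sigpipes()` called via `on.exit()` after `collect()`. One caveat: if arrow's S3 I/O continues on background threads after the R call returns, the error could plausibly come back once the handler is restored.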