arrow: [R] arrow implementation of lubridate::dmy parses invalid date "00001976" as date

Sorry for so many issues, but I think this is another bug.

The arrow implementation of lubridate::dmy() behaves incorrectly.

An invalid date such as ‘00001976’ is parsed as a valid (and completely unrelated) date.

In R (lubridate), the invalid string returns NA with a warning:

```r
library(lubridate)

dmy("00001976")
#> [1] NA
#> Warning message:
#> All formats failed to parse. No formats found.
```

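In arrow, the same string is parsed as a date:

```r
library(arrow)
library(dplyr)
library(data.table)
library(lubridate)

q <- data.table(x = c("00001976", "30111976", "01011976"))
q %>% write_dataset("q")

q2 <- "q" %>%
  open_dataset() %>%
  mutate(x2 = dmy(x)) %>%
  collect()
q2
# Parsed dates, as reported:
#> 1: 1975-11-30
#> 2: 1976-11-30
#> 3: 1976-01-01
```

Notice that ‘00001976’ is an invalid date, so the first row of x2 should be NA.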
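The wrong value is not arbitrary: 1975-11-30 is exactly what a parser produces if it normalizes out-of-range fields instead of rejecting them. Month 00 of 1976 rolls back to December 1975, and day 00 of December 1975 is the day before December 1, i.e. November 30. Whether arrow's parser actually works this way is an assumption, not something confirmed in this issue. The base-R sketch below contrasts a hypothetical lenient parser (`lenient_dmy()`) with a strict one (`strict_dmy()`); both helper names are made up for illustration.

```r
# Hypothetical helper (assumption, not arrow's code): a *lenient* DDMMYYYY
# parser that normalizes out-of-range day/month fields instead of rejecting them.
lenient_dmy <- function(s) {
  d <- as.integer(substr(s, 1, 2))
  m <- as.integer(substr(s, 3, 4))
  y <- as.integer(substr(s, 5, 8))
  # Roll the month into 1..12, carrying the overflow into the year
  # (month 0 becomes December of the previous year).
  y <- y + (m - 1L) %/% 12L
  m <- (m - 1L) %% 12L + 1L
  # Start at the 1st of the normalized month and add (d - 1) days,
  # so day 0 lands on the last day of the previous month.
  first <- as.Date(sprintf("%04d-%02d-01", y, m))
  first + (d - 1L)
}

# Hypothetical helper: a *strict* parser. Base R's strptime-based as.Date()
# only accepts %d in 01..31 and %m in 01..12, so "00001976" yields NA.
strict_dmy <- function(s) {
  as.Date(s, format = "%d%m%Y")
}

x <- c("00001976", "30111976", "01011976")
lenient_dmy(x)  # 1975-11-30, 1976-11-30, 1976-01-01 (matches arrow's output)
strict_dmy(x)   # NA, 1976-11-30, 1976-01-01 (matches the expected behavior)
```

The lenient version reproduces all three values arrow returned, which is consistent with (but does not prove) the idea that the arrow binding accepts a zero day and month and lets them roll over rather than failing the parse.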

Reporter: Lucas Mation / @lucasmation

Note: This issue was originally created as ARROW-18242. Please see the migration documentation for further details.

About this issue

  • State: open
  • Created 2 years ago
  • Comments: 16 (1 by maintainers)

Most upvoted comments

@paleolimbot As discussed, I tested this on Windows with the locale set to “C”. I get the same results as in the initial reprex, so changing the locale doesn't fix it.