duckdb: Missing timestamp components for EXTRACT
The EXTRACT method on timestamps (and date, time, timestamp_tz) are missing fields:
TIME:
- subsecond
DATE:
- iso_year
- us_year
- day_of_year
- monday_day_of_week
- sunday_day_of_week
- monday_week
- sunday_week
- iso_week
- us_week
- unix_time
TIMESTAMP_TZ:
- timezone_offset
all the fields missing from time or date are also missing from: TIMESTAMP_SEC TIMESTAMP_MS TIMESTAMP_US TIMESTAMP_NS
Descriptions that provide some more context for these fields/components:
* SUBSECOND Return number of microseconds since the last full second of the given timestamp.
* ISO_YEAR Return the ISO 8601 week-numbering year. First week of an ISO year has the majority (4 or more) of its
days in January.
* US_YEAR Return US epidemiological year. First week of US epidemiological year has the majority (4 or more) of
its days in January. Last week of US epidemiological year has the year's last Wednesday in it.
US epidemiological week starts on Sunday.
* DAY_OF_YEAR Return the day of year (1 to 366). January 1st maps to day number 1, February 1st to 32, etc.
* MONDAY_DAY_OF_WEEK Return the day of week from Monday (1) to Sunday (7).
* SUNDAY_DAY_OF_WEEK Return the day of week from Sunday (1) to Saturday (7).
* MONDAY_WEEK Return week of year number (1-53). First week starts on first Monday of January.
* SUNDAY_WEEK Return week of year number (1-53). First week starts on first Sunday of January.
* ISO_WEEK Return ISO week of year number (1-53). First ISO calendar week has the majority (4 or more) of its
days in January. ISO week starts on Monday.
* US_WEEK Return ISO-like week of year number (1-53). First US calendar week has the majority (4 or more) of its days in January. US week starts on Sunday.
* UNIX_TIME Return number of seconds that have elapsed since 1970-01-01 00:00:00 UTC, ignoring leap seconds.
* TIMEZONE_OFFSET Return number of seconds of timezone offset to UTC.
About this issue
- Original URL
- State: closed
- Created a year ago
- Reactions: 1
- Comments: 26 (10 by maintainers)
To me making this change at the Subtrait consumer/producer level makes the most sense. The DuckDB substrait consumer/producer knows DuckDB’s and Substrait’s semantics, and can add a +1 or -1 as required as part of the translation. Or is there a requirement that a function maps 1:1 at that level?
I don’t quite follow why the result would be different- the producer would add a +1, and the consumer would add a -1. The end result of the query will then be the same as long as the consumer/producer both make the correct translation, no?
No existing function yet has been described to depend on session settings. However, there was some discussion about it at some point (I think in the proposal for
randomwhich didn’t make it through) so I won’t rule it out entirely.However, in this case, session settings should not be a wrinkle. Once we clarify in Substrait whether these indices are 0-based or 1-based then there should be one answer that does not depend on session settings.
If DuckDb is the only consumer that uses 1-based indexing, given that DuckDb can already consume and produce Substrait, I think it would be fine to handle it within DuckDb. However, if there are other producers / consumers, then I think there is value in expressing the option. There’s pros & cons to both approaches.
However, I think the goal of Substrait (and this is certainly open for discussion) is to lean towards a flexible option set with the hopes that translators eventually emerge which centralize all of these +1/-1 translations in a single point.
I’ve opened https://github.com/substrait-io/substrait/pull/479 to control indexing.
MS-SQL, DB2, Snowflake, and Clickhouse all return 1-based weekdays (although some of them have an option to use 0).
Postgres is 0-based for non-ISO weekdays, but 1-based for ISO (we follow them). SQLite is 0-based, but it has to use
strftime(%w).Oracle doesn’t support weekday extraction.
There is also the issue of what the start of the week is set to (we use ICU, and it is configurable, but we only use it for ISO week arithmetic). It is worth mentioning that ICU is 1-based
Hey yes, this is related to substrait integration, I couldn’t find the functionality, but it’s good to know that we support it, just under a different name, that’s no problem and that glue should be provided by the
substraitextension