OpenRefine: Unsupported datatypes should be imported as strings
There are a large number of Excel date, time, duration, post code, etc formats which are not supported by OpenRefine, but are currently imported as either numbers or datetimes. Anything unsupported should be imported as a string instead of a number or datetime.
To Reproduce
Steps to reproduce the behavior:
- Import an Excel worksheet containing dates such as
09:46:00 AMin a column - Open the project
Current Results
The cell is converted to 1899-12-31T09:46:00Z.
Expected Behavior
The column should be kept as a string.
Additional context
I have not found any way to disable cast on import.
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 15 (15 by maintainers)
Commits related to this issue
- Import cells with unsupported date formats as strings. refs #5390 — committed to tfmorris/OpenRefine by tfmorris 2 years ago
- Import cells with unsupported date formats as strings. refs #5390 — committed to tfmorris/OpenRefine by tfmorris 2 years ago
- Import cells with unsupported date & number formats as strings Fixes #5389. Fixes #5390. - Import floats, integers, percentages, & currency as numbers preferentially as integers - Import everythin... — committed to tfmorris/OpenRefine by tfmorris 2 years ago
- Import unsupported datatypes as strings (#5468) Import cells with unsupported date & number formats as strings Fixes #5389. Fixes #5390. - Import floats, integers, percentages, & currency as nu... — committed to OpenRefine/OpenRefine by tfmorris 2 years ago
One big issue I have had with importing dates as datetimes with a time of midnight is the timezone-sensitivity. If somehow the importer takes it to be midnight in the local time determined by the user’s settings, and then the datetime is represented in UTC, then you can get off by one errors on the date components. It can be really frustrating.
For that reason I think I prefer importing them as strings - in the spirit of “what you see in Excel is what you get in OpenRefine”.
openrefine-5390.xlsx
Internally these are all just datetimes with formatting strings to Excel. Also, OpenRefine doesn’t have a time (or date) data type - only datetime.
Converting Excel dates to strings my require a full and faithful implementation of the Excel date formatting (unless there’s a cached version of the pre-rendered string available). I’m not sure that would be super easy as a first issue.
In this specific case, I would suggest that time should not be cast, no matter if the user selects to cast dates or not, since time is not a date. Having an option to disable automatic cast would be useful, nonetheless.