OpenRefine: Unsupported datatypes should be imported as strings

There are a large number of Excel date, time, duration, post code, etc formats which are not supported by OpenRefine, but are currently imported as either numbers or datetimes. Anything unsupported should be imported as a string instead of a number or datetime.

To Reproduce

Steps to reproduce the behavior:

  1. Import an Excel worksheet containing dates such as 09:46:00 AM in a column
  2. Open the project

Current Results

The cell is converted to 1899-12-31T09:46:00Z.

Expected Behavior

The column should be kept as a string.

Additional context

I have not found any way to disable cast on import.

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 15 (15 by maintainers)

Commits related to this issue

Most upvoted comments

One big issue I have had with importing dates as datetimes with a time of midnight is the timezone-sensitivity. If somehow the importer takes it to be midnight in the local time determined by the user’s settings, and then the datetime is represented in UTC, then you can get off by one errors on the date components. It can be really frustrating.

For that reason I think I prefer importing them as strings - in the spirit of “what you see in Excel is what you get in OpenRefine”.

Would you be able to provide a sample XLS file with a time value? I would then mark this as a good first issue.

openrefine-5390.xlsx

Internally these are all just datetimes with formatting strings to Excel. Also, OpenRefine doesn’t have a time (or date) data type - only datetime.

Converting Excel dates to strings my require a full and faithful implementation of the Excel date formatting (unless there’s a cached version of the pre-rendered string available). I’m not sure that would be super easy as a first issue.

In this specific case, I would suggest that time should not be cast, no matter if the user selects to cast dates or not, since time is not a date. Having an option to disable automatic cast would be useful, nonetheless.