OpenRefine: SQL exporter doesn't create compliant column names

Describe the bug The SQL exporter will not create SQL-compliant column names. If a column has spaces, they are replaced with “-” by default, and replaced with nothing if the “Trim Column Names” checkbox is unchecked. Hyphens are not an acceptable character in a SQL column name.

To Reproduce Steps to reproduce the behavior:

  1. Open a dataset that has column names with spaces in them.
  2. Click Export>SQL Exporter.
  3. Click “Download” tab.
  4. Click Preview
  5. Attempt to create a table in an SQL database with the SQL exported by OpenRefine. It will not create, and SQL will print an error like:
ERROR:  syntax error at or near "-"
LINE 11: Line1-Street TEXT NULL,
              ^

Current Results What results occured or were shown.

Expected behavior The SQL Exporter should, by default, convert column names to an SQL-compliant name, replacing spaces and other illegal characters with “_” (underscore). Column names can also not begin with numbers. They cannot have single quotes or punctuation.

Screenshots image image image

Desktop (please complete the following information):

  • OS: Mac 10.14.2
  • Browser Version: Safari 12.0.2
  • JRE or JDK Version: 1.8.0_121

OpenRefine (please complete the following information):

  • Version 3.1 [b90e413]

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 26 (20 by maintainers)

Commits related to this issue

Most upvoted comments

I am beginning to work on this. I am planning to convert the column names to valid SQL column names. Please assign the issue to me. Thanks.

@shelomi99 : the improvement I suggested are for ANOTHER NEW issue! You have done enough on this one, it’s done! 😉

For the other issue, you can ask me general questions thru email: antoine@beaubien.qc.ca, or specific questions about the issue on the issue itself.

Regards, Antoine

Okay, thanks @antoine2711 I will start working on it.

It looks sensible to me!

I started working on this issue by adding back quotes to each column name before exporting to sql to allow name flexibility. Please can you assign this issue to me. Thanks.

@stevevance I’d prefer that we just throw a Warning signal on the SQL Exporter dialog…and then stating a small message just beneath the Content|Download tabs stating something like “Column names contain invalid chars or reserved words and need to be renamed before continuing”

This is because there are more restrictions than just characters, but whole reserved word lists, such as in PostgreSQL and others. (cannot have a column named BETWEEN) And for accuracy, you can have spaces and punctuation in Database objects, but often you have to quote the objects all the time for both creation as well as referring to the object.

SELECT filling, topping, crust FROM "3.14159";

Also, if we wanted to support highlighting any problematic column names, we would have to find a SQL Rules Library that has all the rules/restrictions for all kinds of databases. Or implement them ourselves (not an option) after we pay ISO a lot of $$$ dollars just to see the rules https://www.iso.org/standard/63565.html

Does the warning message sound like a fair tradeoff?