professional-services-data-validator: `generate-table-partitions` generates invalid SQL in filter clause when a timestamp field is part of the primary-key

The following command:

data-validation generate-table-partitions -sc rs -tc bq -tbls source.table=target.table --primary-keys _hash_key,_hash_diff,_creation_timestamp,source_last_update_date,delivery_area_id --partition-num 164 --filter-status fail --bq-result-handler pso_data_validator.results -cdir ~/

generates 164 partitioned configs as expected but the none of those can be run using data-validation configs run -c ~/config_X.yaml throwing the follow:

07/31/2023 09:37:36 AM-ERROR: Error (psycopg2.errors.SyntaxError) syntax error at or near "10"
LINE 3: ...2f114e622b' AND (_creation_timestamp > 2023-05-16 10:48:06.8...

It seems the filter: <sql_where_clause> part of the YALM contains timestamp values without the required quotes (see _creation_timestamp below).

  filters:
  - source: (_hash_key > '04b0a057a5ceccd4fff06984b5ed8e0ba1e90591' OR _hash_key =
      '04b0a057a5ceccd4fff06984b5ed8e0ba1e90591' AND (_hash_diff > 'cba27aebaf720f51987e0d3087213bd50fc3a70c'
      OR _hash_diff = 'cba27aebaf720f51987e0d3087213bd50fc3a70c' AND (_creation_timestamp
      > 2023-04-18 10:39:12.808967 OR _creation_timestamp = 2023-04-18 10:39:12.808967 ...

About this issue

  • Original URL
  • State: open
  • Created a year ago
  • Comments: 15 (6 by maintainers)

Commits related to this issue

Most upvoted comments

@florisvink Unfortunately, PR #962 introduced the usage of Ibis’s to_sql() method which compiles sqlalchemy queries. The to_sql() function only supports timestamps/date/datetime literal rendering starting in version 2.0: https://docs.sqlalchemy.org/en/20/changelog/changelog_20.html#change-206ec1f2af3a0c93785758c723ba356f

The ‘standard_conforming_strings’ error is because the redshift sqlalchemy connector doesn’t support sqlalchemy versions 2.0: https://github.com/sqlalchemy-redshift/sqlalchemy-redshift/issues/264

Due to this package dependency conflict, there isn’t a way forward to support Redshift using the to_sql() method. This would probably require reverting the change and adding a condition to format timestamps.

Hi @helensilva14, thanks for your reply. We have a workaround for the issue on short term (a regex string-replace on all partitioned config files fixing the quotes for datetime values). I’ll keep an eye on updates in the develop branch since I understand this can be resolved fairly easy. Let me know if I can be of any help