dbt-athena: Getting random "mismatched input 'cascade'." during dbt test

First of all: Thank you for your amazing contribution and bringing Athena engine v3 features to the DBT community!

I am currently tracking a bug that seems to occur non-deterministically. Essentially, during dbt test, I get syntax errors like these:

botocore.errorfactory.InvalidRequestException: An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: line 1:100: mismatched input 'cascade'. Expecting: '.', <EOF

It happens during the initial cleanup phase when DBT tries to delete existing test tables. To me, it looks like sometimes, the macro drop_relation from this repo is not found and DBT falls back to using default__drop_relation, which does a

drop {{ relation.type }} if exists {{ relation }} cascade

Below is the relevant log output of such an error:

-- /* {"app": "dbt", "dbt_version": "1.3.1", "profile_name": "athena", "target_name": "prelive", "node_id": "test.data_importer.not_null_bestaende_cleaned_lagerbestand.d2103837db"} */
12:21:35.254829 [debug] [Thread-7 (]: Opening a new connection, currently in state closed
Failed to execute query.
Traceback (most recent call last):
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/pyathena/common.py", line 494, in _execute
    query_id = retry_api_call(
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/pyathena/util.py", line 68, in retry_api_call
    return retry(func, *args, **kwargs)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/tenacity/__init__.py", line 406, in __call__
    do = self.iter(retry_state=retry_state)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/tenacity/__init__.py", line 351, in iter
    return fut.result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/tenacity/__init__.py", line 409, in __call__
    result = fn(*args, **kwargs)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/botocore/client.py", line 530, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/opt/pysetup/.venv/lib/python3.10/site-packages/botocore/client.py", line 960, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.InvalidRequestException: An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: line 1:102: mismatched input 'cascade'. Expecting: '.', <EOF>
12:21:35.631219 [debug] [Thread-8 (]: Athena adapter: Error running SQL: drop table if exists dbt_test__audit.dbt_expectations_expect_column_93d8b305ae3a3af2a182dc0d9423d314 cascade
-- /* {"app": "dbt", "dbt_version": "1.3.1", "profile_name": "athena", "target_name": "prelive", "node_id": "test.data_importer.dbt_expectations_expect_column_value_lengths_to_equal_rueres_wm_ek__wm_ek_IS_NOT_NULL__5.bc16ec6d3f"} */
12:21:35.631863 [debug] [Thread-8 (]: Athena adapter: Error running SQL: macro drop_relation
12:21:35.632307 [debug] [Thread-8 (]: finished collecting timing info
12:21:35.632645 [debug] [Thread-8 (]: On test.data_importer.dbt_expectations_expect_column_value_lengths_to_equal_rueres_wm_ek__wm_ek_IS_NOT_NULL__5.bc16ec6d3f: Close
12:21:35.633406 [debug] [Thread-8 (]: Runtime Error in test dbt_expectations_expect_column_value_lengths_to_equal_rueres_wm_ek__wm_ek_IS_NOT_NULL__5 (models/host/schema.yml)

I had some success on my local machine by adding a simple Jinja comment in front of this line.

Is it possible that there is a race condition when registering the macro that leads to spurious errors when the macro is not yet registered when test tables are being deleted? I am not very familiar with the underlying process of how DBT adapters work together with DBT core, just guessing here.

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 16 (16 by maintainers)

Commits related to this issue

Most upvoted comments

This seems to work for me as well, thank you @nicor88!

The nice thing about it is that it addresses another issue that I was having from time to time during development: When you switch a table type between "hive" and "iceberg", the drop_table macro would not delete the existing S3 paths on the next run, and the Athena DROP TABLE statement would time out. This can’t happen anymore with #119 as it is always checked with Glue what table type we are dealing with instead of relying on user config.

Btw, I had other cases with DROP TABLE statements on Iceberg tables timing out and filed issues with AWS support. In one case, they found a bug related to trailing slashes in the S3 location property which should be fixed soon.

@jurgispods After some thoughts, I don’t think that the issue will be fix in dbt-core. There are cases where we want to drop a relation and no config context is available, e.g. for some table dropping on testing.

As agreed with @Jrmyy we will try to propose a solution that ignore totally the config, and check only the table type in order to do the right drop.