airflow: Redshift connection breaking change with IAM authentication

Apache Airflow version

2.6.1

What happened

This PR introduced the get_iam_token method in redshift_sql.py. This is the breaking change as introduces the check for iam in extras, and it’s set to False by default.

Error log:


self = <airflow.providers.amazon.aws.hooks.redshift_sql.RedshiftSQLHook object at 0x7f29f7c208e0>
conn = redshift_default

    def get_iam_token(self, conn: Connection) -> tuple[str, str, int]:
        """
        Uses AWSHook to retrieve a temporary ***word to connect to Redshift.
        Port is required. If none is provided, default is used for each service
        """
        port = conn.port or 5439
        # Pull the custer-identifier from the beginning of the Redshift URL
        # ex. my-cluster.ccdre4hpd39h.us-east-1.redshift.amazonaws.com returns my-cluster
>       cluster_identifier = conn.extra_dejson.get("cluster_identifier", conn.host.split(".")[0])
E       AttributeError: 'NoneType' object has no attribute 'split'

.nox/test-3-8-airflow-2-6-0/lib/python3.8/site-packages/airflow/providers/amazon/aws/hooks/redshift_sql.py:107: AttributeError

What you think should happen instead

It should have backward compatibility

How to reproduce

Run an example DAG for redshift with the AWS IAM profile given at hook initialization to retrieve a temporary password to connect to Amazon Redshift.

Operating System

mac-os

Versions of Apache Airflow Providers

No response

Deployment

Astronomer

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 17 (14 by maintainers)

Most upvoted comments

Indeed, conn.host.split(".")[0] is compiled regardless the extra values. I will fix it.

@dstandish @hussein-awala @phanikumv This is a bug in the current implementation as per the PR.

When user passes the following as part of redshift connection, it would break:

{
  "iam": true,
  "cluster_identifier": "redshift-cluster-1",
  "port": 5439,
  "region": "us-east-1",
  "db_user": "awsuser",
  "database": "dev",
  "profile": "default"
}

The following image shows that the host is not set in the connection but cluster_identifier is set. Screenshot 2023-05-26 at 3 56 58 PM

Even though the cluster_identifier is set here this would break with the following error on the line as host is not set because conn.host is evaluated first and when we split it to NoneType object it would throw attribute error.

conn.extra_dejson.get("cluster_identifier", conn.host.split(".")[0])

Screenshot 2023-05-26 at 3 58 00 PM

Following is the test we did.

>>> conn.extra_dejson.get("cluster_identifier")
PyDev console: starting.
'<REDSHIFT_CLUSTER_IDENTIFIER>'
>>> conn.extra_dejson.get("cluster_identifier", None)
'<REDSHIFT_CLUSTER_IDENTIFIER>'
>>> conn.host

>>> conn.extra_dejson.get("cluster_identifier", conn.host.split(".")[0])
Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydevd_bundle/pydevd_exec2.py", line 3, in Exec
    exec(exp, global_vars, local_vars)
  File "<input>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'split'

Thanks @pankajkoti for your support with the connection details we use in astro-sdk and debugging.

Conclusion:

This is a bug in the current implementation for all users when host is not set in the connection (not only for default connections). This would require an null check for conn.host.

that’s interesting, and, maybe not the best choice (particularly given that it looks like it would blow up, since host is not populated), but it’s just an example connection. and, looks like it was probably released at same time as this [new] feature.

don’t get me wrong, i think we should make the code better, but just doesn’t seem, strictly speaking, to be a breaking change – whatever that’s worth.