dbt-databricks: Generic DBT_ACCESS_TOKEN in dbt Workflow does not have access to catalogs

Context

  • AWS Cloud
  • Enterprise Plan
  • We are using Unity Catalog
  • An authenticated user without a permission group has access to no catalogs
  • Only authenticated users with a permission group can access respective catalogs and databases
  • I am a consultant engaged with a client
  • This client has regulatory requirements that prevent the feasibility of dbt cloud
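For concreteness, the group-based access model described above corresponds to Unity Catalog grants along these lines (catalog, schema, and group names here are hypothetical — this is a sketch of the permission model, not the client's actual grants):

```sql
-- Hypothetical names. Without membership in a granted group,
-- an authenticated user sees no catalogs at all.
GRANT USE CATALOG ON CATALOG client_prod TO `data_engineering`;
GRANT USE SCHEMA  ON SCHEMA  client_prod.analytics TO `data_engineering`;
GRANT SELECT      ON SCHEMA  client_prod.analytics TO `data_engineering`;
```

The generated DBT_ACCESS_TOKEN discussed below is not associated with any such group, which is why it resolves no catalogs.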

Triage Steps

Following the instructions outlined for running dbt-core as a dbt Task Workflow here: https://github.com/databricks/dbt-databricks/blob/main/docs/databricks-workflows.md

✅ Local Dev + SQL Warehouses works

I can get this to work locally when I specify the http_path as a SQL Warehouse endpoint and inject my PAT into profiles.yml. This works because my PAT and the SQL Warehouses are assigned to the correct permission groups in our workspace.
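For reference, the working local profile looks roughly like this (profile name, catalog, schema, and endpoint id are placeholders — only the http_path and token wiring matter):

```yaml
# profiles.yml -- working local setup (names are placeholders)
my_project:
  target: dev
  outputs:
    dev:
      type: databricks
      catalog: client_catalog                        # hypothetical UC catalog
      schema: analytics
      host: "{{ env_var('DATABRICKS_HOST') }}"
      http_path: /sql/1.0/endpoints/1234567890abcdef # SQL Warehouse endpoint
      token: "{{ env_var('DATABRICKS_TOKEN') }}"     # my PAT
```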

😭 dbt Workflows does not work with the generated credentials

When run as a dbt task Workflow, the cluster generates credentials generically as:

  • DBT_HOST
  • DBT_ORG_ID
  • DBT_CLUSTER_ID
  • DBT_ACCESS_TOKEN

It seems that this access token does not have the correct permission group associated with it, and there is no way to add the association.

I also tried hard-coding a PAT that I know works, but pointed it at the cluster endpoint "sql/protocolv1/o/{{ env_var('DBT_ORG_ID') }}/{{ env_var('DBT_CLUSTER_ID') }}" instead of /sql/1.0/endpoints/<endpoint_id>.

This also did not work.
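For comparison, the failing variant is the same profile pointed at the job cluster, using either the generated token or the known-good PAT (a sketch; profile, catalog, and schema names are placeholders):

```yaml
# profiles.yml -- failing variants against the cluster endpoint
my_project:
  target: workflow
  outputs:
    workflow:
      type: databricks
      catalog: client_catalog                        # hypothetical UC catalog
      schema: analytics
      host: "{{ env_var('DBT_HOST') }}"
      http_path: "sql/protocolv1/o/{{ env_var('DBT_ORG_ID') }}/{{ env_var('DBT_CLUSTER_ID') }}"
      # 😭 fails with the generated token, and also with a hard-coded PAT:
      token: "{{ env_var('DBT_ACCESS_TOKEN') }}"
```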

I could go back to having the workflow use the SQL Warehouse endpoint and a PAT, but that defeats the purpose of spinning up a Spark cluster for the job only to send the work to another (more expensive) Spark cluster, right?

Triage Summary

Here is a summary table of the test cases:

Token type             Endpoint type       Outcome
PAT                    SQL Warehouse       ✅
Generic Access Token   Cluster Endpoint    😭
PAT                    Cluster Endpoint    😭

Definitions:

  • Generic Access Token --> DBT_ACCESS_TOKEN
  • SQL Warehouse --> /sql/1.0/endpoints/<endpoint_id>
  • Cluster Endpoint --> "sql/protocolv1/o/{{ env_var('DBT_ORG_ID') }}/{{ env_var('DBT_CLUSTER_ID') }}"

Further digging:

It seems the dbt-databricks team has left a TODO noting a similar conundrum, so we are all on the same page.

https://github.com/databricks/dbt-databricks/blob/main/dbt/adapters/databricks/connections.py#L408-L420

                # TODO: what is the error when a user specifies a catalog they don't have access to
                conn: DatabricksSQLConnection = dbsql.connect(
                    server_hostname=creds.host,

Expected Behaviour

  • As a user, I would expect documentation on how the generic DBT_ACCESS_TOKEN is generated.
  • As a user, I would expect the dbt Workflow task to have some way to grant it permissions.
  • As a user, I would expect documentation on the cluster SQL endpoints, plus a breadcrumb trail of links from the related user guides and dbt task documentation.

Extra

I do realise that my expectations lie with the Databricks team and not with this library. However, it also makes sense to track this issue in public so that others can follow along, as it will undoubtedly link to some code changes too.

I have an active use case I am piloting with a client, so I am very keen to act as a beta tester in our dev environment, where there is low risk and we can provide real-world feedback.

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 3
  • Comments: 15 (7 by maintainers)

Most upvoted comments

You’ve set the standard for bug reports. Thanks @neozenith !!

Thanks for the awesome bug report @neozenith! The Workflows UC support is being looked into.