terraform-provider-google: bigquery dataset view access creates circular dependency
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
- If an issue is assigned to the "modular-magician" user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to "hashibot", a community member has claimed the issue already.
Terraform Version
Terraform v0.11.11
provider.google v1.19.1
Affected Resource(s)
- google_bigquery_dataset
- google_bigquery_table
Terraform Configuration Files
resource "google_bigquery_dataset" "public" {
dataset_id = "public"
access {
role = "READER"
group_by_email = "${var.external_group_email}"
}
}
resource "google_bigquery_dataset" "private" {
dataset_id = "private"
// This is a circular dependency
// The private dataset requires access for public dataset views that require private dataset tables that require private dataset.
// It should be possible to do this separately e.g. using IAM Binding:
access {
view {
project_id = "${var.project}"
dataset_id = "public"
table_id = "a_view_that_selects_from_private_tables"
}
}
}
resource "google_bigquery_table" "real_table" {
dataset_id = "${google_bigquery_dataset.private.dataset_id}"
table_id = "real_table"
schema = "${file("schema_real_table.json")}"
}
resource "google_bigquery_table" "filtered_real_table" {
dataset_id = "${google_bigquery_dataset.public.dataset_id}"
table_id = "filtered_real_table"
view {
query = <<SQL
SELECT some columns
FROM `${var.project}`.public.real_table
WHERE user_email = SESSION_USER()
SQL
use_legacy_sql = "false"
}
// As a side note, it seems like adding an explicit dependency here is required because Terraform can't be expected to parse the BigQuery inline SQL view definition an reflect other bigquery table resources.
depends_on = ["google_bigquery_table.real_table"]
}
Debug Output
Error: Error applying plan: 2 error(s) occurred:

- google_bigquery_table.filtered_real_table: googleapi: Error 404: Not found: Dataset project_example:private, notFound

- google_bigquery_dataset.private: 1 error(s) occurred:
  - google_bigquery_dataset.private: googleapi: Error 400: View project_example:public.filtered_real_table to authorize not found., invalid
Panic Output
NA
Expected Behavior
Either the BigQuery dataset's access entries for view access should be deferred until after the datasets, tables and views are created, or we need an explicit new resource for adding view access to datasets, called something like google_bigquery_dataset_access_entries.
Actual Behavior
The view can't be created because it depends on the other dataset, and that other dataset can't be created because it depends on the view.
A circular dependency exists.
Steps to Reproduce
Run the example provided above.
Note that the example is incomplete: it needs a var.project and an existing GCP project (I'm not defining projects in Terraform here, although you could modify it to also create a project).
It also uses var.external_group_email to illustrate the use case; that access block on the public dataset can simply be deleted, it is not related to the error.
Important Factoids
Is this a bug, or are the docs wrong and I can't use this for my use case? I think it is the former, based on what other GCP clients allow (see the Python example below).
In the documentation for google_bigquery_dataset it says:
(Optional) A view from a different dataset to grant access to. Queries executed against that view will have read access to tables in this dataset.
In the test defined for this, testAccBigQueryDatasetWithViewAccess, the view does not actually reference tables in the dataset. The test uses a view that appears to reference some other project: SELECT state FROM [lookerdata:cdc.project_tycho_reports]
I'm not sure that, with the current setup, it is possible to create a dataset with view access for a view that "will have read access to tables in this dataset" without creating a circular dependency, as I have experienced.
I think the access entries need to be applied separately.
In my example this would result in a DAG with this dependency order:
- private dataset
- private table(s)
- public dataset
- public view(s)
- private dataset's access entries for the public view(s)
References
#2051 is a request to add IAM bindings and other IAM things for bigquery, which is somewhat related. View access to a dataset is not part of IAM though.
Test for the bigquery_dataset view access: https://github.com/terraform-providers/terraform-provider-google/blob/2.0.0/google/resource_bigquery_dataset_test.go
#644 original issue requesting this access control functionality
Functionality added in this PR: https://github.com/terraform-providers/terraform-provider-google/pull/1931
Workaround
I am using the Python script below to add the view access entries after the datasets, views and tables are created. Sorry it's not Go:
In your tf file, create a data "external" resource, and make sure it has an explicit depends_on listing all the views you want access entries created for, and also the dataset the views need access to.
- This will apply permission to ALL views in the dataset passed in to the query.public attribute. If you only want to grant access for specific views, you'll have to modify the script and allow a list to be passed in via the query parameter.
- If you define access blocks in the dataset (e.g. for roles or special groups), then subsequent runs of this workaround will lead to the resource definition for the dataset removing the view access, which will then be re-added by the Python script, i.e. it will cause changes to permissions even though technically nothing has changed. One potential workaround for that is to use a lifecycle { ignore_changes = ["access"] } block in the dataset resource so that it ignores changes to access (see the sketch after this list). That ignore would need to be temporarily removed if you do want to make real access changes.
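For illustration, a minimal sketch of that lifecycle block applied to the private dataset in this example (this assumes the access entries are then managed only by the external script):

resource "google_bigquery_dataset" "private" {
  dataset_id = "private"

  // Access entries are applied by the external script, so ignore drift on them here.
  lifecycle {
    ignore_changes = ["access"]
  }
}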
data "external" "private_dataset_view_access" {
program = ["python", "${path.module}/bq_private_dataset_access_entries.py"]
query = {
gcp_project = "${var.project}"
private = "${google_bigquery_dataset.private.dataset_id}" # the dataset_id that views required access to
public = "${google_bigquery_dataset.public.dataset_id}" # the dataset where all the views exist
}
depends_on = ["google_bigquery_dataset.private", "google_bigquery_table.filtered_real_table"]
}
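The rest of the configuration can then read whatever the script writes to stdout via the data source's result map. For example, purely for illustration (the output name here is mine; the "access" key is what the script below emits):

output "private_dataset_access" {
  value = "${data.external.private_dataset_view_access.result["access"]}"
}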
bq_private_dataset_access_entries.py
#!/usr/bin/env python
"""
Grant permission to all the SQL views in a public Bigquery dataset to a private dataset.
e.g. all views in dataset `public` can access underlying tables that the views use,
even though the underlying tables exist in another dataset called `private`.
Note, Biquery calls these `datasets` where really they are different catalogs/databases.
Permissions will be overwritten!
See
https://cloud.google.com/bigquery/docs/access-control#overview
https://cloud.google.com/bigquery/docs/share-access-views
"""
import sys
import os
import json
from google.cloud import bigquery
# Some global defaults for direct invocation of this script for testing purposes.
DEV_PROJECT = 'dev_gcp_project_name_here_for_testing'
PRIVATE_DATASET = 'private'
PUBLIC_DATASET = 'public'

current_dir = os.path.dirname(__file__)

# Parse the query passed in by Terraform (or the --dev flag for local testing).
env = None
if len(sys.argv) > 1:
    env = sys.argv[1]
    if env not in ('--dev',):
        raise SyntaxError("cli args expected --dev")

private_dataset_id = PRIVATE_DATASET
public_dataset_id = PUBLIC_DATASET
if env == '--dev':
    gcp_project = DEV_PROJECT
else:
    query = json.load(sys.stdin)
    # Required args from terraform
    gcp_project = query['gcp_project']
    private_dataset_id = query['private']
    public_dataset_id = query['public']
client = bigquery.Client(project=gcp_project)
# Get the private dataset
private_dataset_ref = client.dataset(private_dataset_id)
private_dataset = bigquery.Dataset(private_dataset_ref)
private_dataset = client.get_dataset(private_dataset)
# Get the public dataset
public_dataset_ref = client.dataset(public_dataset_id)
public_dataset = bigquery.Dataset(public_dataset_ref)
public_dataset = client.get_dataset(public_dataset)
# Get all the public views
public_view_list_items = [
    view
    for view in client.list_tables(public_dataset_ref)
    if view.table_type == 'VIEW'
]

# Authorize all the views in the public dataset to access the private dataset
access_entries = private_dataset.access_entries
for public_view_item in public_view_list_items:
    public_view_ref = public_dataset.table(public_view_item.table_id)
    view = bigquery.Table(public_view_ref)
    view = client.get_table(view)
    access_entries.append(
        bigquery.AccessEntry(None, 'view', view.reference.to_api_repr())
    )
# Appending access entries is not idempotent, so de-duplicate before re-applying on subsequent runs.
deduped_access_entries = list()
seen_access_entries = set()
for access_entry in access_entries:
    entity = access_entry.entity_id
    # entity_id is a dict for view entries and a string for most other entry types.
    hashed_entry = hash(frozenset(entity.items())) if isinstance(entity, dict) else hash(entity)
    if hashed_entry not in seen_access_entries:
        seen_access_entries.add(hashed_entry)
        deduped_access_entries.append(access_entry)

private_dataset.access_entries = deduped_access_entries
private_dataset = client.update_dataset(
    private_dataset, ['access_entries'])
# Output simplified result to Terraform.
dataset_json = private_dataset.to_api_repr()
return_value = dict()
return_value['projectId'] = dataset_json['datasetReference']['projectId']
return_value['datasetId'] = dataset_json['datasetReference']['datasetId']
return_value['access'] = json.dumps(dataset_json['access'])
json.dump(return_value, sys.stdout)
sys.stdout.write('\n')
About this issue
- State: closed
- Created 6 years ago
- Reactions: 21
- Comments: 16 (8 by maintainers)
Once it gets released (it should appear in version 3.17.0), it'll be usable as a google_bigquery_dataset_access resource. Here's a preview of what the docs page will look like: https://github.com/terraform-providers/terraform-provider-google/blob/master/website/docs/r/bigquery_dataset_access.html.markdown
I think having it as a separate resource is going to be the cleanest solution to this. When Terraform is running, it creates the dependency graph and then creates/updates each resource individually in order. So, if you were to create everything without the view and then manually edit your config to add the view later, that should work just fine (as long as you aren't introducing a circular dependency via interpolation). However, I agree that users should be able to set up their infrastructure by running Terraform just once, so this should be fixed. Since this seems like something that can be worked around, I don't think we can realistically prioritize fixing it immediately, but if you open a PR for the fix someone will look at it.
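Based on that docs preview, the original example could then grant the view access outside the dataset resource, roughly like this (a sketch against the not-yet-released resource, using the argument names shown in the preview, so details may differ once it ships):

resource "google_bigquery_dataset_access" "private_to_public_view" {
  dataset_id = "${google_bigquery_dataset.private.dataset_id}"

  view {
    project_id = "${var.project}"
    dataset_id = "${google_bigquery_dataset.public.dataset_id}"
    table_id   = "${google_bigquery_table.filtered_real_table.table_id}"
  }
}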
Hi - just curious, is there an expected release date for 3.17.0? Thanks
That particular behavior is intended: one of Terraform's features is detecting drift. So if you create the resource with Terraform, change it later, and then run Terraform again, it'll try to change it back to what you've specified in your config.
The "overwrite-only" nature won't be a problem - that's a common pattern in BigQuery IAM. Added the help-wanted label to solicit PRs for this one.