dbt-core: Can't access BigQuery tables whose original sources are spreadsheets on Google Drive with the oauth method
Describe the bug
It is impossible to access BigQuery tables whose original sources are spreadsheets on Google Drive with the oauth method, because dbt does not pass any scopes parameter to the method that creates the BigQuery credentials. Accessing such tables requires the https://www.googleapis.com/auth/drive scope. Moreover, there is no way to grant scopes (as opposed to permissions) to an existing Google Cloud authentication; scope information can only be passed when the credentials are created.
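To illustrate the last point, here is a minimal sketch (not dbt's actual code) using the google-auth library: scopes can only be supplied at the moment the credentials object is created.

```python
# Minimal sketch, assuming the google-auth library: scopes are fixed at
# credential-creation time and cannot be added to existing credentials.
import google.auth

credentials, project_id = google.auth.default(
    scopes=[
        "https://www.googleapis.com/auth/bigquery",
        # Required for BigQuery tables backed by Google Drive sources:
        "https://www.googleapis.com/auth/drive",
    ]
)
```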
Steps To Reproduce
- Create a BigQuery table whose data source is a spreadsheet.
- Set a profile with the `oauth` method for BigQuery.
- Create a dbt model which refers to the BigQuery table with the profile.
Expected behavior
Even when we use the `oauth` method of the BigQuery profile, we should be able to access BigQuery tables whose original data sources are spreadsheets on Google Drive, given the https://www.googleapis.com/auth/drive scope.
Screenshots and log output
$ dbt run
...
Database Error in model my_model (models/my_model.sql)
Access Denied: BigQuery BigQuery: Permission denied while getting Drive credentials.
compiled SQL at target/run/my_model.sql
System information
Which database are you using dbt with?
- bigquery
The output of dbt --version:
installed version: 0.19.0
latest version: 0.19.0
Up to date!
Plugins:
- bigquery: 0.19.0
The operating system you’re using: Mac OS 10.15.7 (19H114)
The output of python --version:
Python 3.8.5
Okay, I think I found the fix. It’s necessary to run this command to add the Drive scope to the application default credentials (note that you also have to add the bigquery scope and anything else needed by dbt):
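Presumably something like the following, matching the invocation quoted in full later in this thread:

```shell
gcloud auth application-default login \
  --scopes=https://www.googleapis.com/auth/bigquery,https://www.googleapis.com/auth/drive
```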
Okay, I’m back, and I have good news. After digging through here and trying out some of the changes, I think I understand how this happened and how we can resolve the issue.
To start, the issue was first introduced in PR #2908, where this line, which passed scopes when using OAuth, was replaced with this line, which relies on the defaults configured with `gcloud`. This is why, even after this change, if you authenticate with the scopes added (using `gcloud auth application-default login --scopes=https://www.googleapis.com/auth/bigquery,https://www.googleapis.com/auth/drive.readonly`) you will not see the failure, since your default credentials have the scopes added. However, if you have a setup like we do over at Honey 😉 you will see the problem, because we use a service account in an environment with no default credentials available and authenticate by activating the service account with `gcloud auth activate-service-account`, which does not allow for passing scopes, but still causes the dbt auth code to go down that (OAuth) path. This is a fairly standard auth flow for GCP (and a common approach for production authentication with service accounts); see the sketch below. Looking back now, this is actually not a mentioned path on the BQ profile setup page (can’t remember if it was back a ~year ago when we set this up), which may be why this path got hit by the bug.
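For reference, a sketch of that service-account flow (the account name and key path are illustrative); note that, unlike `gcloud auth application-default login`, there is no `--scopes` flag to pass:

```shell
# Activate a service account; there is no way to request extra scopes here.
gcloud auth activate-service-account dbt-runner@my-project.iam.gserviceaccount.com \
  --key-file=/path/to/key.json
```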
Tested our setup with #3041 and it does resolve the issue (manually applying the 4 lines of changes turns a previously failing run into a passing one in the same environment, back-to-back).
I think we should merge that change, and consider updating the docs to cover the `activate-service-account` authentication path. Let me know if you have any questions. I also support making these scopes optional in some form for those who don’t need them.
As another data point, I reproduced the same error by directly running the `bq` command against an external BigQuery table, but was then able to fix this error by running this command: `gcloud auth login --enable-gdrive-access`. However, even after running that gcloud command, dbt is still giving me the same error. The BigQuery documentation does suggest that this is an issue with not providing the appropriate scope, so I don’t know why it’s still failing even after patching in PR #3041.
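For reference, the direct `bq` reproduction mentioned above looks roughly like this (the table name is illustrative):

```shell
bq query --use_legacy_sql=false 'SELECT COUNT(*) FROM my_dataset.my_sheet_table'
```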
https://cloud.google.com/bigquery/external-data-drive#python
Looking at that page for guidance, I’m able to reproduce the Drive error with just a small python snippet:
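The snippet is along these lines (a sketch following the linked docs page; the table name is hypothetical):

```python
import google.auth
from google.cloud import bigquery

# Default credentials without the Drive scope: querying a Drive-backed
# external table fails with "Permission denied while getting Drive
# credentials."
credentials, project_id = google.auth.default(
    scopes=["https://www.googleapis.com/auth/bigquery"]
)
client = bigquery.Client(credentials=credentials, project=project_id)
rows = client.query("SELECT * FROM my_dataset.my_sheet_table").result()
print(list(rows))
```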
So this issue seems to me like it’s an issue with the BigQuery client code itself. Still digging…
Just tried it out. Can confirm that the `v0.19.1rc2` release resolves the issue for our setup. 🙌
Just in case this helps anyone who may be trying to use the OAuth path within dbt from a GCP context: even without this regression (i.e. the credentials you create are requesting the https://www.googleapis.com/auth/drive scope, as in 0.18 and 0.19.1), if the scopes you enable on your cluster (ex. GKE) are a subset of https://www.googleapis.com/auth/drive, like https://www.googleapis.com/auth/spreadsheets.readonly, and even if you only practically use that subset, the request made by dbt may fail with the error described in this bug (Access Denied: BigQuery BigQuery: Permission denied while getting Drive credentials.). I didn’t dig much further than to expand the scope on GKE from https://www.googleapis.com/auth/spreadsheets.readonly to https://www.googleapis.com/auth/drive and get this succeeding on 0.18, so maybe other configurations may work, but I’m doubtful that anything less than enabling the specific scope you’re requesting will allow a request to succeed.
I think dbt should consider making `scopes` a configurable profile / project config option for BigQuery (a hypothetical sketch follows below). If you went down the local `gcloud` path and set your default scopes to include https://www.googleapis.com/auth/drive, etc., you wouldn’t need to set this for local runs, so perhaps it could default to just bigquery. If you were running somewhere remotely, as a service account, you can’t set default scopes, so you may need to set this option, but making it configurable would mean you could specify something like https://www.googleapis.com/auth/spreadsheets.readonly, if that was all you needed. If you were running in that context and your request failed with insufficient scopes, a pointer to the dbt docs describing the config option would be extremely helpful.
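A hypothetical sketch of what that could look like in profiles.yml (this is a proposal, not an existing option; the project and dataset names are illustrative):

```yaml
my_profile:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: my-gcp-project
      dataset: analytics
      scopes:  # proposed config option
        - https://www.googleapis.com/auth/bigquery
        - https://www.googleapis.com/auth/spreadsheets.readonly
```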
@walshie4 Heroic! Thanks for the detailed investigative work.
It all becomes clear. If using a service account to authenticate via the `gcloud`-style `oauth` method, then the scopes passed by dbt here go unused, and no scopes were specified by `gcloud auth`, and so voilà: scopes are missing. Given that, I’m happy merging #3041 and sneaking this into v0.19.1. We’ll likely want to cut a second release candidate, and ask you folks to test it out.
I’d also welcome a PR here that updates “Local OAuth gcloud setup” to include mention of Service Account OAuth gcloud setup as well.
Can also confirm that this works once you re-authenticate with the required scopes! 🥳
@walshie4 We just merged the fix in #3041 and included it in v0.19.1rc2. When you get a chance, could you try installing the release candidate and authenticating via the magical service account / `gcloud` oauth combo?
Finally got around to testing #3041 and it appears to not solve the problem in my local reproduction setup. Added a print in here to let me know what scopes are being passed. Still errors out with the same error.
After running the `gcloud auth application-default login --scopes=…` command, the problem is fixed with both version `0.19` and the #3041 branch (which makes me question if #3041 is actually changing anything). I’m still looking into why this isn’t working when running as a service account which has been given access to the document and BQ. Will report back if I find anything of value.
I’m not optimistic that #3041 will actually change anything, since the OAuth method is just using the scopes that were passed into the `gcloud auth application-default` command. I support adding better error messages as outlined in option 2 above.
To summarize from the conversation over in #2953:
- If authenticating via `service-account`, `service-account-json`, or `oauth-secrets`, dbt will use the scopes defined here.
- If authenticating via the `oauth`-based connection method (i.e. the `gcloud` SDK), dbt will not use those scopes, but instead rely on the scopes defined in `gcloud auth application-default`.
Given the confusion this issue stirred up in the meantime, there’s definitely appetite to clarify this behavior, by better documenting it or making it more intuitive. Right now, the Drive scope is “optional” for users who authenticate and develop locally via `gcloud`/`oauth`. The original intent was to make it so folks didn’t have to give dbt access to all their personal Drive files if they aren’t actually using Drive as a data source. (From the docs: “If your dbt project does not transform data in Google Sheets, then you may omit the `--scopes` flag.”) Since it’s likely that `gcloud`/`oauth` is based on a personal email address, requesting the Drive scope grants access to a lot more than just GSheet data sources! That feels different from authentication via a dedicated service account, where it’s feasible to strictly control access to only the Drive files needed for a specific purpose, so dbt can bake in the Drive scope request every time.
Is making that distinction still desirable? I think there are two valid answers:
1. No: the `get_bigquery_defaults()` method should try to grab Drive scopes regardless (= what #3041 attempts).
2. Yes: keep the distinction, but improve the error message for Permission denied while getting Drive credentials. If a user is connecting via the `oauth` method, the error message should be wrapped in a bit of context and link out to the relevant docs.
Let’s use this issue to discuss which of these two options makes sense, and plan for a code change that will help us avoid this confusion in the future.
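A hypothetical sketch of what option 2’s error wrapping could look like (not actual dbt code; the helper and its signature are invented for illustration):

```python
from google.api_core.exceptions import Forbidden

def query_with_context(client, sql, method):
    """Run a query, wrapping the Drive-credentials error for oauth users."""
    try:
        return client.query(sql).result()
    except Forbidden as exc:
        if method == "oauth" and "Drive credentials" in str(exc):
            raise RuntimeError(
                "Permission denied while getting Drive credentials. If you "
                "authenticate via the gcloud oauth method, re-run `gcloud "
                "auth application-default login` with the "
                "https://www.googleapis.com/auth/drive scope. See the dbt "
                "BigQuery profile docs for details."
            ) from exc
        raise
```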
Hey @walshie4! Could you try installing the fix from #3041 in a virtualenv, and see if that removes the error? Something like:
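A sketch of what that could look like; the branch placeholder must be replaced with the actual source branch of PR #3041:

```shell
python3 -m venv dbt-test-env
source dbt-test-env/bin/activate
pip install "git+https://github.com/fishtown-analytics/dbt.git@<branch-for-3041>#egg=dbt-bigquery&subdirectory=plugins/bigquery"
dbt --version
```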
I haven’t yet been able to replicate this error myself, so hearing that the proposed fix resolves the issue is a good indication that it’s the right change.
We’re using BigQuery with dbt version 0.18.1.
We’re getting that same Database Error: Access Denied: BigQuery BigQuery: Permission denied while getting Drive credentials.
@jtcohen6 we installed that patch branch and got the exact same error. We couldn’t locate anything in the logs either.