python-bigquery-pandas: Set project_id (and other settings) once for all subsequent queries so you don't have to pass them every time

One frustrating thing is having to pass the project_id (among other parameters) every time you write a query. For example, I usually use the same project_id, almost always query with standard SQL, and usually turn off verbose. I have to pass those three arguments with every read_gbq call, and the typing adds up.

Potential options include setting environment variables and reading the defaults from them, but the settings sometimes differ from one session to the next, and fiddling with environment variables feels unfriendly. My suggestion would be to add a class that wraps read_gbq() and to_gbq() in a client object. You could set project_id, dialect, and whatever else as attributes on the client object, then reuse the object every time you want to run a query with those settings.

A very naive implementation is in this branch: https://github.com/pydata/pandas-gbq/compare/master...jasonqng:client-object-class?expand=1
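For anyone who doesn't want to open the branch, the idea can be sketched roughly as follows. This is not the code from the branch, just a minimal illustration of a settings-holding wrapper. The `write` method name and the `read_func`/`write_func` hooks are my own assumptions (the hooks let you try the class without BigQuery access); by default it falls back to `pandas_gbq.read_gbq` and `pandas_gbq.to_gbq`.

```python
class Client:
    """Hypothetical wrapper that remembers default arguments for GBQ calls."""

    def __init__(self, read_func=None, write_func=None, **defaults):
        # Underscore-prefixed attributes are internal and never forwarded.
        self._read_func = read_func
        self._write_func = write_func
        # Every default becomes a plain attribute, so it can be changed
        # later with e.g. `client.dialect = "legacy"`.
        for name, value in defaults.items():
            setattr(self, name, value)

    def _defaults(self):
        # Collect the public attributes as keyword arguments.
        return {k: v for k, v in vars(self).items() if not k.startswith("_")}

    def read(self, query, **overrides):
        func = self._read_func
        if func is None:
            import pandas_gbq  # imported lazily; only needed for real queries
            func = pandas_gbq.read_gbq
        # Per-call keyword arguments win over the stored defaults.
        return func(query, **{**self._defaults(), **overrides})

    def write(self, dataframe, destination_table, **overrides):
        func = self._write_func
        if func is None:
            import pandas_gbq
            func = pandas_gbq.to_gbq
        # Assumption: only project_id is shared with uploads, because
        # query-only settings such as dialect do not apply to to_gbq.
        shared = {k: v for k, v in self._defaults().items() if k == "project_id"}
        return func(dataframe, destination_table, **{**shared, **overrides})
```

Keeping the defaults as ordinary attributes is what makes the attribute-assignment style in the usage example work; a stricter design might validate names against the read_gbq signature instead of forwarding everything.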

Usage would be like:

>>> import gbq
>>> client = gbq.Client(project_id='project-name', dialect='standard', verbose=False)
>>> client.read("select 1")
   f0_
0    1
>>> client.read("select 2")
   f0_
0    2
>>> client.verbose=True
>>> client.read("select 3")
Requesting query... ok.
Job ID: c7d7e4c0-883a-4e14-b35f-61c9fae0c08b
Query running...
Query done.
Processed: 0.0 B Billed: 0.0 B
Standard price: $0.00 USD

Retrieving results...
Got 1 rows.

Total time taken 1.66 s.
Finished at 2018-01-02 14:06:01.
   f0_
0    3

Does that seem like a reasonable solution to all this extra typing, or is there another preferred way? If it seems reasonable, I can open a PR with the above branch.

Thanks, my tired fingers thank you all!

@tswast @jreback @parthea @maxim-lian

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 17 (8 by maintainers)

Most upvoted comments

The global pandas_gbq.context object (added in #208) fulfills the request in this issue. I think we can slowly add properties to that class over time. (My first targets are SQL dialect #195 and maybe location.)
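For readers landing here later, a short sketch of how the context object can replace the repeated arguments. It assumes a pandas-gbq release in which `pandas_gbq.context` exposes both `project` and `dialect` (dialect was still only planned when this comment was written); the helper name `set_gbq_defaults` is made up for illustration.

```python
def set_gbq_defaults(project, dialect="standard"):
    """Store session-wide defaults on the global pandas_gbq.context object."""
    import pandas_gbq  # imported lazily so the helper can be defined anywhere

    # Assumption: this pandas-gbq version has both context attributes.
    pandas_gbq.context.project = project
    pandas_gbq.context.dialect = dialect
    return pandas_gbq.context
```

After calling `set_gbq_defaults("project-name")` once, later `pandas_gbq.read_gbq("select 1")` calls should pick up the project and dialect from the context when those arguments are not passed explicitly.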

A stopgap workaround for this issue using functools.partial, in case folks want one:

>>> from functools import partial
>>> from pandas.io import gbq
>>> bq = partial(gbq.read_gbq, project_id='xxxxx', verbose=False, dialect='standard')
>>> bq("select 1")
   f0_
0    1
>>> bq("select sum(a) from (select 5 a)")
   f0_
0    5

An intermediate step we could take is to make the project_id parameter optional when we are able to get it from the google-auth library. I believe @jasonqng originally had this logic in #25, but unfortunately I dropped that code when I updated that PR to use the 0.28 version of the google-cloud-bigquery library.
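To make the intermediate step concrete, here is a rough sketch of the fallback. It is not the logic from #25, only an illustration built on `google.auth.default()`, which returns a `(credentials, project_id)` pair. The helper name `resolve_project_id` and the `auth_default` parameter (a hook for substituting a stand-in) are assumptions.

```python
def resolve_project_id(project_id=None, auth_default=None):
    """Return the explicit project_id, or fall back to the one google-auth finds."""
    if project_id is not None:
        return project_id

    if auth_default is None:
        import google.auth  # imported lazily; only needed for the fallback
        auth_default = google.auth.default

    # google.auth.default() returns (credentials, project_id); the project
    # can be None when the environment does not define one.
    _credentials, detected = auth_default()
    if detected is None:
        raise ValueError(
            "Could not determine a default project; pass project_id explicitly."
        )
    return detected
```

An explicit argument always wins here, so existing code that passes project_id would behave as before.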

I just ran this in Colab.