google-cloud-python: BigQuery: Error 413 (Request Entity Too Large)!!1
Python 3, google-cloud-bigquery==0.29.0
To reproduce: Function:
from google.cloud import bigquery

client = bigquery.Client()

def get_result_dataframe(query):
    query_job = client.query(query)
    return query_job.to_dataframe()
Query: 'SELECT * FROM project.dataset.table_with_lots_of_columns LIMIT 1000'
Result: Error 413 (Request Entity Too Large)!!1
More data from the log: it seems the client queries the table first to get all of its possible columns and then makes an API call against the actual table with every one of those columns in the select list. The request then becomes too large.
About this issue
- State: closed
- Created 6 years ago
- Comments: 24 (21 by maintainers)
Commits related to this issue
- Update job.py fix for #4864 — committed to VikramTiwari/google-cloud-python by VikramTiwari 6 years ago
- BigQuery: Don't use `selected_fields` for listing query result rows (#5072) * Update job.py fix for #4864 * Adds system tests for listing rows from table with many columns Order of rows in t... — committed to googleapis/google-cloud-python by VikramTiwari 6 years ago
@shivafractal Rather than passing an 800,000-item list as a query parameter, I would consider uploading those values into a separate (maybe temporary?) table and then building the query by joining with it.
I haven't tested this, but let me know if that works. @tswast @bmabey
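For reference, a minimal sketch of that approach; the lookup table name, the values list, and the join column are hypothetical, and the method names follow current google-cloud-bigquery releases rather than 0.29:

import datetime
from google.cloud import bigquery

client = bigquery.Client()

values = ["a", "b", "c"]  # in practice, the ~800,000 application values

# Hypothetical lookup table to hold the values; give it an expiration so
# it cleans itself up.
lookup_table_id = "project.dataset.tmp_lookup_values"
table = bigquery.Table(lookup_table_id, schema=[bigquery.SchemaField("value", "STRING")])
table.expires = datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(hours=1)
client.create_table(table, exists_ok=True)

# Load the values with a load job instead of sending them as a query parameter.
load_job = client.load_table_from_json([{"value": v} for v in values], lookup_table_id)
load_job.result()

# Join against the lookup table rather than building a huge IN (...) list.
sql = """
    SELECT t.*
    FROM `project.dataset.table_with_lots_of_columns` AS t
    JOIN `project.dataset.tmp_lookup_values` AS v
    ON t.some_column = v.value
"""
df = client.query(sql).to_dataframe()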
Yeah, 0.27 and before got query results using https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/getQueryResults, which is slower but also easier to get right in the client.
I'll give the backend folks a little time to see if they can increase the request size limit. If not, we do have a workaround, which is to get the destination table object and then list rows (or we can pretend we have a full Table object by populating the reference and the schema).
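In the meantime, a rough sketch of that workaround from user code, assuming client is a bigquery.Client and the query writes to a destination table:

query_job = client.query(query)
query_job.result()  # wait for the query to finish

# Fetch the destination table (which carries the full schema) and list its
# rows directly, so the client never sends the huge column list as
# selected_fields on the tabledata.list request.
destination = client.get_table(query_job.destination)
df = client.list_rows(destination).to_dataframe()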
Okay, I am able to reproduce if I pass the schema in as selected_fields, which is what query() does behind the scenes. System test:
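The committed system test covers this; roughly, the reproduction looks like the following sketch (dataset, table, and column names are made up):

from google.cloud import bigquery

client = bigquery.Client()

# Create a very wide table; with 10,000 columns the request breaks the
# connection, and with 1,000 it gets a 413 response (see below).
schema = [bigquery.SchemaField("col_{}".format(i), "INTEGER") for i in range(1000)]
table_ref = client.dataset("my_dataset").table("wide_table")
table = client.create_table(bigquery.Table(table_ref, schema=schema))

# Passing the whole schema as selected_fields mirrors what query() does
# behind the scenes and produces the oversized tabledata.list request.
rows = client.list_rows(table, selected_fields=schema)
list(rows)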
With 10,000 columns, the request to list rows (tabledata) doesn’t even finish sending. The connection breaks before it can get sent.
requests.exceptions.ConnectionError: ('Connection aborted.', BrokenPipeError(32, 'Broken pipe'))

With 1,000 columns the request goes through but fails with the described error 413.
Error 413 (Request Entity Too Large)!!1

I don't see a request size limit documented at https://cloud.google.com/bigquery/quotas#api_requests, so I will file an issue internally about this.
The workaround we could possibly do in the client libraries is to not populate selected_fields and pretend we have a complete Table object when fetching query results, but I worry this will not do the right thing if the query is written to an existing empty/non-empty destination table where the columns are not in the same order as the query.