google-cloud-python: BigQuery: insert_rows does not seem to work
Hello, I have this code snippet:
from google.cloud import bigquery

client = bigquery.Client(...)
table = client.get_table(
    client.dataset("Integration_tests").table("test")
)
print(table.schema)
rows = [
    {"doi": "test-{}".format(i), "subjects": ["something"]}
    for i in range(1000)
]
client.insert_rows(table, rows)
This produces the following output:
DEBUG:urllib3.util.retry:Converted retries value: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None)
DEBUG:google.auth.transport.requests:Making request: POST https://accounts.google.com/o/oauth2/token
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): accounts.google.com:443
DEBUG:urllib3.connectionpool:https://accounts.google.com:443 "POST /o/oauth2/token HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.googleapis.com:443
DEBUG:urllib3.connectionpool:https://www.googleapis.com:443 "GET /bigquery/v2/projects/{projectname}/datasets/Integration_tests/tables/test HTTP/1.1" 200 None
[SchemaField('doi', 'STRING', 'REQUIRED', None, ()), SchemaField('subjects', 'STRING', 'REPEATED', None, ())]
DEBUG:urllib3.connectionpool:https://www.googleapis.com:443 "POST /bigquery/v2/projects/{projectname}/datasets/Integration_tests/tables/test/insertAll HTTP/1.1" 200 None
It seems like it worked, but when I go to my table it’s empty. Any idea?
Python version: 3.6.0
Library versions: google-cloud-bigquery==1.1.0, google-cloud-core==0.28.1
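As far as I understand, insert_rows returns a list of per-row insert errors (empty when the API accepted every row), so capturing its return value should show whether any row was rejected:

errors = client.insert_rows(table, rows)
if errors:
    print("insert_rows reported per-row errors:", errors)
else:
    print("insert_rows reported no errors")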
Okay, I think I might have found a solution.
In the “Streaming into ingestion-time partitioned tables” section on this page there is the suggestion that the partition can be specified explicitly with the syntax mydataset.table$20170301. If I do this (so replace table_ref = dataset_ref.table('payload_logs') with dataset_ref.table('payload_logs$20190913') in the code above), then it works, and the rows are immediately returned by queries.
This is a bit surprising to me, because if I don’t specify the partition time explicitly, I’d expect BigQuery to simply take the current UTC date, which seems to be exactly what I’m doing when I specify it in code.
Anyhow, this seems to solve the issue.
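For reference, a minimal sketch of the change (the dataset name and rows here are just placeholders; the rest is the same flow as above):

from google.cloud import bigquery

client = bigquery.Client()
dataset_ref = client.dataset("my_dataset")  # placeholder dataset name

# Address today's UTC partition explicitly with the $YYYYMMDD decorator
# instead of letting BigQuery resolve the partition on its own.
table_ref = dataset_ref.table("payload_logs$20190913")
table = client.get_table(table_ref)

rows = [{"payload": "example"}]  # placeholder rows matching the table schema
errors = client.insert_rows(table, rows)
print(errors)  # an empty list means every row was accepted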
I had the same problem. I got around it by using load jobs to push the data instead of client.insert_rows. Like this:
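Roughly, assuming the same Integration_tests.test table and fields as in the original snippet (adjust to your own schema):

import io
import json

from google.cloud import bigquery

client = bigquery.Client()
table_ref = client.dataset("Integration_tests").table("test")

rows = [
    {"doi": "test-{}".format(i), "subjects": ["something"]}
    for i in range(1000)
]

# Serialize the rows as newline-delimited JSON and push them with a load job
# instead of the streaming insertAll API.
data = "\n".join(json.dumps(row) for row in rows).encode("utf-8")

job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON

job = client.load_table_from_file(io.BytesIO(data), table_ref, job_config=job_config)
job.result()  # wait for the load job to finish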
Reference: https://cloud.google.com/bigquery/docs/loading-data-local
@shollyman Thanks. Yes, in my script I delete and create the table, then insert data into it. I just tried using a new table ID and inserting 100 rows: right after the insert finished I ran a SELECT query and only 1 row appeared. After a while I ran the query again and all 100 rows were returned. So is it expected that newly inserted rows are unavailable for some time? How long can that take?