google-cloud-python: BigQuery: DEFAULT_RETRY doesn't catch 'rateLimitExceeded'.

OS: MacOS 10.14.3 Python: 3.7.3 google-cloud-bigquery: 1.15.0

I’m trying to reproduce the condition where BigQuery is already executing 100 concurrent queries and refuses to execute additional jobs, raising Forbidden: 403 Exceeded rate limits: too many concurrent queries for this project_and_region.

Let’s say that my BigQuery project is already executing 100 queries and I request a new job execution using client.query(). It seems to me that DEFAULT_RETRY was designed to retry the operation when a rateLimitExceeded error is received, but it doesn’t seem to work: client.query() returns a job with the error already set, and job.result() ultimately raises the error.

What I’d like to happen is for client.query() to recognize that there are too many concurrent queries and retry the query later, according to the retry object.
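
For illustration, here is a minimal sketch of what I expected to happen (the query and the 10-minute deadline below are just placeholders):

from google.cloud import bigquery
from google.cloud.bigquery.retry import DEFAULT_RETRY

client = bigquery.Client()
my_retry = DEFAULT_RETRY.with_deadline(600)  # keep retrying for up to 10 minutes

# Expected: query() keeps retrying the job insertion until a slot frees up
# or the deadline expires, instead of returning a job that has already failed.
job = client.query("SELECT 1", retry=my_retry)
rows = job.result()  # expected to eventually succeed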

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 5
  • Comments: 17 (8 by maintainers)

Most upvoted comments

I see the problem now: the retry logic is triggered only if an exception is raised, and that exception would have to be raised in the api_request method defined in the core library (here).

Unfortunately, even when the rate limit is exceeded, the API response comes back with status_code = 200! Therefore no exception is raised and the retry logic is never triggered.
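
To make the mechanics concrete, here is a simplified sketch (paraphrased, not the actual library code) of how the retry wrapper in google.api_core behaves; the predicate is only consulted when the wrapped call raises:

import time

def call_with_retry(target, predicate, deadline=600.0, delay=1.0):
    """Simplified stand-in for google.api_core.retry.Retry."""
    start = time.monotonic()
    while True:
        try:
            return target()        # an HTTP 200 response is returned as-is
        except Exception as exc:
            if not predicate(exc) or time.monotonic() - start > deadline:
                raise              # non-retryable error, or deadline exceeded
            time.sleep(delay)      # back off, then call the target again

Since the rate-limited response comes back as a 200, target() never raises, so the predicate (e.g. _should_retry) is never even evaluated.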

An API that returns status_code = 200 with a 403 error as a string in the JSON response seems badly designed to me… do you have any control over the BigQuery APIs?

@ralbertazzi Thank you for the code to reproduce.

Unfortunately, this is expected behavior. The rateLimitExceeded error you are encountering is due to a limit on the number of concurrent INTERACTIVE priority queries. In the backend, the job is successfully created, but then fails. This means we can’t retry the job in the same way we retry HTTP error codes.

To work around this restriction, it’s recommended that you run your queries with BATCH priority, instead.

from google.cloud import bigquery

client = bigquery.Client()

jobs = []
for i in range(200):
    # BATCH priority queries are queued when no slots are free, instead of
    # failing with rateLimitExceeded.
    job_config = bigquery.QueryJobConfig(priority="BATCH")
    job = client.query("""
    CREATE TEMP FUNCTION is_prime_slow(x INT64)
    RETURNS BOOL
    LANGUAGE js AS '''
      for(var i = 3; i < x; i++)
        if(x % i === 0) return false;
      return true;
    ''';

    SELECT num
    FROM UNNEST(GENERATE_ARRAY(1000000, 2000000)) AS num
    WHERE is_prime_slow(num)
    """, job_config=job_config)
    jobs.append(job)

error_jobs = [job for job in jobs if job.error_result]
print("Tried {} jobs, got {} errors.".format(len(jobs), len(error_jobs)))

Just to be sure, I tried with a 1-hour deadline:

from google.cloud.bigquery.retry import DEFAULT_RETRY

# With ~100 INTERACTIVE queries already running, even a trivial query is rejected.
my_retry = DEFAULT_RETRY.with_deadline(3600)  # allow retries for up to 1 hour
job = client.query("SELECT 1", retry=my_retry)

# The job is returned with the error already attached instead of being retried:
assert len(job.errors) == 1
assert job.errors[0]["reason"] == "rateLimitExceeded"
job.result()  # raises Forbidden

As I already said in the previous post, the client.query() call returns in 1-2 seconds (it doesn’t wait for 1 hour) and the returned job already contains the error. Calling job.result() then raises the Forbidden error.

I also tried putting a breakpoint in the _should_retry function defined here, and it never gets hit. So I think the problem is not incorrect retry logic, but rather that the retry logic is never called.
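
For reference, the predicate itself does treat rateLimitExceeded as retryable. Roughly (paraphrased from google/cloud/bigquery/retry.py, details may vary by version) it looks like this, which is why I believe the issue is that it is simply never reached:

from google.api_core import retry

_RETRYABLE_REASONS = frozenset(
    ["rateLimitExceeded", "backendError", "internalError", "badGateway"]
)

def _should_retry(exc):
    # Only invoked when an exception is raised, so it is never reached for
    # a 200 response that merely carries an error payload.
    if not hasattr(exc, "errors") or not exc.errors:
        return False
    return exc.errors[0]["reason"] in _RETRYABLE_REASONS

DEFAULT_RETRY = retry.Retry(predicate=_should_retry)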