site-kit-wp: Retry failed requests: implement exponential backoff

Feature Description

Currently, when the plugin makes an API request that results in an error, we display the error to the user and stop processing. We can improve this behavior for specific errors by implementing exponential backoff: we will retry the request after (roughly) 1, 2, and 4 seconds.
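As a rough illustration of the schedule described above, the following minimal sketch computes the wait before each retry by doubling a one-second base delay. The function name `backoff_delays` and its parameters are hypothetical, and the sketch leaves out any random jitter that the word "roughly" may imply; it is not the client library's implementation.

```php
<?php
/**
 * Hypothetical helper (not part of Site Kit or the Google API client):
 * returns the delay, in seconds, to wait before each retry attempt.
 *
 * @param int   $max_retries Maximum number of retries.
 * @param float $base_delay  Delay before the first retry, in seconds.
 * @param float $factor      Multiplier applied after each retry.
 * @return float[] Delays indexed by retry attempt (0-based).
 */
function backoff_delays( $max_retries = 3, $base_delay = 1.0, $factor = 2.0 ) {
	$delays = array();
	for ( $attempt = 0; $attempt < $max_retries; $attempt++ ) {
		// The delay grows geometrically: base * factor^attempt.
		$delays[] = $base_delay * pow( $factor, $attempt );
	}
	return $delays;
}
```

With the defaults, `backoff_delays()` yields delays of 1, 2 and 4 seconds, matching the three retries proposed here.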

Let’s start by turning on the client’s default retry behavior (which handles 50x errors and curl/connection errors) and setting the maximum number of retries to 3. We can then follow up with other API-specific codes that should be retried. The exact conditions/codes/reasons for retrying a request will vary by API.

For example, for the Analytics Reporting API, the following error codes should use exponential retry:

429 - Quota Error: Number of recent failed reporting API requests is too high, please implement exponential back off.

For the Analytics Management API, when the error message is “Quota Error: Rate limit for writes exceeded.” we should use exponential backoff to retry.
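The conditions listed above could be expressed as a single check. This is only a sketch: `is_retryable_api_error` is a hypothetical helper, not an existing Site Kit or Google API client function, and it assumes the HTTP status code and error message are already available as plain values.

```php
<?php
/**
 * Hypothetical helper: decides whether an API error should be retried,
 * using only the codes and message named in this issue.
 *
 * @param int    $status_code HTTP status code of the error response.
 * @param string $message     Error message returned by the API.
 * @return bool True if the request should be retried with backoff.
 */
function is_retryable_api_error( $status_code, $message ) {
	// 5xx responses are treated as temporary server-side failures.
	if ( $status_code >= 500 && $status_code < 600 ) {
		return true;
	}
	// Analytics Reporting API: 429 quota error.
	if ( 429 === $status_code ) {
		return true;
	}
	// Analytics Management API: write rate limit, matched by message text.
	if ( false !== strpos( $message, 'Rate limit for writes exceeded' ) ) {
		return true;
	}
	return false;
}
```

Matching on message text is fragile, so a real implementation would probably prefer a structured error reason where the API provides one.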


Do not alter or remove anything below. The following sections will be managed by moderators only.

Acceptance criteria

  • When the PHP client encounters a retryable error for an API request, it should use exponential backoff to re-request that data, trying up to 3 times.

Implementation Brief

  • When Google API requests are made, the client should handle error conditions with up to 3 retries:

  • In includes/Core/Authentication/Clients/OAuth_Client.php:setup_client configure the $client to add support for retries:

// Enable exponential retries, try up to three times.
$client->setConfig( 'retry', array( 'retries' => 3 ) );

Test Coverage

We can test that the default client uses exponential backoff. We shouldn’t need tests for actual retries - the client itself includes these tests.

  • In tests/phpunit/integration/Core/Authentication/Clients/OAuth_ClientTest.php expand the test_get_client test which calls Google_Site_Kit_Client:get_client. Add assertions that verify retries is set to 3:
$retry = $client->getConfig( 'retry' );
$this->assertEquals( 3, $retry['retries'] );

Visual Regression Changes

  • No changes in VRT.

QA Brief

  • To test that this is working as expected we need to be able to mock API error responses. I will work separately to add this capability in the tester plugin.

Changelog entry

  • Implement exponential backoff to retry Google service API requests a limited number of times if they fail with temporary errors.

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 31 (1 by maintainers)

Most upvoted comments

@adamsilverstein @aaemnnosttv It’s a bit late now to make more changes here like move this behind a feature flag, and I’m also not sure that makes sense since the implementation for the feature is completed - I think we’re okay launching this, as only wide usage will give us an idea about the impact on API requests anyway. Once this goes out on Monday, let’s monitor whether it has any impact on API requests or quotas and react if needed.

@adamsilverstein ready for another pass 👍

@aaemnnosttv follow up PR created and linked.

Hey @aaemnnosttv + @adamsilverstein, I’ve added a custom $retry_map and pass it to the Google_Client as a config that should now be passed to the REST Runner during the execute function in the library.

Updated here: https://github.com/google/site-kit-wp/pull/2442/files

@adamsilverstein after taking a quick look, it seems that we just need to identify the reason and set a custom $retry_map in the config (it doesn’t seem to allow for extension) that has {reasonNotToRetry} => Google_Task_Runner::TASK_RETRY_NEVER.

That reason appears to apply to any status code though so that’s something to consider.
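To make the concern above concrete, here is a self-contained sketch of a retry map lookup. Everything in it is an assumption for illustration: the two constants are local stand-ins for the `Google_Task_Runner` ones (values assumed), `allowed_retries` is a hypothetical function, the lookup order (status code first, then reason) is assumed, and `exampleReasonNotToRetry` is a placeholder reason.

```php
<?php
// Local stand-ins for the Google_Task_Runner constants so the sketch runs
// on its own; the values are assumed, not taken from the library.
const TASK_RETRY_NEVER  = 0;
const TASK_RETRY_ALWAYS = -1;

/**
 * Hypothetical lookup: returns the retry policy for an error.
 * The status code is checked first, then the error reason.
 *
 * @param array  $retry_map   Map of status code or reason to retry policy.
 * @param int    $status_code HTTP status code of the error response.
 * @param string $reason      Error reason reported by the API.
 * @return int Retry policy (one of the TASK_RETRY_* values).
 */
function allowed_retries( array $retry_map, $status_code, $reason ) {
	if ( isset( $retry_map[ $status_code ] ) ) {
		return $retry_map[ $status_code ];
	}
	// A reason entry matches no matter which status code came with it.
	if ( isset( $retry_map[ $reason ] ) ) {
		return $retry_map[ $reason ];
	}
	return TASK_RETRY_NEVER;
}

$retry_map = array(
	'500'                     => TASK_RETRY_ALWAYS,
	'503'                     => TASK_RETRY_ALWAYS,
	'rateLimitExceeded'       => TASK_RETRY_ALWAYS,
	'exampleReasonNotToRetry' => TASK_RETRY_NEVER, // Placeholder reason.
);
```

In this sketch a `rateLimitExceeded` reason is retried whether it arrives with a 403 or a 429, because neither code has its own entry in the map.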

@felixarntz @aaemnnosttv - I suggest we delay implementation of this ticket by at least one release so we can collect API error data and observe how this change impacts those errors/rates.

This may be as simple as $analytics_client = clone $client before making changes but I’m not sure.

Yea, I was hoping it might be. Still, I can see the value in applying this globally.

I think configuring this globally should be safe to do for 5xx errors – after all, these are only triggered by Google_Service_Exceptions, correct?

You are probably right, I am likely being overly cautious here. My main concern is that API responses are not consistent across the APIs, for example in how they report quota errors, which can be 429 or 403 errors. Still, retrying on 50x-class errors and on curl/connection errors makes sense - which is probably why they are defaults.

I will update the IB to apply this globally to the client. The plan is to land the error monitoring first and collect a couple weeks of data before launching this change. That way we can gauge the impact of the change on error rates.