amplify-js: Clock drift causes the client to fire off an infinite number of uncatchable 4xx errors when setting up GraphQL

Before opening, please confirm:

JavaScript Framework

React

Amplify APIs

Authentication, GraphQL API

Amplify Categories

auth, api

Environment information

# Put output below this line


  System:
    OS: macOS 13.6
    CPU: (10) arm64 Apple M1 Max
    Memory: 2.52 GB / 32.00 GB
    Shell: 5.9 - /bin/zsh
  Binaries:
    Node: 18.14.0 - ~/.nvm/versions/node/v18.14.0/bin/node
    npm: 9.5.0 - ~/.nvm/versions/node/v18.14.0/bin/npm
  Browsers:
    Chrome: 118.0.5993.117
    Safari: 16.6
  npmPackages:
    @osu/footer: ^2.1.0 => 2.0.0-upgrade.0 
    @osu/navbar: 2.0.0 => 1.2.1-beta.0 
    @osu/react-ui: ^2.2.0 => 2.1.1-beta.0 
    @osu/wm-react-analytics: ^1.1.0 => 1.1.0 
    @reduxjs/toolkit: ^1.9.5 => 1.9.5 
    @reduxjs/toolkit-query:  1.0.0 
    @reduxjs/toolkit-query-react:  1.0.0 
    @testing-library/jest-dom: ^5.16.5 => 5.16.5 
    @testing-library/react: ^13.4.0 => 13.4.0 
    @testing-library/user-event: ^13.5.0 => 13.5.0 
    aws-amplify: 5.3.3 => 5.3.10 
    axios: ^1.3.4 => 1.3.4 (0.26.0)
    js-search: ^2.0.1 => 2.0.1 
    react: ^18.2.0 => 18.2.0 
    react-dom: ^18.2.0 => 18.2.0 
    react-idle-timer: ^5.5.2 => 5.5.2 (5.7.2)
    react-redux: ^8.0.5 => 8.0.5 
    react-router-dom: ^6.8.1 => 6.8.1 
    react-scripts: 5.0.1 => 5.0.1 
    redux: ^4.2.1 => 4.2.1 
    redux-logger: ^3.0.6 => 3.0.6 
    redux-thunk: ^2.4.2 => 2.4.2 
    sass: ^1.58.1 => 1.68.0 
    web-vitals: ^2.1.4 => 2.1.4 
  npmGlobalPackages:
    @aws-amplify/cli: 11.1.1
    @osu/react-ui: 2.2.0-beta.0
    corepack: 0.15.3
    npm: 9.5.0

Describe the bug

We have an Amplify project that allows auth via a user pool and via a public identity pool. We are experiencing a high rate of 4xx errors and are constantly in an error state. While I have seen a number of tickets related to clock drift, I have not seen one call out an infinite number of calls to the backend. While debugging on the very latest version, I set a debug flag on the supposed fixes for this issue, and that code just re-runs over and over without stopping. If you do not close the browser, it never stops firing, leading to 50,000 client calls and up. We’ve had to add a rate limiter just to try to manage this flood of errors.

We were on the very latest Amplify but downgraded to a version suggested in another issue (5.3.3), which helped slightly but not entirely. The image below shows some of what we have been dealing with; the spike was when we were on the newest Amplify version.

Expected behavior

A client should not be able to fire off tens to hundreds of thousands of requests. There should be a controllable number of retries, and once the specific 4xx error is identified, a refreshed, valid signature should be created.
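For illustration only, here is a hedged sketch of the behavior we'd expect (this is not amplify-js code; the skew-detection regex and helper names are assumptions):

// Sketch of a bounded retry: re-sign once on a detected skew error, then
// give up after a fixed number of attempts instead of looping forever.
async function callWithBoundedRetry(send, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await send(); // `send` issues one signed request
    } catch (error) {
      lastError = error;
      const isSkewError = /Signature not yet current/.test(error?.message ?? "");
      if (!isSkewError) throw error; // only retry the identified 4xx
      // ...correct the clock offset and re-sign here, at most maxAttempts times
    }
  }
  throw lastError; // surface a catchable error instead of an infinite loop
}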

Reproduction steps

On an Amplify project with GraphQL and a public identity pool, set your system clock ahead or behind by ten minutes.
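A minimal way to exercise this (a sketch assuming a standard aws-exports setup; the query is a placeholder, not our schema):

import { Amplify, API } from "aws-amplify";
import awsconfig from "./aws-exports";

Amplify.configure(awsconfig);

// With the system clock skewed by more than 5 minutes, this IAM-signed
// request is rejected with a BadRequestException signature error and,
// per this report, retried without bound.
API.graphql({
  authMode: "AWS_IAM",
  query: `query ListTodos { listTodos { items { id } } }`, // placeholder query
}).catch((error) => console.error(error));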

Code Snippet

No response

Log output

X-Ray:

  "cause": 
{
                    "message": "Signature not yet current: 20231027T154000Z is still later than 20231027T140444Z (20231027T135944Z + 5 min.)"
 }

Log insights:

Response Headers: {x-amzn-ErrorType=BadRequestException}


aws-exports.js

No response

Manual configuration

No response

Additional configuration

No response

Mobile Device

No response

Mobile Operating System

No response

Mobile Browser

No response

Mobile Browser Version

No response

Additional information and screenshots

One week of errors: (screenshot, 2023-10-27 10:00 AM)

Zoom in on latest Amplify: (screenshot, 2023-10-27 10:13 AM)

You can see a near mirror image between the errors and our calls. We never should have had that many calls; it’s just the client firing off an uncontrollable number of 4xx errors.

About this issue

  • Original URL
  • State: closed
  • Created 8 months ago
  • Reactions: 1
  • Comments: 25 (13 by maintainers)

Most upvoted comments

Hi @gracektay

I just confirmed this is indeed a bug with our clock drift correction function. Here’s where it goes wrong:

  1. The server clock is 19:00:00 but the client clock is 18:00:00; a request is initiated and fails with a BadRequest error.
  2. The error triggers the drift correction, and the client offsets its clock by 1 hour. The retried request has a Date header of 19:00:00 (client clock + offset).
  3. When the client clock is back to the correct 19:00:00, the client offset is still 1 hour, so the subsequent request has a Date header of 20:00:00.
  4. That request fails with a BadRequest error and triggers the drift correction. The function compares the request date of 20:00:00 with the server clock of 19:00:00, sets the offset to -1 hour, and initiates a retry.
  5. The retry has a local date of 19:00:00 and an offset of -1 hour, so its Date header is 18:00:00, and it fails with BadRequest. The drift correction then sets the offset back to 1 hour and triggers another retry, repeating step 3.
  6. As a result, the request retries infinitely, with the request Date flipping back and forth between 18:00:00 and 20:00:00 while the server clock reads 19:00:00.

We will implement a fix for this issue and let you know when it’s ready.
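A minimal simulation of that flip-flop (hypothetical names, not the actual amplify-js internals; it assumes the buggy correction derives the new offset from the already-offset request date):

const HOUR = 60 * 60 * 1000;
const serverNow = Date.parse("2023-10-27T19:00:00Z");
const clientNow = serverNow;  // client clock is back in sync...
let offset = HOUR;            // ...but a stale +1h offset remains (step 3)

for (let attempt = 1; attempt <= 4; attempt++) {
  const dateHeader = clientNow + offset; // Date header = client clock + offset
  const rejected = Math.abs(dateHeader - serverNow) > 5 * 60 * 1000; // 5-min skew window
  console.log(`attempt ${attempt}: ${new Date(dateHeader).toISOString()} rejected=${rejected}`);
  if (!rejected) break;
  // Buggy correction: compares the already-offset header to the server
  // clock, producing an equal-and-opposite offset each time (steps 4-5).
  offset = serverNow - dateHeader;
}

Deriving the offset from the raw client clock (serverNow - clientNow) rather than from the offset request date would converge in one step; presumably that is the shape of the fix in 5.3.12.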

@gracektay Hi, a fix has been released in aws-amplify@5.3.12. Can you give it a try with this version?

It’d be a bit difficult to share the entire code setup, but that’s the component, and this is the base API call:


import { API, Auth } from "aws-amplify";
import { getSessionToken } from "../../util/functions";
import { AUTH_MODE } from "../../api-config";
import { isAfter } from "date-fns";

export async function graphqlBaseQuery(args) {
  const { query, variables } = args;
  let authMode;
  // Swallow failures so a missing session falls through to IAM auth
  // (a bare .catch() would re-throw; it needs a handler).
  const sessionToken = await getSessionToken().catch(() => undefined);
  const { identityId, expiration } =
    (await Auth.currentCredentials().catch(() => undefined)) ?? {};
  const isExpired = isAfter(new Date(), new Date(expiration));
  if (sessionToken) {
    authMode = AUTH_MODE.AMAZON_COGNITO_USER_POOLS;
  } else if (identityId && !isExpired) {
    authMode = AUTH_MODE.AWS_IAM;
  }
  if (!authMode) {
    return {
      error: { status: 500, data: "Unable to authenticate credentials" },
    };
  }
  try {
    const result = await API.graphql({ authMode, query, variables });
    if (result?.data) {
      return result;
    }
    throw new Error("Unexpected data format");
  } catch (error) {
    // Return the formatted error (the original returned the raw error,
    // discarding the response it had just built).
    const response = { error: { status: 500, data: error?.errors ?? error } };
    if (error?.data) {
      response.data = error.data;
    }
    return response;
  }
}

The expiration check was an attempt to fix the issue, but it made no difference.

Reposting these docs in case it’s helpful to understand the polling: https://redux-toolkit.js.org/rtk-query/usage/polling

The polling just offers more opportunities for the infinite failure to occur.
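For context, a sketch of how a base query like the one above plugs into RTK Query polling (endpoint and query names here are illustrative, not our actual code):

import { createApi } from "@reduxjs/toolkit/query/react";
import { graphqlBaseQuery } from "./graphqlBaseQuery"; // the snippet above

export const api = createApi({
  baseQuery: graphqlBaseQuery,
  endpoints: (builder) => ({
    getItems: builder.query({
      // The object returned here becomes the `args` of graphqlBaseQuery
      query: () => ({ query: `query ListItems { listItems { items { id } } }` }),
    }),
  }),
});

// In a component, polling re-issues the signed request every interval,
// so a stuck clock offset turns each poll into another retry storm:
// const { data } = api.useGetItemsQuery(undefined, { pollingInterval: 60_000 });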

@ashwinkumar6 to an extent that’s true; the per-minute repeated successful calls are fine. But the failures with clock drift also cause repeated calls, an infinite number of them, until the browser is force-closed.

@gracektay, have you also enabled CORS and checked the IAM permissions for the public identity pool? We’re having trouble reproducing the issue with GraphQL (though we have reproduced it using Pinpoint, similar to #9281), and we’re trying to see if we can get the same recurring error you see.

If you have a minimal sample app (or even what the frontend code looks like), that would be helpful too!

There are several previously closed issues related to this. In particular, this issue was resolved by both a library fix and a fix to the CLI. @gracektay - can you also share what version of the CLI you are using? Thank you!

@gracektay, thank you for the additional information and pics! We are currently investigating this and will update this issue as soon as we can.

@AllanZhengYP Hi! I’ve updated https://d3rue5wy9qeptt.cloudfront.net. I have tested in Chrome and Safari, so far so good. I see it error once, retry and succeed. I’ll let you know if I see more issues! Thanks SO much!

This only impacts a couple of users (it’s a handful of people responsible for 50k requests), and for the most part everything is working as expected (except when we switch back and forth on time), so my gut feeling is that it is not related to CORS or our permissions. If it were, I’d assume it’d be broken for the hundred thousand other users.

Here are some snippets from our IAM permissions in case it’s useful (with certain values removed just in case).

Trust relationship:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "cognito-identity.amazonaws.com"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "cognito-identity.amazonaws.com:aud": "us-east-2:[[identifier-removed]]"
                },
                "ForAnyValue:StringLike": {
                    "cognito-identity.amazonaws.com:amr": "unauthenticated"
                }
            }
        }
    ]
}
Permissions:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "mobileanalytics:PutEvents",
                "cognito-sync:*",
                "cognito-identity:*"
            ],
            "Resource": "*",
            "Effect": "Allow"
        },
        {
            "Action": [
                "appsync:GraphQL"
            ],
            "Resource": [
                "arn:aws:appsync:us-east-2:[[accountid]]:apis/[[apiid]]/types/*/fields/*",
                "arn:aws:appsync:us-east-2:[[accountid]]:apis/[[apiid]]"
            ],
            "Effect": "Allow"
        }
    ]
}

Also, if it’s helpful to know, we were originally unsure of the cause, and my team opened a case, CASE 14087351881, which was forwarded to the AppSync team.

@cwomack Hi! Here are lots of pictures showing the issue from the client side. Hope this helps! Client side:

{
  "errors" : [ {
    "errorType" : "BadRequestException",
    "message" : "Signature not yet current: 20231027T153914Z is still later than 20231027T153419Z (20231027T152919Z + 5 min.)"
  } ]
}

Here are some pictures: (four screenshots, 2023-10-27 between 11:33 and 11:34 AM)

Our call:

import { API } from "aws-amplify";
API.graphql({ authMode, query, variables });