nodejs-firestore: Error: 14 UNAVAILABLE: 403:Forbidden
Describe your environment
- Operating System version: Ubuntu 20.04.1 LTS
- Firebase SDK version: 11.4.1
- Firebase Product: Firebase Admin
- Node.js version: 18.13.0
- NPM version: 8.19.3
- @google-cloud/firestore version: 6.4.2 (used as a subdependency)
Describe the problem
Steps to reproduce:
Our NodeJS backend, running ExpressJS and using the Firebase Admin SDK, has been running smoothly for 2 years. All of a sudden, on the morning of the 17th of Jan, our service was interrupted by a seemingly routine Firebase call made by our baseModel. This causes all requests to fail until we reboot the backend. Since then, the error has happened randomly every 2-6+ hours, with thousands of OK requests in between. The error seems to stem from the Firebase Admin SDK package or its dependencies.
Relevant Code:
Error Log:
Jan 17 10:02:08 Error: 14 UNAVAILABLE: 403:Forbidden
Jan 17 10:02:08 at Object.callErrorFromStatus (/app/node_modules/google-gax/node_modules/@grpc/grpc-js/build/src/call.js:31:19)
Jan 17 10:02:08 at Object.onReceiveStatus (/app/node_modules/google-gax/node_modules/@grpc/grpc-js/build/src/client.js:352:49)
Jan 17 10:02:08 at Object.onReceiveStatus (/app/node_modules/google-gax/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:328:181)
Jan 17 10:02:08 at /app/node_modules/google-gax/node_modules/@grpc/grpc-js/build/src/call-stream.js:188:78
Jan 17 10:02:08 at processTicksAndRejections (internal/process/task_queues.js:82:9)
Jan 17 10:02:08 for call at
Jan 17 10:02:08 at ServiceClientImpl.makeServerStreamRequest (/app/node_modules/google-gax/node_modules/@grpc/grpc-js/build/src/client.js:336:30)
Jan 17 10:02:08 at ServiceClientImpl.<anonymous> (/app/node_modules/google-gax/node_modules/@grpc/grpc-js/build/src/make-client.js:105:19)
Jan 17 10:02:08 at /app/node_modules/@google-cloud/firestore/build/src/v1/firestore_client.js:205:29
Jan 17 10:02:08 at /app/node_modules/google-gax/build/src/streamingCalls/streamingApiCaller.js:38:28
Jan 17 10:02:08 at /app/node_modules/google-gax/build/src/normalCalls/timeout.js:44:16
Jan 17 10:02:08 at Object.request (/app/node_modules/google-gax/build/src/streamingCalls/streaming.js:126:40)
Jan 17 10:02:08 at Timeout.makeRequest [as _onTimeout] (/app/node_modules/retry-request/index.js:139:28)
Jan 17 10:02:08 at listOnTimeout (internal/timers.js:531:17)
Jan 17 10:02:08 at processTimers (internal/timers.js:475:7)
Jan 17 10:02:08 Caused by: Error
Jan 17 10:02:08 at Firestore.getAll (/app/node_modules/@google-cloud/firestore/build/src/index.js:902:23)
Jan 17 10:02:08 at DocumentReference.get (/app/node_modules/@google-cloud/firestore/build/src/reference.js:211:32)
Jan 17 10:02:08 at Function.<anonymous> (/app/dist/models/baseModel.js:278:62)
Jan 17 10:02:08 at Generator.next (<anonymous>)
Jan 17 10:02:08 at /app/dist/models/baseModel.js:31:71
Jan 17 10:02:08 at new Promise (<anonymous>)
Jan 17 10:02:08 at __awaiter (/app/dist/models/baseModel.js:27:12)
Jan 17 10:02:08 at Function.findById (/app/dist/models/baseModel.js:274:16)
Jan 17 10:02:08 at /app/dist/routes/routeUtils.js:62:42
Jan 17 10:02:08 at Generator.next (<anonymous>) {
Jan 17 10:02:08 code: 14,
Jan 17 10:02:08 details: '403:Forbidden',
Jan 17 10:02:08 metadata: Metadata {
Jan 17 10:02:08 internalRepr: Map {
Jan 17 10:02:08 'content-type' => [Array],
Jan 17 10:02:08 'content-length' => [Array],
Jan 17 10:02:08 'date' => [Array],
Jan 17 10:02:08 'alt-svc' => [Array]
Jan 17 10:02:08 },
Jan 17 10:02:08 options: {}
Jan 17 10:02:08 }
Jan 17 10:02:08 }
The baseModel method from our code (transpiled) that the stack trace trickles down to, which performs the Firestore fetch:
static findById(id, showDeleted = false) {
    return __awaiter(this, void 0, void 0, function* () {
        try {
            if (!id)
                throw new Error(`Invalid id provided for model ${this.name}`);
            const result = yield this.collection.doc(id).get();
            if (!result.exists)
                return null;
            if (!showDeleted && result.data().isDeleted)
                return null;
            return new this(Object.assign({}, result.data()));
        }
        catch (e) {
            console.error(e);
        }
    });
}
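For context, a hedged sketch of how this method would typically be called from a route handler; UserModel and the surrounding wiring are hypothetical placeholders, not taken from the report:

// Hypothetical usage of a baseModel subclass; names are illustrative only.
import { UserModel } from './models/userModel';

async function handleGetUser(id: string) {
    // findById resolves to null when the document is missing or soft-deleted,
    // and (because of the catch block above) also when the Firestore call throws.
    const user = await UserModel.findById(id);
    if (!user) {
        throw new Error(`User ${id} not found`);
    }
    return user;
}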
Our package.json:
"dependencies": {
"@sendgrid/mail": "^7.7.0",
"axios": "^1.2.3",
"cors": "^2.8.5",
"crypto-js": "^4.1.1",
"express": "^4.17.1",
"firebase-admin": "^11.4.1",
"lodash": "^4.17.21",
"moment": "^2.29.4",
"morgan": "^1.10.0",
"nanoid": "^3.3.4",
"stripe": "^11.6.0",
"yup": "^0.32.11"
},
"devDependencies": {
"@types/cors": "^2.8.13",
"@types/crypto-js": "^4.1.1",
"@types/express": "^4.17.15",
"@types/lodash": "^4.14.191",
"@types/morgan": "^1.9.4",
"@types/node": "^18.11.18",
"@types/yup": "^0.32.0",
"@typescript-eslint/eslint-plugin": "^4.31.2",
"@typescript-eslint/parser": "^4.31.2",
"eslint": "^7.32.0",
"nodemon": "^2.0.4",
"ts-node": "^10.9.1",
"typescript": "^4.9.4"
}
About this issue
- State: open
- Created a year ago
- Reactions: 4
- Comments: 30 (12 by maintainers)
Good morning @dconeybe - Thanks a ton for the very useful reply.
We're so relieved that we weren't in fact doing anything particularly wrong and that something had indeed changed on the Cloud side of things.
Within an hour of you posting last night, we started going through both of your ideas, given how critical this is for the projects. In testing thus far, we have not had an error since Tuesday, January 31, 2023 4:23:03 PM (0 since implementing Idea 1). Thankfully we do not use snapshots on the backend.
As for Idea 2, it's not as straightforward for us to do quickly, since we deploy a Docker container: the code gets uploaded to a build machine, and from there everything is handled automatically by a Dockerfile and built to serve in a container. As mentioned above though, since we don't use snapshots on the backend side, it doesn't seem like we'll need to go further into Idea 2 if we continue not to crash by bypassing gRPC.
I’ll post an update later today
Here are two things that you could try which might mitigate the issue until we find the root cause and/or implement a fix.
Idea 1: Try the new “enable rest” option.
The networking errors are coming from the grpc networking layer. We use grpc in the sdk because it provides the complex bi-directional communication infrastructure that is needed for streaming queries (e.g. onSnapshot() callbacks). However, it looks like the queries that are causing you problems are not streaming queries, but rather just one-time request/response queries. These “simpler” queries can be done using a simpler networking protocol that we call “rest”. These “rest” requests may even hit different Google edge nodes that potentially are not affected by this permissions issue.
@MarkDuckworth recently added a new “prefer rest” option to the sdk which I think you should try. By enabling this new option, the requests will be made to the backend using the “rest” protocol until a request for bi-directional streaming (i.e. onSnapshot()) is made, at which point the sdk will start using grpc and will only use grpc thereafter.
To use the new “prefer rest” option, use this code to initialize the Firestore object:
initializeFirestore(app, { preferRest: true })
(more details here: https://github.com/firebase/firebase-admin-node/pull/1901#issuecomment-1385621126). Just note that as soon as you call onSnapshot(), grpc will be used thereafter.
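For reference, a minimal initialization sketch of the “prefer rest” option (a hedged example, assuming a firebase-admin version that includes preferRest and an app configured with Application Default Credentials):

import { initializeApp } from 'firebase-admin/app';
import { initializeFirestore } from 'firebase-admin/firestore';

// Initialize the admin app (Application Default Credentials in this sketch).
const app = initializeApp();

// Opt in to the "prefer rest" transport: one-shot reads/writes go over REST,
// and the SDK only switches to gRPC once a streaming call (e.g. onSnapshot()) is made.
export const db = initializeFirestore(app, { preferRest: true });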
Idea 2: Discard clients that fail with this error
The SDK has some logic to stop using clients that fail with the RST_STREAM error (https://github.com/googleapis/nodejs-firestore/blob/789d9eb7f54b5329b17ef759f29252d17da47e26/dev/src/pool.ts#L270-L279). You could try to activate this logic on the permissions errors that you have been experiencing too. You'd have to hack-patch this fix in, but it would be great to know if it works for you or not. Here is how you would do it:
- Search your node_modules directory for the string RST_STREAM.
- Find the RST_STREAM in the err.message?.match(/RST_STREAM/) check (which may look different once it's transpiled into JavaScript).
- Change it to also match UNAVAILABLE or Forbidden (a rough sketch of this change follows the list below).
It would be really useful to know whether the clients that experience this permissions error recover once they are discarded and a new connection is opened.
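A rough, hypothetical sketch of the kind of change being described; the actual pool.ts source and its transpiled copy in node_modules will look different:

// Hypothetical sketch of the hack-patch described above; not the exact pool.ts code.
// The pool decides whether to discard a client based on the error it failed with.
function shouldDiscardClient(err: Error): boolean {
    // Original behavior (approximately): only discard on RST_STREAM errors.
    // return !!err.message?.match(/RST_STREAM/);

    // Patched behavior: also discard on the UNAVAILABLE / 403:Forbidden errors seen here,
    // so subsequent requests open a fresh client and connection.
    return !!err.message?.match(/RST_STREAM|UNAVAILABLE|Forbidden/);
}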
@dconeybe I don't think our error and @PL47productions's are the same.
First, the error log is different.
Second, our failing requests were simple .get() calls through the Firebase Admin SDK, without any external API in play.
Third, we've had 0 errors since adding the preferRest param at initialization.
We're still building up a sandbox backend to reproduce the error, but with the additional logs requested in the support ticket. If time is on our side, we'll hopefully have that for you today.
@iAMkVIN-S Count me as one of the 1.1M weekly installers. My cloud functions have been running smoothly for months, then yesterday I started receiving 403 Forbidden error messages from all of my functions that make calls to an outside API. This is a critical issue, as the cloud functions are core to my application.
Below is one of my functions that began failing yesterday. I first have to retrieve a token from my Firestore and then make the request to the outside API (Finicity). I have confirmed in the logs of each failed attempt that the URL and parameters are all generated and passed correctly. Once the URL is received, it is passed back to the end user.
It's important to note that when running my functions from a firebase shell environment, everything runs smoothly and the calls succeed. However, when I invoke the functions from my application as an end user, I receive the 403 Forbidden error message. To that end, I also have a function that runs every 80 minutes making a call to an outside API that, again, runs fine when I invoke it locally, but when it is called automatically it just started failing yesterday nearly every time.
Sample function (abbreviated): exports.refreshPromiseCustomUrl = functions.https.onCall(async (data, context) => { … })
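A hedged reconstruction of the flow described above; the document path, endpoint, headers, and field names are placeholders, since the real function body and the Finicity request details are not included here:

import * as functions from 'firebase-functions';
import * as admin from 'firebase-admin';
import axios from 'axios';

admin.initializeApp();

// Hedged sketch: read a stored API token from Firestore, call the outside API,
// and hand the generated URL back to the caller. All names below are illustrative.
export const refreshPromiseCustomUrl = functions.https.onCall(async (data, context) => {
    // Placeholder document path; the real token location is not shown in the report.
    const tokenSnap = await admin.firestore().doc('secrets/externalApiToken').get();
    const token = tokenSnap.get('value');

    // Placeholder endpoint; the 403 Forbidden described above comes back from this
    // kind of outbound HTTP call rather than from Firestore itself.
    const response = await axios.post(
        'https://api.example.com/generate-url',
        { customerId: data.customerId },
        { headers: { Authorization: `Bearer ${token}` } }
    );

    // The generated URL is passed back to the end user.
    return { url: response.data.url };
});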
Error message (abbreviated): Request failed with status code 403 (response: { status: 403, statusText: 'Forbidden' })
package.json
{ "name": "functions", "description": "Cloud Functions for Firebase", "scripts": { "lint": "eslint .", "serve": "firebase serve --only functions", "shell": "firebase functions:shell", "start": "npm run shell", "deploy": "firebase deploy --only functions", "logs": "firebase functions:log" }, "engines": { "node": "16" }, "dependencies": { "@sendgrid/mail": "^6.5.5", "axios": "^1.3.2", "cors": "^2.8.5", "firebase-admin": "^11.5.0", "firebase-functions": "^4.2.1", "googleapis": "^107.0.0", "moment": "2.26.0", "nodemailer": "^2.5.0", "stripe": "^6.36.0", "uuid": "^3.3.2" }, "devDependencies": { "eslint": "^5.12.0", "eslint-plugin-promise": "^4.0.1" }, "private": true }
I feel I have exhausted my options at this point. I have updated all of my dependencies, tried variations of my functions, checked that all of my parameters for the API call are set correctly, confirmed with Finicity that nothing has changed on their end, confirmed user access to my functions, tried using different function names, tried increasing the memory and instances allocated to each function… nothing has worked.
I will note, however, that randomly every once in a while a call will go through successfully. No idea why. I am absolutely stuck at this point.
Just hoping this issue gets resolved as soon as possible.
@dconeybe It's unfortunate to hear that you are unable to reproduce it. Moreover, it's very surprising to me that we aren't seeing more of the 1.1M weekly installers of firebase-admin coming to this issue, considering that our usage of this package is in its simplest, documented form.
We'll be more than happy to get you any other data you need to figure out the exact root cause; please do let us know what you'd like us to do! The team is at your disposal.
Thanks for the update. I communicated this to the backend team. @manoslive, if you can provide a project ID, that will help them diagnose. You may want to create a support ticket where you can provide your project ID. In that ticket, reference internal issue b/266097991 so we can link your info to the right place.
@MarkDuckworth We are very much still getting the issue. This has become mission critical, as it's impacting our clients' businesses with so much downtime.
Thankfully our backend does reboot when we get the error, and we sometimes get away with a couple hundred thousand OK requests before we randomly hit the error again, though not always on the first go.
Since Saturday Jan 21st in the early AM, we've logged every time a reboot was necessary due to the error:
266 times in 96 hours. This is ridiculous. How did we process 250 million+ requests over the last 2 years from this one backend alone, and then, without making any changes, wake up on Jan 17th to non-stop crashing?
(timestamps in milliseconds)
@manoslive, can you tell us when this error started occurring for you?
Important update:
This is now happening on another backend of ours that is running firebase-admin.
The last deploy of this backend was 6 weeks ago, with an insignificant change, and it had also been running for years without an issue. An important difference here is that rebooting doesn't seem to clear the error even for a little bit; we've been unable to bring it back up, and if this happens to our other backend, we're going to be in trouble!