nodejs-firestore: Error: 14 UNAVAILABLE: 403:Forbidden
Describe your environment
- Operating System version: Ubuntu 20.04.1 LTS
- Firebase SDK version: 11.4.1
- Firebase Product: Firebase Admin
- Node.js version: 18.13.0
- NPM version: 8.19.3
- @google-cloud/firestore version: 6.4.2 (used as a subdependency)
Describe the problem
Steps to reproduce:
Our NodeJS backend, running ExpressJS and using the Firebase Admin SDK, has been running smoothly for 2 years. All of a sudden, on the morning of the 17th of Jan, our service was interrupted by a seemingly routine Firebase call made by our baseModel. This causes all requests to fail until we reboot the backend. Since then, the error has happened randomly every 2-6+ hours, with thousands of OK requests in between. The error seems to stem from the Firebase Admin SDK package or its dependencies.
Relevant Code:
Error Log:
Jan 17 10:02:08 Error: 14 UNAVAILABLE: 403:Forbidden
Jan 17 10:02:08 at Object.callErrorFromStatus (/app/node_modules/google-gax/node_modules/@grpc/grpc-js/build/src/call.js:31:19)
Jan 17 10:02:08 at Object.onReceiveStatus (/app/node_modules/google-gax/node_modules/@grpc/grpc-js/build/src/client.js:352:49)
Jan 17 10:02:08 at Object.onReceiveStatus (/app/node_modules/google-gax/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:328:181)
Jan 17 10:02:08 at /app/node_modules/google-gax/node_modules/@grpc/grpc-js/build/src/call-stream.js:188:78
Jan 17 10:02:08 at processTicksAndRejections (internal/process/task_queues.js:82:9)
Jan 17 10:02:08 for call at
Jan 17 10:02:08 at ServiceClientImpl.makeServerStreamRequest (/app/node_modules/google-gax/node_modules/@grpc/grpc-js/build/src/client.js:336:30)
Jan 17 10:02:08 at ServiceClientImpl.<anonymous> (/app/node_modules/google-gax/node_modules/@grpc/grpc-js/build/src/make-client.js:105:19)
Jan 17 10:02:08 at /app/node_modules/@google-cloud/firestore/build/src/v1/firestore_client.js:205:29
Jan 17 10:02:08 at /app/node_modules/google-gax/build/src/streamingCalls/streamingApiCaller.js:38:28
Jan 17 10:02:08 at /app/node_modules/google-gax/build/src/normalCalls/timeout.js:44:16
Jan 17 10:02:08 at Object.request (/app/node_modules/google-gax/build/src/streamingCalls/streaming.js:126:40)
Jan 17 10:02:08 at Timeout.makeRequest [as _onTimeout] (/app/node_modules/retry-request/index.js:139:28)
Jan 17 10:02:08 at listOnTimeout (internal/timers.js:531:17)
Jan 17 10:02:08 at processTimers (internal/timers.js:475:7)
Jan 17 10:02:08 Caused by: Error
Jan 17 10:02:08 at Firestore.getAll (/app/node_modules/@google-cloud/firestore/build/src/index.js:902:23)
Jan 17 10:02:08 at DocumentReference.get (/app/node_modules/@google-cloud/firestore/build/src/reference.js:211:32)
Jan 17 10:02:08 at Function.<anonymous> (/app/dist/models/baseModel.js:278:62)
Jan 17 10:02:08 at Generator.next (<anonymous>)
Jan 17 10:02:08 at /app/dist/models/baseModel.js:31:71
Jan 17 10:02:08 at new Promise (<anonymous>)
Jan 17 10:02:08 at __awaiter (/app/dist/models/baseModel.js:27:12)
Jan 17 10:02:08 at Function.findById (/app/dist/models/baseModel.js:274:16)
Jan 17 10:02:08 at /app/dist/routes/routeUtils.js:62:42
Jan 17 10:02:08 at Generator.next (<anonymous>) {
Jan 17 10:02:08 code: 14,
Jan 17 10:02:08 details: '403:Forbidden',
Jan 17 10:02:08 metadata: Metadata {
Jan 17 10:02:08 internalRepr: Map {
Jan 17 10:02:08 'content-type' => [Array],
Jan 17 10:02:08 'content-length' => [Array],
Jan 17 10:02:08 'date' => [Array],
Jan 17 10:02:08 'alt-svc' => [Array]
Jan 17 10:02:08 },
Jan 17 10:02:08 options: {}
Jan 17 10:02:08 }
Jan 17 10:02:08 }
The baseModel method from our code (transpiled) that the stack trace trickles down to, which performs the Firestore fetch:
static findById(id, showDeleted = false) {
    return __awaiter(this, void 0, void 0, function* () {
        try {
            if (!id)
                throw new Error(`Invalid id provided for model ${this.name}`);
            const result = yield this.collection.doc(id).get();
            if (!result.exists)
                return null;
            if (!showDeleted && result.data().isDeleted)
                return null;
            return new this(Object.assign({}, result.data()));
        }
        catch (e) {
            console.error(e);
        }
    });
}
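For context, a hedged sketch of how this method would typically be called from a route handler; UserModel and the surrounding wiring are hypothetical placeholders, not taken from the report:

// Hypothetical usage of a baseModel subclass; names are illustrative only.
import { UserModel } from './models/userModel';

async function handleGetUser(id: string) {
    // findById resolves to null when the document is missing or soft-deleted,
    // and (because of the catch block above) also when the Firestore call throws.
    const user = await UserModel.findById(id);
    if (!user) {
        throw new Error(`User ${id} not found`);
    }
    return user;
}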
Our package.json:
"dependencies": {
"@sendgrid/mail": "^7.7.0",
"axios": "^1.2.3",
"cors": "^2.8.5",
"crypto-js": "^4.1.1",
"express": "^4.17.1",
"firebase-admin": "^11.4.1",
"lodash": "^4.17.21",
"moment": "^2.29.4",
"morgan": "^1.10.0",
"nanoid": "^3.3.4",
"stripe": "^11.6.0",
"yup": "^0.32.11"
},
"devDependencies": {
"@types/cors": "^2.8.13",
"@types/crypto-js": "^4.1.1",
"@types/express": "^4.17.15",
"@types/lodash": "^4.14.191",
"@types/morgan": "^1.9.4",
"@types/node": "^18.11.18",
"@types/yup": "^0.32.0",
"@typescript-eslint/eslint-plugin": "^4.31.2",
"@typescript-eslint/parser": "^4.31.2",
"eslint": "^7.32.0",
"nodemon": "^2.0.4",
"ts-node": "^10.9.1",
"typescript": "^4.9.4"
}
About this issue
- State: open
- Created a year ago
- Reactions: 4
- Comments: 30 (12 by maintainers)
Good morning @dconeybe - Thanks a ton for the very useful reply.
We're so relieved that we weren't in fact doing anything particularly wrong and that something had indeed changed on the Cloud side of things.
Within an hour of you posting last night, we started going through both of your ideas, given how critical this is for the projects. In testing thus far, we have not had an error since Tuesday, January 31, 2023 4:23:03 PM (0 since implementing Idea 1). Thankfully we do not use snapshots on the backend.
As for Idea 2, it's not as straightforward for us to do quickly, since we deploy a Docker container: the code gets uploaded to a build machine, and from there everything is handled automatically by a Dockerfile and built to serve in a container. As mentioned above though, since we don't use snapshots on the backend side, it doesn't seem like we'll need to go further into Idea 2 if we continue not to crash by bypassing gRPC.
I’ll post an update later today
Here are two things that you could try which might mitigate the issue until we find the root cause and/or implement a fix.
Idea 1: Try the new “enable rest” option.
The networking errors are coming from the grpc networking layer. We use grpc in the sdk because it provides the complex bi-directional communication infrastructure that is needed for streaming queries (e.g. onSnapshot() callbacks). However, it looks like the queries that are causing you problems are not streaming queries, but rather just one-time request/response queries. These “simpler” queries can be done using a simpler networking protocol that we call “rest”. These “rest” requests may even hit different Google edge nodes that potentially are not affected by this permissions issue.
@MarkDuckworth recently added a new “prefer rest” option to the sdk which I think you should try. By enabling this new option, the requests will be made to the backend using the “rest” protocol until a request for bi-directional streaming (i.e. onSnapshot()) is made, at which point the sdk will start using grpc and will only use grpc thereafter.
To use the new “prefer rest” option, use this code to initialize the Firestore object:
initializeFirestore(app, { preferRest: true })
(more details here: https://github.com/firebase/firebase-admin-node/pull/1901#issuecomment-1385621126). Just note that as soon as you call onSnapshot(), grpc will be used thereafter.
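For reference, a minimal initialization sketch of the “prefer rest” option (a hedged example, assuming a firebase-admin version that includes preferRest and an app configured with Application Default Credentials):

import { initializeApp } from 'firebase-admin/app';
import { initializeFirestore } from 'firebase-admin/firestore';

// Initialize the admin app (Application Default Credentials in this sketch).
const app = initializeApp();

// Opt in to the "prefer rest" transport: one-shot reads/writes go over REST,
// and the SDK only switches to gRPC once a streaming call (e.g. onSnapshot()) is made.
export const db = initializeFirestore(app, { preferRest: true });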
Idea 2: Discard clients that fail with this error
The SDK has some logic to stop using clients that fail with the RST_STREAM error (https://github.com/googleapis/nodejs-firestore/blob/789d9eb7f54b5329b17ef759f29252d17da47e26/dev/src/pool.ts#L270-L279). You could try to activate this logic on the permissions errors that you have been experiencing too. You'd have to hack-patch this fix in, but it would be great to know if it works for you or not. Here is how you would do it:
- Search your node_modules directory for the string RST_STREAM.
- Find the RST_STREAM in the err.message?.match(/RST_STREAM/) check (which may look different once it's transpiled into JavaScript).
- Change it to also match UNAVAILABLE or Forbidden (a rough sketch of this change follows the list below).
It would be really useful to know whether the clients that experience this permissions error recover once they are discarded and a new connection is opened.
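A rough, hypothetical sketch of the kind of change being described; the actual pool.ts source and its transpiled copy in node_modules will look different:

// Hypothetical sketch of the hack-patch described above; not the exact pool.ts code.
// The pool decides whether to discard a client based on the error it failed with.
function shouldDiscardClient(err: Error): boolean {
    // Original behavior (approximately): only discard on RST_STREAM errors.
    // return !!err.message?.match(/RST_STREAM/);

    // Patched behavior: also discard on the UNAVAILABLE / 403:Forbidden errors seen here,
    // so subsequent requests open a fresh client and connection.
    return !!err.message?.match(/RST_STREAM|UNAVAILABLE|Forbidden/);
}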
@dconeybe I don't think our error and @PL47productions's are the same.
First, the error log is different.
Second, our failing requests were simple .get() calls through the Firebase Admin SDK, without any external API in play.
Third, we've had 0 errors since adding the preferRest param at initialization.
We're still building up a sandbox backend to reproduce the error, but with the additional logs requested in the support ticket. If time is on our side, we'll hopefully have that for you today.
@iAMkVIN-S Count me as one of the 1.1M weekly installers. My cloud functions have been running smoothly for months, then yesterday I started receiving 403 Forbidden error messages from all of my functions that make calls to an outside API. This is a critical issue, as the cloud functions are core to my application.
Below is one of my functions that began failing yesterday. I first have to retrieve a token from my Firestore and then make the request to the outside API (Finicity). I have confirmed in the logs of each failed attempt that the URL and parameters are all generated and passed correctly. Once the URL is received, it is passed back to the end user.
It's important to note that when running my functions from a firebase shell environment, everything runs smoothly and the calls succeed. However, when I invoke the functions from my application as an end user, I receive the 403 Forbidden error message. To that end, I also have a function that runs every 80 minutes making a call to an outside API that, again, runs fine when I invoke it locally, but when it is called automatically it just started failing yesterday nearly every time.
Sample function (abbreviated): exports.refreshPromiseCustomUrl = functions.https.onCall(async (data, context) => { … })
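A hedged reconstruction of the flow described above; the document path, endpoint, headers, and field names are placeholders, since the real function body and the Finicity request details are not included here:

import * as functions from 'firebase-functions';
import * as admin from 'firebase-admin';
import axios from 'axios';

admin.initializeApp();

// Hedged sketch: read a stored API token from Firestore, call the outside API,
// and hand the generated URL back to the caller. All names below are illustrative.
export const refreshPromiseCustomUrl = functions.https.onCall(async (data, context) => {
    // Placeholder document path; the real token location is not shown in the report.
    const tokenSnap = await admin.firestore().doc('secrets/externalApiToken').get();
    const token = tokenSnap.get('value');

    // Placeholder endpoint; the 403 Forbidden described above comes back from this
    // kind of outbound HTTP call rather than from Firestore itself.
    const response = await axios.post(
        'https://api.example.com/generate-url',
        { customerId: data.customerId },
        { headers: { Authorization: `Bearer ${token}` } }
    );

    // The generated URL is passed back to the end user.
    return { url: response.data.url };
});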
Error message (abbreviated): Request failed with status code 403 (response: { status: 403, statusText: 'Forbidden' })
package.json
{ "name": "functions", "description": "Cloud Functions for Firebase", "scripts": { "lint": "eslint .", "serve": "firebase serve --only functions", "shell": "firebase functions:shell", "start": "npm run shell", "deploy": "firebase deploy --only functions", "logs": "firebase functions:log" }, "engines": { "node": "16" }, "dependencies": { "@sendgrid/mail": "^6.5.5", "axios": "^1.3.2", "cors": "^2.8.5", "firebase-admin": "^11.5.0", "firebase-functions": "^4.2.1", "googleapis": "^107.0.0", "moment": "2.26.0", "nodemailer": "^2.5.0", "stripe": "^6.36.0", "uuid": "^3.3.2" }, "devDependencies": { "eslint": "^5.12.0", "eslint-plugin-promise": "^4.0.1" }, "private": true }
I feel I have exhausted my options at this point. I have updated all of my dependencies, tried variations of my functions, checked that all of my parameters for the API call are set correctly, confirmed with Finicity that nothing has changed on their end, confirmed user access to my functions, tried using different function names, tried increasing the memory and instances allocated to each function… nothing has worked.
I will note, however, that randomly every once in a while a call will go through successfully. No idea why. I am absolutely stuck at this point.
Just hoping this issue gets resolved as soon as possible.
@dconeybe It's unfortunate to hear that you are unable to reproduce it. Moreover, it's very surprising to me that we aren't seeing more of the 1.1M weekly installers of firebase-admin coming to this issue, considering that our usage of this package is in its simplest, documented form.
We'll be more than happy to get you any other data you need to figure out the exact root cause; please do let us know what you'd like us to do! The team is at your disposal.
Thanks for the update. I communicated this to the backend team. @manoslive, if you can provide a project ID, that will help them diagnose. You may want to create a support ticket where you can provide your project ID. In that ticket, reference internal issue b/266097991 so we can link your info to the right place.
@MarkDuckworth We are very much still getting the issue. This has become mission critical, as it's impacting our clients' businesses with so much downtime.
Thankfully our backend does reboot when we get the error, and we sometimes get away with a couple hundred thousand OK requests before we randomly hit the error again, though not always on the first go.
Since Saturday Jan 21st in the early AM, we've logged every time a reboot was necessary due to the error:
266 times in 96 hours. This is ridiculous. How did we process 250 million+ requests over the last 2 years from this one backend alone, and then, without making any changes, wake up on Jan 17th to non-stop crashing?
(timestamps in milliseconds)
@manoslive, can you tell us when this error started occurring for you?
Important update:
This is now happening on another backend of ours that is running firebase-admin.
The last deploy of this backend was 6 weeks ago, with an insignificant change, and it had also been running for years without an issue. An important difference here is that rebooting doesn't seem to clear the error even for a little bit; we've been unable to bring it back up, and if this happens to our other backend, we're going to be in trouble!