firebase-functions: Error: The request was aborted because there was no available instance.

This is happening in a production Firebase environment with a Blaze subscription. I’ve been seeing the error "The request was aborted because there was no available instance." since 22nd August, 10pm GMT+8. The error happens across all functions when I make 100+ invocations. When it appears, it affects all other functions as well (see screenshot). It can happen with any function, whether or not the maxInstances parameter is set.

All functions are deployed in us-central1. Quotas don’t seem to reach the limit.

Related issues

[REQUIRED] Version info

node: v12.22.3

firebase-functions: 3.14.1

firebase-tools: 9.16.0

firebase-admin: 9.11.0

[REQUIRED] Test case

Firebase pubsub listener

[REQUIRED] Steps to reproduce

Send 100+ messages to the firebase pubsub

[REQUIRED] Expected behavior

Functions execute.

[REQUIRED] Actual behavior

Functions failing with a message: The request was aborted because there was no available instance.

Were you able to successfully deploy your functions?

successfully deployed

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 61
  • Comments: 93 (10 by maintainers)

Most upvoted comments

Our cloud functions have been able to scale with no issue for over a year. Nothing has changed within our infrastructure but as of late last night 2021-09-07, our functions have begun to fail. This doesn’t appear to be related to traffic, cold starts or long running executions. A request will be made to a function and it will fail on every request for several minutes. It will then begin to work and another function will begin to fail.

There definitely seems to be something larger going on here than just revealing logs to Stackdriver.

I don’t really understand how it’s only a silent vs logging problem. Our app has been running without issue for months, and only NOW are we getting these issues that are actually preventing some functions from running. It feels to me like something actually is going on or has changed, because we have not changed our backend for a while and it’s been running smoothly until now.

We are on a Blaze subscription. We’ve also been experiencing this since August 22nd, running on Node 12 instances on asia-south1. Same error across almost all functions. Both onCall and onRequest functions are failing.

Tried upgrading to Node 14, and setting minInstances=1 and maxInstances=5. Error still persists.

We’re also experiencing this since August 23rd 6 AM CEST running on Node 14 instances. Same error, across multiple functions on europe-west1.

@tolypash not exactly the solution you’d like, but consider moving out of Google’s ecosystem. This is a lesson learned the hard way. So far I had only heard about poor customer support, but now I have witnessed it with this issue.

Hot off the press - Google Cloud Functions in us-central1 did report some problem ~2021-09-08:

https://status.cloud.google.com/incidents/16SSwVXrYSLjy8fEMvyZ

The status report claims that the issue only affected functions deployed in us-central1 and that it is now resolved. If you are still seeing issues, please contact Google Cloud Support.

We’ve had this issue for a while now. It seems to be happening more often lately.

@taeold You are lying. This issue is not about an invisible warning. Our game service has been making 100,000 HTTPS calls every day without error for a month. We have been experiencing the same issue reported here since yesterday (asia-northeast1). We had to handle 6+ purchase failure cases in the past 24 hours because of this issue. Be honest, Google. Tell us what you guys are doing. US -> EU -> South-east Asia (Singapore) -> North-east Asia (Japan). Something is happening.

I’m having the same issue on asia-northeast1.

My theory is that they reduced the tolerance for cold starts. For example: earlier they were OK waiting 2s for a cold start, but now they throw the said error after just 1s. If you have (or can create) a function with little to no dependencies, basically a helloWorld function, it will not be affected by this.

This issue also aligns with announcement of min-instance for cloud function. Support recommended using this new beta feature but did not have an answer for the cause of this issue.

So likely they changed some configuration in the backend and this is an effect of that. A lot more people are having production impact because of this here: https://issuetracker.google.com/issues/194948300 Hope they find and fix this soon 🤞

This error is still going on.

I am getting the same issue on asia-northeast1.

All functions failed between 15:00 and 16:00 today.

My Cloud Function max_instances is set to no limit.

It only started happening a few hours ago on our cloud functions. We redeployed, but that doesn’t seem to have helped.

Hi everyone.

Google Cloud Function (GCF) users as a whole are reporting the same issue described here, and https://issuetracker.google.com/issues/153207649#comment3 is the official response from the GCF team.

tl;dr The GCF Node.js runtime used to silently drop requests when instances couldn’t be scaled up fast enough to meet demand. Now it logs the failed request in your project’s logs, hence the sudden appearance of the issue (release note). For pubsub-triggered functions, this error is usually handled gracefully by the automatic retry mechanism in the GCF infrastructure. The same can’t be said of HTTP-triggered functions: the request would have been dropped unless the client already implemented a retry mechanism.
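As an illustration of the client-side retry mentioned above for HTTP-triggered functions, here is a minimal sketch of retrying with exponential backoff. The names `backoffDelayMs` and `withRetry`, the default delays, and the attempt count are all assumptions made for this example; they are not part of firebase-functions or any Google SDK.

```typescript
// Hypothetical helpers for illustration only; not part of any Firebase or Google API.
type Sleep = (ms: number) => Promise<void>;

const defaultSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at capMs.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Runs `call` up to `maxAttempts` times, sleeping between failed attempts.
// Rethrows the last error if every attempt fails.
async function withRetry<T>(
  call: () => Promise<T>,
  maxAttempts = 5,
  sleep: Sleep = defaultSleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await call();
    } catch (err) {
      lastError = err;
      // No point sleeping after the final attempt.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt));
      }
    }
  }
  throw lastError;
}
```

This sketch retries on any error to stay short. A real client would more likely retry only on errors known to be transient, and would usually add random jitter to the delay so that many clients do not retry at the same moment.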

To reduce the occurrence of the once invisible but now transparent “aborted because there was no available instance” errors, the recommendations in https://cloud.google.com/functions/docs/troubleshooting#scalability apply.

I hope this clears up the confusion a bit. I’ll leave this ticket open to answer any follow up questions, but since this problem is directly related to Google Cloud Functions and not specific to Firebase Functions, please consider reaching out to GCP support with your project-specific questions.

@larssn I get what you’re saying. OP is saying support is not being helpful, and that was also the case for me when I was helping someone navigate this issue. I think the suggestion of moving away is pragmatic, more so when you are having real customer impact that is making you lose money. You do not want to bet your company and its revenue on a cloud company that is having a hard time determining whether there is a problem at all.

Anyway, that is my personal take on this. You are welcome to disagree with it 😃

Same issue since ~2021-08-26 20:00 BST.

Node 14, firebase-functions 3.14.1, firebase-admin 9.10.0

It is happening with very minor spikes of requests < 100 across all function deployments.

I’d also like to understand why this very impactful issue being reported by many people isn’t reflected on https://status.cloud.google.com/ as being investigated.

Edit: Looks like this is already being tracked here https://issuetracker.google.com/issues/194948300

@amitrao17 I think you should definitely await all the promises before returning. I’m guessing that whatever Google changed is killing functions much more quickly after they finish running, which per the spec is correct or at least not unexpected. Maybe Google previously let functions hang around longer, so that, for example, your unresolved promises had time to finish even though there was no guarantee of that.
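To make the advice about awaiting promises concrete, here is a minimal sketch contrasting a handler that returns before its work finishes with one that waits for it. `saveRecord` is a hypothetical stand-in for any async side effect (a database write, an outgoing HTTP call), and both handler names are made up for this example.

```typescript
// Hypothetical stand-in for an async side effect; it records the id once the "write" completes.
async function saveRecord(id: string, sink: string[]): Promise<void> {
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  sink.push(id);
}

// Risky: the returned promise resolves while the writes are still in flight.
// If the runtime stops the instance once the handler resolves, the writes may never finish.
async function handlerFireAndForget(ids: string[], sink: string[]): Promise<void> {
  ids.forEach((id) => {
    void saveRecord(id, sink); // started but not awaited
  });
}

// Safer: the returned promise resolves only after every write has completed.
async function handlerAwaitAll(ids: string[], sink: string[]): Promise<void> {
  await Promise.all(ids.map((id) => saveRecord(id, sink)));
}
```

`Promise.all` rejects as soon as any write fails. If every write should be attempted regardless of individual failures, `Promise.allSettled` is an alternative.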

I would also like to note that “retrying” HTTP requests from the clients side is not possible either, because this issue seems to be affecting all functions for a certain period of time (ranging from a few seconds to sometimes minutes)

So even when I retry on the client, there will be another error thrown unless retries are spaced out minutes apart, which is not possible for the client.

Honestly I think @samodadela’s #962 (comment) above is my biggest fear. Sure, we can go re-engineer our app to add exponential backoffs to all calls. But Firebase has no mechanism to do that for scheduled functions or firestore triggers. “Just add retry options” CAN’T be the final answer here.

Luckily, it seems scheduled functions and triggers are guaranteed at-least-once delivery.

I’m getting dozens of these errors again for HTTP functions, it seems even worse than before.

On Wed, Sep 15, 2021 at 12:14 AM Wtrapp wrote:

Yep. This issue is back for us too on us-central

Apologies, my last post is not true. I have just filtered my logs and can see that other users of my cloud functions are encountering this error, but not as often as when I first posted.

We used to have a similar problem using Firebase for HTTP serving - not errors but cold starts causing HTTP requests to take 10+ seconds meaning our app would often hang loading looking like it crashed. Seems a change has turned what were cold starts into errors.

The problem with Firebase is that there is no way to control cold starts, unlike with AWS Lambda. Lambda is much smarter about scaling up and down and sending requests to existing instances, whereas with Firebase it is more random. E.g., having a pinger keeping an instance alive doesn’t really do anything useful.

The solution is to stop using Firebase for HTTP… it really is very bad for it. Switch to App Engine and you can control the scaling a lot more and avoid these problems.

Happening to me too. We had no issues with scaling from when we started our project in March 2020 until the start of last week; now this issue happens every few minutes.

same issue :(

@larssn The at-least-once guarantee applies to all event-driven functions. Are you seeing events from Firebase/Firestore being dropped on your project?

@taeold But could you clarify what happens to event-driven functions, such as Firebase/Firestore triggers? The docs here guarantee at-least-once execution.

I’m hoping that is still the case. Only a few of our functions are idempotent enough to warrant enabling the retry policy.
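Since at-least-once delivery means a handler can see the same event more than once, one common way to make a handler safe to retry is to deduplicate on an event ID. The sketch below is only an illustration: `IncomingEvent` and `IdempotentProcessor` are hypothetical names, and the in-memory `Set` is an assumption standing in for a durable store, because function instances do not share memory. In firebase-functions, the event context's `eventId` could serve as the key.

```typescript
// Hypothetical event shape for illustration.
interface IncomingEvent {
  eventId: string;
  amount: number;
}

// Applies each event at most once, keyed by eventId.
// Assumption: a real deployment would keep the seen-ID record in a durable,
// transactional store rather than in instance memory.
class IdempotentProcessor {
  private readonly seen = new Set<string>();
  private total = 0;

  // Returns true if the event was applied, false if it was a duplicate.
  handle(event: IncomingEvent): boolean {
    if (this.seen.has(event.eventId)) {
      return false;
    }
    this.seen.add(event.eventId);
    this.total += event.amount;
    return true;
  }

  getTotal(): number {
    return this.total;
  }
}
```

With this shape, a redelivered event becomes a no-op, so enabling retries does not apply the same side effect twice.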