nodejs-pubsub: API google.pubsub.v1.Publisher exceeded 5000 milliseconds when running on Cloud Run

When the image is run locally with Docker, the application successfully publishes messages to Cloud Pub/Sub.

When the same image is deployed to Cloud Run, it won’t publish a single message. All attempts fail with the error:

GoogleError: Total timeout of API google.pubsub.v1.Publisher exceeded 5000 milliseconds before any response was received.
    at repeat (/usr/root/node_modules/google-gax/build/src/normalCalls/retries.js:66:31)
    at Timeout._onTimeout (/usr/root/node_modules/google-gax/build/src/normalCalls/retries.js:101:25)
    at listOnTimeout (internal/timers.js:557:17)
    at processTimers (internal/timers.js:500:7)
code 4

Sample Code:

import { mongoose } from '@shared/connections'
import MongoDbModel from '@shared/models'
import { PubSub } from '@google-cloud/pubsub'

import {
  REMINDER_COPY_SMS,
  REMINDER_MAX_DAYS_SINCE_LAST_MSG,
  REMINDER_MAX_DAYS_SINCE_LAST_READING,
  REMINDER_MESSAGE_TEMPLATE,
  TOPICS_MESSAGE_DISPATCH,
  KEYFILE_PATH,
} from '../constants.js'

const pubsubClient = new PubSub({
  projectId: process.env.GOOGLE_CLOUD_PROJECT,
  keyFilename: KEYFILE_PATH,
})

const msMaxSinceLastReading = REMINDER_MAX_DAYS_SINCE_LAST_READING * 24 * 60 * 60 * 1000
const msMaxSinceLastMsg = REMINDER_MAX_DAYS_SINCE_LAST_MSG * 24 * 60 * 60 * 1000

const { model: Patient } = new MongoDbModel(mongoose, 'Patient')

const findInactivePatients = () => Patient.find({
  status: 'ACTIVE',
  $and: [{
    $or: [{
      last_reading_at: { $exists: false },
    }, {
      last_reading_at: { $lt: new Date(Date.now() - msMaxSinceLastReading) },
    }],
  }, {
    $or: [{
      last_reading_reminder_at: { $exists: false },
    }, {
      last_reading_reminder_at: { $lt: new Date(Date.now() - msMaxSinceLastMsg) },
    }],
  }, {
    // Test with known patients only
    $or: [{
      first_name: { $regex: /Bruno/i }, last_name: { $regex: /Soares/i },
    }],
  }],
}, {
  first_name: 1,
  last_name: 1,
  gender: 1,
  last_reading_at: 1,
  phones: 1,
}, {
  lean: true,
})

const updateLastReadingReminderAt = (patient) => Patient.findOneAndUpdate({
  _id: patient._id,
}, {
  last_reading_reminder_at: Date.now(),
})

const createMessagePayload = (patient) => ({
  body: REMINDER_COPY_SMS,
  channels: [{
    name: 'sms',
    contacts: patient.phones.map((phone) => phone.E164),
    specifications: {
      template: REMINDER_MESSAGE_TEMPLATE,
    },
  }],
})

const dispatchMessage = async (patient) => {
  if (!patient.phones || !patient.phones.length) {
    return
  }
  const payload = createMessagePayload(patient)
  console.info('Worker reminder-report-vitals', 'event payload', JSON.stringify(payload))
  try {
    await pubsubClient.topic(TOPICS_MESSAGE_DISPATCH).publishMessage({ json: payload })
    console.info('Worker reminder-report-vitals', 'event published to cloudPubSub')
    pubsubClient.close()
    await updateLastReadingReminderAt(patient)
    console.info('Worker reminder-report-vitals', 'last_reading_reminder_at updated for patient', patient._id)
  } catch (err) {
    console.error(err)
  }
}

const handler = async () => {
  console.info('Worker reminder-report-vitals', 'execution started')
  try {
    const inactiveList = await findInactivePatients()
    inactiveList.forEach(dispatchMessage)
  } catch (err) {
    console.error(err)
  }
}

export default handler

The code hangs after logging this line: console.info('Worker reminder-report-vitals', 'event payload', JSON.stringify(payload))

Sample Event Payload:

{"body":"This is {{ORG}}. We haven't been receiving your vitals. Reply \"start\" to report your vitals now. Reply \"stop\" at any time to opt-out from automated reminders.","channels":[{"name":"sms","contacts":["+15555555555"],"specifications":{"template":"reading-reminder"}}]}

Environment details

  • OS: Linux Alpine
  • Node.js version: 14.16.1
  • @google-cloud/pubsub version: 2.18.4

Steps to reproduce

  1. Create a Pub/Sub topic to receive messages and set its name in constants.js as TOPICS_MESSAGE_DISPATCH.
  2. Build a Node.js application image from the sample code. To reproduce only the issue, the database operations may be removed and the sample event payload used instead (see the stripped-down sketch after this list).
  3. Run the container locally with Docker.
  4. Push the image to GCP Container Registry.
  5. Deploy a Cloud Run service using the image.
  6. See the timeout error in Cloud Logging.

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 20
  • Comments: 82 (3 by maintainers)

Most upvoted comments

Folks, make sure you don't have the PUBSUB_EMULATOR_HOST environment variable set when your app is running (assuming you want to use GCP Pub/Sub, not the emulator). In my case this was the issue: the variable was set to localhost:PORT, which caused timeout errors every time when I expected the client to call GCP. After the .env file was excluded from the image we deploy to GCP, the problem was solved. Cheers

I got the error "google.pubsub.v1.Publisher exceeded 60000 milliseconds before any response was received." Increasing the gaxOpts timeout value to 100,000 solved our problem.

const publishOptions = {
  gaxOpts: {
    timeout: 100000,
  },
};

const topicChannel = pubsub.topic(TOPIC_ID, publishOptions);

@apettiigrew Remove the ^ and pin the exact version: "@google-cloud/pubsub": "2.17.0"

Something to consider: we noticed this issue arising from an HttpFunction where we were sending a response before publishing to Pub/Sub. Having awaits after sending a response is contraindicated by Google, and when we switched the order to publish first, the issue went away.

Sorry @JamesTheTerry I mean I have the same issue

error - unhandledRejection: GoogleError: Total timeout of API google.pubsub.v1.Publisher exceeded 60000 milliseconds before any response was received.

I don't believe this is the issue in production. Most likely Cloud Run now kills network access when the container is inactive (i.e. when the request has ended), which happens more often under low load. It used to work though, so I believe it might be a change in the Cloud Run infrastructure. I am making sure that all the Pub/Sub calls are awaited properly on my side. You should also set .topic(topic, { batching: { maxMessages: 1 } }) so that the message is sent right away instead of waiting in a queue.

Inside Cloud Run, I needed to disable batching (maxMessages = 1) and make sure the promise was awaited before the HTTP response was returned.

Note: by default, Cloud Run indeed doesn't allocate CPU after the request is finished (https://cloud.google.com/blog/products/serverless/cloud-run-gets-always-on-cpu-allocation).

Good point. I only updated Pubsub really. Will review them now. Thanks

Was good for about two weeks. Checked that we are awaiting everything. Put in the tweaks mentioned above. Just had a Pub/Sub error chain this morning. Really at a loss here. Was really hoping that it would work. Any suggestions at this point are welcome. Might try to put Pub/Sub back to how it was set up in the documentation; maybe it will work this time with proper awaiting.

I'm experiencing the same problem, and the solution that worked for me was to instantiate Pub/Sub inside the function and then close the client afterwards. The reason, apparently, was that we kept batching messages, resulting in the Cloud Function closing before it had a chance to send them.

    try {
      const pubSubClient = new PubSub();
      const topic = pubSubClient.topic(topicName, {
        batching: { maxMessages: 1 },
        gaxOpts: {
          timeout: 100000,
        },
      });

      const messageId = await topic.publishMessage({ json }).catch((err) => {
        logger.error(
          `Received error while publishing to ${topicName}: ${err.message}`
        );
        logger.error(err);
        throw new InternalServerError();
      });
      await pubSubClient.close();
      logger.info(
        `Message ${messageId} was published to ${topicName} with message ${JSON.stringify(
          json
        )}`
      );
    } catch (err) {
      logger.error(err);
      throw new InternalServerError();
    }
 }

EDIT: Added max batching and timeout just due to this thread. Still need to test it.

However, somebody changed it back to match the documentation a while ago, and we started having the issue again just recently, so I need to retest that. Just throwing this idea out there; maybe it will help somebody. I will try to give an update after more testing.

In the mentions above, I noticed people talking about the importance of awaiting Pub/Sub calls. In our use case we return a response from the server after we call a function that triggers Pub/Sub; that call is not awaited, pretty much to avoid having Pub/Sub fail before we get a chance to respond. However, we do await within the called function. Example:

async function example () {
  // ...
  // Intentionally not awaited (see above)
  initPubsubMessages(
    message1,
    message2,
    // ...
  );
  return data;
}

async function initPubsubMessages(message1, message2) {
  await Pubsub.publish(topic, message1);
  await Pubsub.publish(topic, message2);
}

We use Firebase Functions gen 2, which are pretty much Google Cloud Functions at this point, to my knowledge. In my head everything here is correct, and the work initiated in initPubsubMessages, which is awaited internally, should be processed without a hitch. But I'm also aware that Cloud Run has some quirks, so I thought I'd ask for somebody else's opinion. Would Firebase Functions/Cloud Run stop all processes as soon as the server responds? That is my question, I guess. I'm grasping at straws here, but I've experienced weirder things on that platform.

I have solved this issue. The problem was with environment variables. Google Cloud Functions parses each variable added from the GUI console and adds escape sequences, so the \n in my service account's private key gets converted to \\n, resulting in wrong secrets.

It's strange that the Pub/Sub SDK does not return a proper error message; it always returns an API timeout, no matter how much I increase the timeout.

Removing the escape sequences from the account's private key solves the problem.

Does anyone know why it would take over a minute to publish?

If the queue has a lot of data to process, it will take more time. You can reproduce the issue by load testing with multiple requests. The issue can be fixed by increasing the timeout value, up to its maximum of 540000, i.e. 540 seconds.

Thanks @koushil-mankali for your solution. It's working. I am now facing a new issue related to Google Cloud Logging. Can you please let me know how to fix the error below? It would be very helpful. Thanks

Error: Total timeout of API google.logging.v2.LoggingServiceV2 exceeded 60000 milliseconds before any response was received.

I fixed like this in version 3.2.0:

before:

const publisherClient = new PublisherClient({ projectId })
await publisherClient.publish({
  topic: `projects/${projectId}/topics/${topic}`,
  messages: [
    {
      data: JSON.stringify(body),
    },
  ],
});

after:

const publisherClient = new PubSub({ projectId })
await publisherClient
  .topic(`projects/${projectId}/topics/${topic}`)
  .publishMessage({
     data: Buffer.from(JSON.stringify(body)),
  });

I am on version 3.3.0 and I am having the same issue

I have tried both pubsub 2.17 and gax 2.28.1, but it doesn't seem to work for this simple use case:

const { PubSub } = require('@google-cloud/pubsub');
const pubsub = new PubSub();

exports.main = async (message, context) => {
  const data = {
    event: "AccountCreated"
  };
  const topicName = 'some valid topic name'
  const pubMessage = Buffer.from(JSON.stringify(data));
  const topic = pubsub.topic(topicName);

  topic.publish(pubMessage);
};

Keep getting the same error; the awaited publish hangs for a bit and then crashes:

GoogleError: Total timeout of API google.pubsub.v1.Publisher exceeded 60000 milliseconds before any response was received.
    at repeat (/Users/user/path/app/server/node_modules/@google-cloud/pubsub/node_modules/google-gax/build/src/normalCalls/retries.js:66:31)
    at Timeout._onTimeout (/Users/user/path/app/server/node_modules/@google-cloud/pubsub/node_modules/google-gax/build/src/normalCalls/retries.js:101:25)
    at listOnTimeout (node:internal/timers:564:17)
    at process.processTimers (node:internal/timers:507:7) {
  code: 4
}

Are you calling it concurrently? I had a similar issue to this post when running on GCP, and my issue was that I was sending lots of requests concurrently without awaiting the promises, causing my service to scale down. Instead you can do multiple publishes in a single request and await the responses (if doing scale-to-zero), e.g. something like the sketch below.

@Sytten the endpoint will have quite a bit of traffic later on so it’s OK for us.

We have tried using { batching: { maxMessages: 1 } }) and it worked most of the time, but occasionally failed.

Same here, too. Running 2.18.4 with dockerized node application.

same here too, with 2.18.3 and 2.18.4 versions

Can confirm, we are facing the same issue on Cloud Functions with @google-cloud/pubsub version 2.18.4.

FYI I experienced the same behavior running on Cloud Functions. Rolling back to version 2.12.0 unblocked me.