nodejs-spanner: Random timeout/deadline exceeded unhandled rejections
Environment details
- OS: Cloud Functions environment (Ubuntu 18.04)
- Node.js version: Node.js 16
- npm version: Not sure, the one used by Cloud Functions
- `@google-cloud/spanner` version: 6.1.2, but the issue has been happening since the 5.x versions
Steps to reproduce
Run a Cloud Function using the Spanner client with an opened database, even without making requests.
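For reference, a minimal setup along these lines is enough to hit the issue (a sketch only; the project, instance, and database names below are placeholders, not from a real deployment):

```js
// Minimal sketch of the reproduction setup (names are placeholders).
// The client and database handle are created once, outside the handler,
// so they stay alive between invocations while the instance sits idle.
const functions = require('@google-cloud/functions-framework');
const {Spanner} = require('@google-cloud/spanner');

const spanner = new Spanner({projectId: 'my-project'});
const database = spanner.instance('my-instance').database('my-database');

functions.http('handler', (req, res) => {
  // The handler does not even need to query Spanner; the rejections show up
  // while the instance is idle between (or in the absence of) requests.
  res.send('ok');
});
```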
Context
I’ve been using the Node.js Spanner client on several projects for about a year now, and I’ve consistently noticed `DEADLINE_EXCEEDED` and other related errors that are not directly linked to my requests, usually when the client hasn’t been used for several minutes. By that I mean that these errors surface as unhandled rejections in Node.js. Cloud Functions makes this more visible, because unhandled rejections make the functions framework crash and log the error.
This issue occurs in many of my Cloud Functions performing various unrelated tasks with Spanner. Again, business code is never impacted, as the unhandled rejections usually occur when the Cloud Function has been inactive for several minutes (or even several hours). This also means the bug is much more visible when the minimum number of instances is set to >= 1, as the Cloud Function instances stay idle longer (or even indefinitely).
I see no reason why this wouldn’t happen in other environments, but I’ve mainly been using Cloud Functions lately and can only vouch for the behaviour in this context.
All Cloud Functions show fairly low activity (most of the time they are stuck at `min_instances: 1` and sometimes reach 5 instances). The Cloud Spanner instance is running with 200 processing units and is underutilised (it never goes above 10% CPU).
The problem here is really about the rejections not being caught by the client. I’m not really worried about the errors themselves, as they never seem to occur when running business operations/transactions (or at least they are caught and properly retried in that case).
My only lead would be the `keepAlive` mechanism, as it uses `setInterval` and periodically makes requests to Spanner, especially when sessions aren’t being used. However, after looking at the code I haven’t seen anything obvious.
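To illustrate the failure mode I suspect (a sketch of the pattern only, not the client’s actual implementation):

```js
// Sketch of the suspected pattern, not the client's actual code: a ping fired
// from setInterval has no rejection handler, so if the RPC times out while the
// function is idle, the rejection escapes to the process as "unhandledRejection".
const PING_INTERVAL_MS = 30 * 60 * 1000; // placeholder interval

function startKeepAlive(session) {
  return setInterval(() => {
    // Nothing awaits or .catch()es this promise; a DEADLINE_EXCEEDED or
    // CANCELLED from the underlying RPC has nowhere to go.
    session.keepAlive();
  }, PING_INTERVAL_MS);
}
```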
Stack traces
Unfortunately, as these are unhandled rejections, the stack traces are usually not very helpful, but one of them clearly references the Spanner client. To give you a rough idea of the rate of occurrence of the issue, it has happened 33 times in 10 days across around 100 Cloud Functions, most of which have `min_instances: 1` set. Out of the 33 errors, 4 are `CANCELLED`, 3 are `Total timeout of API...`, and the rest are `DEADLINE_EXCEEDED`.
Unhandled rejection
Error: Total timeout of API google.spanner.v1.Spanner exceeded 60000 milliseconds before any response was received.
at repeat (/workspace/node_modules/google-gax/build/src/normalCalls/retries.js:66:31)
at Timeout._onTimeout (/workspace/node_modules/google-gax/build/src/normalCalls/retries.js:101:25)
at listOnTimeout (node:internal/timers:559:17)
at processTimers (node:internal/timers:502:7)
Error: Process exited with code 16
at process.<anonymous> (/workspace/node_modules/@google-cloud/functions-framework/build/src/invoker.js:92:22)
at process.emit (node:events:527:28)
at process.emit (node:domain:475:12)
at process.exit (node:internal/process/per_thread:189:15)
at sendCrashResponse (/workspace/node_modules/@google-cloud/functions-framework/build/src/logger.js:44:9)
at process.<anonymous> (/workspace/node_modules/@google-cloud/functions-framework/build/src/invoker.js:88:44)
at process.emit (node:events:527:28)
at process.emit (node:domain:475:12)
at emit (node:internal/process/promises:140:20)
at processPromiseRejections (node:internal/process/promises:274:27)
Unhandled rejection
Error: 4 DEADLINE_EXCEEDED: Deadline exceeded
at Object.callErrorFromStatus (/workspace/node_modules/@grpc/grpc-js/build/src/call.js:31:26)
at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client.js:189:52)
at /workspace/node_modules/@grpc/grpc-js/build/src/call-stream.js:111:35
at Object.onReceiveStatus (/workspace/node_modules/grpc-gcp/build/src/index.js:73:29)
at InterceptingListenerImpl.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/call-stream.js:106:23)
at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:365:141)
at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:328:181)
at /workspace/node_modules/@grpc/grpc-js/build/src/call-stream.js:187:78
at processTicksAndRejections (node:internal/process/task_queues:78:11)
at runNextTicks (node:internal/process/task_queues:65:3)
// Same Error: Process exited with code 16 coming from the functions-framework.
Unhandled rejection
Error: 1 CANCELLED: Call cancelled
at Object.callErrorFromStatus (/workspace/node_modules/@grpc/grpc-js/build/src/call.js:31:26)
at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client.js:189:52)
at /workspace/node_modules/@grpc/grpc-js/build/src/call-stream.js:111:35
at Object.onReceiveStatus (/workspace/node_modules/grpc-gcp/build/src/index.js:73:29)
at InterceptingListenerImpl.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/call-stream.js:106:23)
at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:365:141)
at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:328:181)
at /workspace/node_modules/@grpc/grpc-js/build/src/call-stream.js:187:78
at processTicksAndRejections (node:internal/process/task_queues:78:11)
// Same Error: Process exited with code 16 coming from the functions-framework.
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 3
- Comments: 24 (9 by maintainers)
This makes sense. Give me some time to figure out which part of the code the error should get caught in and I will add code to catch this. I am going to be on vacation for a week but I will look into this when I get back.
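For context, the kind of change being described would presumably look something like this (a sketch only, not the actual patch that landed):

```js
// Sketch only: give the background ping's promise a rejection handler so that
// failures are handled inside the client instead of escaping as unhandled
// rejections.
session.keepAlive().catch((err) => {
  // A DEADLINE_EXCEEDED or CANCELLED here just means an idle ping failed;
  // the session can be dropped or retried rather than crashing the process.
  console.warn('keep-alive ping failed:', err.message);
});
```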