data-api-client: Too many connections
Hello,
I get this error on about 1% of executions…
Looks like there are no more available connections, but isn’t Aurora Serverless supposed to scale automatically?
{
"errorType": "Runtime.UnhandledPromiseRejection",
"errorMessage": "BadRequestException: Too many connections",
"reason": {
"errorType": "BadRequestException",
"errorMessage": "Too many connections",
"code": "BadRequestException",
"message": "Too many connections",
"time": "2020-02-04T18:31:38.387Z",
"requestId": "c0cdad58-cefe-42a8-b3f6-acf1f4bffb07",
"statusCode": 400,
"retryable": false,
"retryDelay": 99.64592804784691,
"stack": [
"BadRequestException: Too many connections",
" at Object.extractError (/var/task/node_modules/aws-sdk/lib/protocol/json.js:51:27)",
" at Request.extractError (/var/task/node_modules/aws-sdk/lib/protocol/rest_json.js:55:8)",
" at Request.callListeners (/var/task/node_modules/aws-sdk/lib/sequential_executor.js:106:20)",
" at Request.emit (/var/task/node_modules/aws-sdk/lib/sequential_executor.js:78:10)",
" at Request.emit (/var/task/node_modules/aws-sdk/lib/request.js:683:14)",
" at Request.transition (/var/task/node_modules/aws-sdk/lib/request.js:22:10)",
" at AcceptorStateMachine.runTo (/var/task/node_modules/aws-sdk/lib/state_machine.js:14:12)",
" at /var/task/node_modules/aws-sdk/lib/state_machine.js:26:10",
" at Request.<anonymous> (/var/task/node_modules/aws-sdk/lib/request.js:38:9)",
" at Request.<anonymous> (/var/task/node_modules/aws-sdk/lib/request.js:685:12)"
]
},
"promise": {},
"stack": [
"Runtime.UnhandledPromiseRejection: BadRequestException: Too many connections",
" at process.<anonymous> (/var/runtime/index.js:35:15)",
" at process.emit (events.js:210:5)",
" at process.EventEmitter.emit (domain.js:476:20)",
" at processPromiseRejections (internal/process/promises.js:201:33)",
" at processTicksAndRejections (internal/process/task_queues.js:94:32)"
]
}
About this issue
- State: open
- Created 4 years ago
- Reactions: 2
- Comments: 34 (5 by maintainers)
@nitesmeh Am I right to assume that the data-api should pool requests more efficiently than traditional connections? If not - is there any reason to use the data api if we do not need to access from outside a VPC?
PM for Data API here. Thanks for raising this @jeremydaly and others. We are looking into it.
@QAnders and anyone else that cares – I got a reply from AWS premium support and they have been able to repro the issue:
will update when I get a response!
We’ve always experienced Aurora Serverless like this, one of the reasons we stopped using it.
EDIT: I probably should be more in-depth rather than snarky. The lag between hitting the connection limit and scaling to the next ACU level is substantial, so your application needs to be capable of waiting until more connections are available. We had to add try/catch loops everywhere that would catch exactly this situation and just keep slamming the database until it decided to actually scale up (this can take up to 5-10 minutes for each level).
Another thing you need to consider is the database automatically deciding to scale down on you even though you don’t want it to. Half your jobs will suddenly get disconnected, and they need to be able to recover from that.
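For anyone wanting to try the "try/catch loop" approach described above, here is a minimal sketch of what it could look like. It is not part of data-api-client or the AWS SDK: `withConnectionRetry`, `isTooManyConnections`, and the option names are hypothetical, and the error is matched on its message text only because the log at the top of this issue shows the SDK marking it `"retryable": false`. It uses capped exponential backoff with random jitter instead of retrying immediately.

```javascript
// Hypothetical helper names; nothing here comes from data-api-client itself.
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Matches the message seen in the error log ("Too many connections").
function isTooManyConnections(err) {
  return Boolean(err) && /too many connections/i.test(String(err.message || ''));
}

// Runs fn, retrying only on "Too many connections" with capped,
// jittered exponential backoff. Any other error is rethrown at once.
async function withConnectionRetry(fn, options = {}) {
  const {
    maxAttempts = 5,      // assumption: tune to how long scaling takes for you
    baseDelayMs = 200,
    maxDelayMs = 10000,
    sleep = defaultSleep, // injectable so the delay can be skipped in tests
  } = options;

  for (let attempt = 1; ; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      if (!isTooManyConnections(err) || attempt >= maxAttempts) {
        throw err;
      }
      // Random delay between 0 and the capped exponential ceiling.
      const ceiling = Math.min(maxDelayMs, baseDelayMs * 2 ** (attempt - 1));
      await sleep(Math.random() * ceiling);
    }
  }
}
```

You would wrap each query call in it, e.g. `await withConnectionRetry(() => runMyQuery())`, where `runMyQuery` is whatever function issues your Data API request. With scaling reportedly taking minutes, the defaults above will probably give up well before a scale-up completes, so treat the numbers as placeholders.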
Well, this is a farce from AWS… We found the same commit as @deel77 some time ago (sorry for not thinking about reporting it here)…
We were eagerly awaiting Serverless v2 (and were part of the beta). I tested it thoroughly, and it was scaling nicely even with a lot of open transactions. At launch (GA) we got the information that the Data-API won’t be added in v2…
We’ve moved a lot of our backend workloads over to AppSync for the sole purpose of being able to query DBs directly using VTL, and because AppSync uses the Data-API to “smooth out” the peaks.
It’s not working! Several times a week we hit “too many connections” and as @deel77 found, no go as 500 is the max and it’s not possible to raise it. None of these limitations were mentioned in “architectural meetings” with AWS experts prior to moving to the solution…
So, for us, the only viable solution is moving to Serverless v2 (or standard RDS), but as we have a very uneven load, Serverless would make more sense. However, v2 is quite costly, and of course there is no Data-API… We’d have to rebuild all our AppSync APIs to use Lambdas, and then why use AppSync at all?
AWS really dropped the ball on this one!
Thanks @deel77 and @QAnders for the feedback. Deprecating the Data API would certainly be disappointing for us too. I’d love to get an official AWS view on it as it does strike me as a step backwards. Like yourselves, we were very excited about coupling AppSync with the Data API though now we may hold off moving in that direction. We might look at RDS Proxy though the appeal of Data API was its REST-based API.
If @nitesmeh or anyone from AWS has any updates that would be great. We’d love to use this pattern and Aurora Serverless generally but it doesn’t seem fit-for-purpose for our use cases at the moment.
@ortonomy We have that exact same issue after increasing our use of AppSync!
We have one Express-based server (Elastic Beanstalk) which is still using a native DB connection to PG (the Aurora Serverless cluster), but it is limited to a max. of 10 connections. We have a bunch of Lambdas, all using the Data-API, and now four AppSync APIs. Ever since we added the AppSync APIs we run into a lot more “Too many connections” issues… 😦
I’ve had a few chats with AWS support (they are helpful and try) but no real solution as of yet. They (kind of) agree with my assumption that a lot of updates/inserts prevent the cluster from scaling, and only when the updates/inserts are all committed does it scale (but then it’s too late). We’ve rebuilt some of the functions that hit the hardest to output queries to an SQS FIFO queue and then we go through that queue in a more controlled fashion…
This is very annoying and I hope that V2 of serverless fixes this (although our AWS contact and the support are very tight-lipped about it, even about when it’s going to be released for Postgres…)
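A rough sketch of the SQS FIFO idea from the comment above, in case it helps someone. This is not the poster's actual code. `buildQueuedWrite` and `processBatch` are hypothetical names; the returned object uses the standard SQS SendMessage parameters (`QueueUrl`, `MessageBody`, `MessageGroupId`, `MessageDeduplicationId`), and `executeWrite` stands in for whatever function runs the statement against the database. Sending the message itself is left to your SQS client of choice.

```javascript
const crypto = require('crypto');

// Builds SendMessage parameters for a FIFO queue instead of running the
// write immediately. Assumption: the write can tolerate being deferred.
function buildQueuedWrite(queueUrl, groupId, sql, parameters = {}) {
  const body = JSON.stringify({ sql, parameters });
  return {
    QueueUrl: queueUrl,
    MessageBody: body,
    // Messages sharing a group ID are delivered in order.
    MessageGroupId: groupId,
    // Hash of the body: identical writes sent within the FIFO
    // deduplication window are dropped. Use a unique ID instead
    // if repeated identical statements are legitimate for you.
    MessageDeduplicationId: crypto.createHash('sha256').update(body).digest('hex'),
  };
}

// Consumer side: works through a batch one statement at a time, so a
// batch never has more than one write in flight. `records` has the shape
// of the Records array in a Lambda SQS event (each item has a `body`).
async function processBatch(records, executeWrite) {
  for (const record of records) {
    const { sql, parameters } = JSON.parse(record.body);
    await executeWrite(sql, parameters);
  }
}
```

The point of the design is that bursts pile up in the queue rather than as open connections, and the consumer's concurrency decides how many writes hit the cluster at once.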
Here to resurrect this issue – we’re regularly seeing this “Too many connections” issue, which is causing outages for our app.
After investigation I see that even under a small load, the number of open connections on the Aurora Serverless cluster explodes, and then the connections are kept alive in “sleep” for long periods of time.
What makes our setup different is we’re also using AppSync RDS resolvers for our GraphQL API separately from our lambdas, which are handling webhooks and using the Data API.
@nitesmeh and anyone else – would it be possible that the Data API’s connection pooling mechanism is using up all our connections and not leaving enough for the AppSync RDS resolvers and this is what’s causing the issues… Any insight?
@jonathannen Thanks for the update on the transactions. I’m currently using TypeORM with the typeorm-aurora-data-api-driver, and I believe it’s creating a new transaction for every query it does. I haven’t tried it yet, but I was thinking about creating a single transaction for every lambda invocation. Do you know if there are any limits on the number of queries you can have in a transaction? Or do you see any other pitfalls with that approach?
@QAnders @jeremydaly as far as my experience indicates – each concurrent transaction via the Data API appears to need its own connection to the database. If you have a workload that uses a lot of concurrent transactions, it’ll easily blow through the number of connections available.
If you can eliminate, consolidate, or reduce the number of transactions - that appears to help immensely.
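To make "consolidate" concrete, here is a sketch of running all of an invocation's statements inside one transaction instead of one transaction per query. Assumptions: `client` is a hypothetical thin wrapper whose promise-returning methods are named after the RDS Data API operations (BeginTransaction, ExecuteStatement, CommitTransaction, RollbackTransaction); `runInSingleTransaction` and the `target` object are made up for illustration. Whether this actually reduces connection usage in your workload is something to measure, not something this sketch proves.

```javascript
// client: hypothetical wrapper with beginTransaction, executeStatement,
//         commitTransaction and rollbackTransaction, each returning a promise.
// target: { resourceArn, secretArn, database } for the cluster.
// statements: array of { sql, parameters } run in order.
async function runInSingleTransaction(client, target, statements) {
  const { resourceArn, secretArn } = target;
  const { transactionId } = await client.beginTransaction(target);

  try {
    const results = [];
    for (const { sql, parameters = [] } of statements) {
      // Every statement reuses the same transactionId.
      results.push(
        await client.executeStatement({ ...target, transactionId, sql, parameters })
      );
    }
    await client.commitTransaction({ resourceArn, secretArn, transactionId });
    return results;
  } catch (err) {
    // Best-effort rollback; the original error is the one that matters.
    await client
      .rollbackTransaction({ resourceArn, secretArn, transactionId })
      .catch(() => {});
    throw err;
  }
}
```

One trade-off to keep in mind: a single long transaction holds its connection for the whole invocation, so this helps most when the statements are short and would otherwise each open their own transaction.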
@QAnders, I’m glad you like it. The Data API does consume connections, but is supposed to act a bit like RDS Proxy in that you shouldn’t have to worry about it since it’s using a connection pool on the backend. There seem to be a lot of people who have had this scaling issue, though.
@kelbyenevoldLA Thanks for your suggestions…
We haven’t been able to find any underlying problem with our queries, though obviously that doesn’t mean there isn’t one.
We’re thinking about switching to serverless-mysql to make sure the issue is the Data API…