serverless-mysql: Cannot enqueue query error after Database failover (AWS RDS MySQL Aurora)
We migrated the majority of our Lambda functions to this library in our dev environment to give it a run, and so far it has been working great. However, yesterday we hit an issue that is still ongoing, and I have no idea how to resolve it.
At approximately noon on Sunday, our dev RDS MySQL database (an Aurora provisioned instance) crashed and then eventually failed over to the failover instance. The reason for the crash is still unknown and we are investigating, but it is worth mentioning that this is the first time the instance has crashed in the past 1.5 years of continuous, reliable usage. Once the failover was complete, we were able to connect to the DB and query normally. However, all the Lambda functions were now throwing the error shown below. The error started specifically after the failover, and ALL Lambda functions that query our database seem to be hitting this exact error, as reflected in the CloudWatch logs. We stopped and started the cluster, and even rebooted it, hoping to resolve the issue, but the error kept showing up and the API Gateway endpoints continued to return errors. We looked in the database and found no lingering sessions or processes.
2018-09-30T22:25:42.908Z c49a6ffc-c4ff-11e8-9665-c71a7d30630c Error: { Error: Cannot enqueue Query after fatal error.
at Protocol._validateEnqueue (/var/task/node_modules/mysql/lib/protocol/Protocol.js:200:16)
at Protocol._enqueue (/var/task/node_modules/mysql/lib/protocol/Protocol.js:138:13)
at Connection.query (/var/task/node_modules/mysql/lib/Connection.js:200:25)
at Promise (/var/task/node_modules/serverless-mysql/index.js:185:23)
at new Promise (<anonymous>)
at Object.query (/var/task/node_modules/serverless-mysql/index.js:183:10)
at <anonymous> code: 'PROTOCOL_ENQUEUE_AFTER_FATAL_ERROR', fatal: false }
We ultimately switched a couple of Lambdas over to our native code, which doesn't use this library and instead uses a traditional MySQL JS connection pool with a handler callback. Curiously enough, those functions worked just fine (even with concurrent requests). Since none of our Lambda functions were operational at the time because of the error above, we have reverted all of them to our previous version, but I thought it would be worth raising this issue since I am not entirely certain what is causing it.
About this issue
- State: closed
- Created 6 years ago
- Comments: 22 (8 by maintainers)
Commits related to this issue
- close #7 by catching enqueue errors and resetting connection — committed to jeremydaly/serverless-mysql by jeremydaly 6 years ago
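The commit message above describes catching enqueue errors and resetting the connection. A minimal sketch of that general idea follows; it is not the code from the linked commit. `createResilientClient` and the injected `connect` function are hypothetical names, and the connection object is assumed to expose a promise-returning `query` method. The only detail taken from the thread is the error code in the log.

```javascript
// Hypothetical helper illustrating "catch the enqueue error, reset, retry once".
// The error code below is the one reported in the log in this thread.
const FATAL_ENQUEUE = 'PROTOCOL_ENQUEUE_AFTER_FATAL_ERROR';

// `connect` is an assumed async factory that returns an object with a
// promise-returning query(sql, params) method.
function createResilientClient(connect) {
  let conn = null; // cached so warm invocations can reuse the connection

  async function query(sql, params) {
    if (!conn) conn = await connect();
    try {
      return await conn.query(sql, params);
    } catch (err) {
      // Anything other than the enqueue-after-fatal error is rethrown as-is.
      if (err.code !== FATAL_ENQUEUE) throw err;
      // The cached connection is unusable (for example after a failover):
      // replace it with a fresh one and retry the query once.
      conn = await connect();
      return conn.query(sql, params);
    }
  }

  return { query };
}
```

The retry is limited to a single attempt so that a database that is still unavailable surfaces as an error instead of looping.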
Looks like we don’t have to wait until re:Invent: https://aws.amazon.com/about-aws/whats-new/2018/11/aurora-serverless-data-api-beta/
@annjawn The purpose of serverless-mysql is to maintain a ratio of persistent connections so that your serverless functions will smartly reuse them when necessary, and clean them up when server resources are running low.
await mysql.end() is what allows serverless-mysql to run the process to clean up those connections. If you don’t await that call, then the connection is frozen before it can do its job.
If you want to minimize the number of sleeping connections, set the connUtilization configuration option to something lower than the default of 0.8 (which will keep 80% of your maxConnections open). Set it to something like 0.5, or you can set a max user connections on your server and connect to MySQL with that.
I can confirm that it worked too. Thanks a lot.
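The advice above about awaiting mysql.end() and lowering connUtilization could look roughly like the sketch below. `makeHandler` is a hypothetical wrapper, and the client is injected so the example runs without a database. The commented-out setup shows how the real client might be created; the environment variable names there are assumptions.

```javascript
// In a real function the client would come from the library, for example:
//   const db = require('serverless-mysql')({
//     connUtilization: 0.5, // lower than the 0.8 default mentioned above
//     config: {
//       host: process.env.DB_HOST,         // assumed variable names
//       database: process.env.DB_NAME,
//       user: process.env.DB_USER,
//       password: process.env.DB_PASSWORD,
//     },
//   });
// Here `db` is any object with promise-returning query() and end() methods.

function makeHandler(db) {
  return async function handler(event) {
    try {
      const rows = await db.query('SELECT 1 + 1 AS answer');
      return { statusCode: 200, body: JSON.stringify(rows) };
    } finally {
      // Awaited so the cleanup step finishes before the handler resolves
      // and the execution environment can be frozen.
      await db.end();
    }
  };
}
```

Placing the call in a `finally` block means the cleanup also runs when the query throws.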
Great, I tested it and it seems to fix the issue 😉 I haven’t gotten the error since I updated to the latest version.
Thanks for the fix @jeremydaly !
@SebSchwartz If you could add some additional context to this, that would be great. I’m also waiting to see what happens at re:Invent. If we get RDS HTTP endpoints, then this library may become moot.
Alright, I am back to using serverless-mysql with await mysql.end(), although I am not sure if I’d be able to re-create the issue again. I will report back if this happens, but thanks for the help!